Sample records for generation statistical modelling

  1. Different Manhattan project: automatic statistical model generation

    NASA Astrophysics Data System (ADS)

    Yap, Chee Keng; Biermann, Henning; Hertzmann, Aaron; Li, Chen; Meyer, Jon; Pao, Hsing-Kuo; Paxia, Salvatore

    2002-03-01

    We address the automatic generation of large geometric models. This is important in visualization for several reasons. First, many applications need access to large but interesting data models. Second, we often need such data sets with particular characteristics (e.g., urban models, park and recreation landscapes). Thus we need the ability to generate models with different parameters. We propose a new approach for generating such models. It is based on a top-down propagation of statistical parameters. We illustrate the method in the generation of a statistical model of Manhattan. But the method is generally applicable in the generation of models of large geographical regions. Our work is related to the literature on generating complex natural scenes (smoke, forests, etc.) based on procedural descriptions. The difference in our approach stems from three characteristics: modeling with statistical parameters, integration of ground truth (actual map data), and a library-based approach for texture mapping.

  2. Visualization of the variability of 3D statistical shape models by animation.

    PubMed

    Lamecker, Hans; Seebass, Martin; Lange, Thomas; Hege, Hans-Christian; Deuflhard, Peter

    2004-01-01

    Models of the 3D shape of anatomical objects and the knowledge about their statistical variability are of great benefit in many computer assisted medical applications like image analysis, therapy or surgery planning. Statistical models of shape have successfully been applied to automate the task of image segmentation. The generation of 3D statistical shape models requires the identification of corresponding points on two shapes. This remains a difficult problem, especially for shapes of complicated topology. In order to interpret and validate variations encoded in a statistical shape model, visual inspection is of great importance. This work describes the generation and interpretation of statistical shape models of the liver and the pelvic bone.

  3. Patch-Based Generative Shape Model and MDL Model Selection for Statistical Analysis of Archipelagos

    NASA Astrophysics Data System (ADS)

    Ganz, Melanie; Nielsen, Mads; Brandt, Sami

    We propose a statistical generative shape model for archipelago-like structures. These kinds of structures occur, for instance, in medical images, where our intention is to model the appearance and shapes of calcifications in x-ray radiographs. The generative model is constructed by (1) learning a patch-based dictionary for possible shapes, (2) building up a time-homogeneous Markov model to model the neighbourhood correlations between the patches, and (3) automatic selection of the model complexity by the minimum description length principle. The generative shape model is proposed as a probability distribution of a binary image where the model is intended to facilitate sequential simulation. Our results show that a relatively simple model is able to generate structures visually similar to calcifications. Furthermore, we used the shape model as a shape prior in the statistical segmentation of calcifications, where the area overlap with the ground truth shapes improved significantly compared to the case where the prior was not used.
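
    A minimal sketch of the sequential-simulation idea behind this record, assuming an invented dictionary of 2x2 binary patches and a hand-written transition matrix (the paper learns both from data and selects the model complexity by MDL, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(11)

# Toy version of the patch-based generative idea: a small dictionary of
# binary patches and a time-homogeneous Markov chain over patch indices,
# simulated sequentially along a row of patches. Dictionary and transition
# matrix are invented for illustration, not learned from radiographs.
dictionary = np.array([
    [[0, 0], [0, 0]],      # background patch
    [[1, 0], [0, 1]],      # speckle patch
    [[1, 1], [1, 1]],      # blob patch
])
P = np.array([[0.90, 0.07, 0.03],   # transition probabilities between
              [0.50, 0.30, 0.20],   # neighbouring patches
              [0.30, 0.20, 0.50]])

state, states = 0, []
for _ in range(16):                 # simulate one row of 16 patches
    states.append(state)
    state = rng.choice(3, p=P[state])

row = np.hstack([dictionary[s] for s in states])  # 2 x 32 binary image strip
print(row)
```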

  4. A Statistical Approach For Modeling Tropical Cyclones. Synthetic Hurricanes Generator Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pasqualini, Donatella

    This manuscript briefly describes a statistical approach to generate synthetic tropical cyclone tracks to be used in risk evaluations. The Synthetic Hurricane Generator (SynHurG) model allows modeling hurricane risk in the United States, supporting decision makers and implementations of adaptation strategies to extreme weather. In the literature there are mainly two approaches to model hurricane hazard for risk prediction: deterministic-statistical approaches, where the storm key physical parameters are calculated using complex physical climate models and the tracks are usually determined statistically from historical data; and statistical approaches, where both variables and tracks are estimated stochastically using historical records. SynHurG falls in the second category, adopting a pure stochastic approach.

  5. Soft Mixer Assignment in a Hierarchical Generative Model of Natural Scene Statistics

    PubMed Central

    Schwartz, Odelia; Sejnowski, Terrence J.; Dayan, Peter

    2010-01-01

    Gaussian scale mixture models offer a top-down description of signal generation that captures key bottom-up statistical characteristics of filter responses to images. However, the pattern of dependence among the filters for this class of models is prespecified. We propose a novel extension to the gaussian scale mixture model that learns the pattern of dependence from observed inputs and thereby induces a hierarchical representation of these inputs. Specifically, we propose that inputs are generated by gaussian variables (modeling local filter structure), multiplied by a mixer variable that is assigned probabilistically to each input from a set of possible mixers. We demonstrate inference of both components of the generative model, for synthesized data and for different classes of natural images, such as a generic ensemble and faces. For natural images, the mixer variable assignments show invariances resembling those of complex cells in visual cortex; the statistics of the gaussian components of the model are in accord with the outputs of divisive normalization models. We also show how our model helps interrelate a wide range of models of image statistics and cortical processing. PMID:16999575
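
    A minimal numerical illustration of the Gaussian scale mixture idea described above: two gaussian variables share a probabilistically assigned mixer, which produces heavy-tailed marginals and dependent magnitudes even though the responses themselves are uncorrelated. All parameter values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "filter" outputs sharing one mixer per sample (GSM construction).
n = 100_000
mixers = np.array([0.5, 1.0, 2.0])      # possible mixer values
probs = np.array([0.6, 0.3, 0.1])       # mixer assignment probabilities

v = rng.choice(mixers, size=n, p=probs)  # mixer assigned per sample
g = rng.standard_normal((n, 2))          # local gaussian variables
x = v[:, None] * g                       # GSM responses: x = v * g

# GSM marginals are heavy-tailed: excess kurtosis > 0, unlike a gaussian.
def excess_kurtosis(a):
    a = a - a.mean()
    return (a**4).mean() / (a**2).mean()**2 - 3.0

print("excess kurtosis:", excess_kurtosis(x[:, 0]))
# Responses are uncorrelated but not independent: squared responses
# correlate, the dependence that divisive normalization is meant to remove.
print("corr(x1, x2):   ", np.corrcoef(x[:, 0], x[:, 1])[0, 1])
print("corr(x1^2,x2^2):", np.corrcoef(x[:, 0]**2, x[:, 1]**2)[0, 1])
```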

  6. Testing statistical self-similarity in the topology of river networks

    USGS Publications Warehouse

    Troutman, Brent M.; Mantilla, Ricardo; Gupta, Vijay K.

    2010-01-01

    Recent work has demonstrated that the topological properties of real river networks deviate significantly from predictions of Shreve's random model. At the same time the property of mean self-similarity postulated by Tokunaga's model is well supported by data. Recently, a new class of network model called random self-similar networks (RSN), which combines self-similarity and randomness, has been introduced to replicate important topological features observed in real river networks. We investigate whether the hypothesis of statistical self-similarity in the RSN model is supported by data on a set of 30 basins located across the continental United States that encompass a wide range of hydroclimatic variability. We demonstrate that the generators of the RSN model obey a geometric distribution, and that self-similarity holds in a statistical sense in 26 of these 30 basins. The parameters describing the distributions of interior and exterior generators are tested and found to be statistically different, and the difference is shown to produce the well-known Hack's law. The inter-basin variability of RSN parameters is found to be statistically significant. We also test generator dependence on two climatic indices, mean annual precipitation and radiative index of dryness. Some indication of climatic influence on the generators is detected, but this influence is not statistically significant with the sample size available. Finally, two key applications of the RSN model to hydrology and geomorphology are briefly discussed.
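
    A small sketch of the kind of goodness-of-fit test implied by the abstract: fit a geometric distribution to generator counts and check the fit with a chi-square test. The generator counts below are simulated stand-ins, not data from the 30 basins:

```python
import numpy as np
from scipy import stats

# Hypothetical generator counts for one basin; real values would come
# from extracting RSN generators from stream network data.
rng = np.random.default_rng(1)
generators = rng.geometric(p=0.45, size=200)   # stand-in observations

# MLE for the geometric parameter (support 1, 2, ...): p_hat = 1 / mean.
p_hat = 1.0 / generators.mean()

# Chi-square goodness of fit: pool the tail so expected counts stay adequate.
kmax = 6
observed = np.array([(generators == k).sum() for k in range(1, kmax)]
                    + [(generators >= kmax).sum()])
expected = len(generators) * np.array(
    [stats.geom.pmf(k, p_hat) for k in range(1, kmax)]
    + [stats.geom.sf(kmax - 1, p_hat)])

chi2, pval = stats.chisquare(observed, expected, ddof=1)  # 1 fitted parameter
print(f"p_hat={p_hat:.3f}  chi2={chi2:.2f}  p-value={pval:.3f}")
```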

  7. A statistical approach for generating synthetic tip stress data from limited CPT soundings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Basalams, M.K.

    CPT tip stress data obtained from a uranium mill tailings impoundment are treated as time series. A statistical class of models that was developed to model time series is explored to investigate its applicability in modeling the tip stress series. These models were developed by Box and Jenkins (1970) and are known as Autoregressive Moving Average (ARMA) models. This research demonstrates how to apply the ARMA models to tip stress series. Generation of synthetic tip stress series that preserve the main statistical characteristics of the measured series is also investigated. Multiple regression analysis is used to model the regional variation of the ARMA model parameters as well as the regional variation of the mean and the standard deviation of the measured tip stress series. The reliability of the generated series is investigated from a geotechnical point of view as well as from a statistical point of view. Estimation of the total settlement using the measured and the generated series subjected to the same loading condition is performed. The variation of friction angle with depth of the impoundment materials is also investigated. This research shows that these series can be modeled by the Box and Jenkins ARMA models. A third degree autoregressive model AR(3) is selected to represent these series. A theoretical double exponential density function is fitted to the AR(3) model residuals. Synthetic tip stress series are generated at nearby locations. The generated series are shown to be reliable in estimating the total settlement and the friction angle variation with depth for this particular site.
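
    A minimal, numpy-only sketch of the Box-Jenkins workflow the abstract applies: fit an AR(3) model by solving the Yule-Walker equations, then generate a synthetic series from the fitted coefficients. The input series here is simulated, and gaussian residuals stand in for the study's double exponential density:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in "tip stress" series; a real application would use detrended CPT data.
true_phi = np.array([0.6, -0.2, 0.1])
n = 2000
x = np.zeros(n)
for t in range(3, n):
    x[t] = true_phi @ x[t-3:t][::-1] + rng.standard_normal()

def yule_walker_ar(x, order):
    """Estimate AR coefficients by solving the Yule-Walker equations."""
    x = x - x.mean()
    r = np.array([np.dot(x[:len(x)-k], x[k:]) / len(x) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    phi = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - phi @ r[1:]          # innovation variance
    return phi, sigma2

phi_hat, sigma2_hat = yule_walker_ar(x, order=3)
print("estimated AR(3) coefficients:", np.round(phi_hat, 3))

# Generate a synthetic series from the fitted model, as in the abstract's
# synthetic tip stress generation.
y = np.zeros(n)
for t in range(3, n):
    y[t] = phi_hat @ y[t-3:t][::-1] + rng.normal(scale=np.sqrt(sigma2_hat))
```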

  8. A cloud and radiation model-based algorithm for rainfall retrieval from SSM/I multispectral microwave measurements

    NASA Technical Reports Server (NTRS)

    Xiang, Xuwu; Smith, Eric A.; Tripoli, Gregory J.

    1992-01-01

    A hybrid statistical-physical retrieval scheme is explored which combines a statistical approach with an approach based on the development of cloud-radiation models designed to simulate precipitating atmospheres. The algorithm employs the detailed microphysical information from a cloud model as input to a radiative transfer model which generates a cloud-radiation model database. Statistical procedures are then invoked to objectively generate an initial guess composite profile data set from the database. The retrieval algorithm has been tested for a tropical typhoon case using Special Sensor Microwave/Imager (SSM/I) data and has shown satisfactory results.

  9. Use of a statistical model of the whole femur in a large scale, multi-model study of femoral neck fracture risk.

    PubMed

    Bryan, Rebecca; Nair, Prasanth B; Taylor, Mark

    2009-09-18

    Interpatient variability is often overlooked in orthopaedic computational studies due to the substantial challenges involved in sourcing and generating large numbers of bone models. A statistical model of the whole femur incorporating both geometric and material property variation was developed as a potential solution to this problem. The statistical model was constructed using principal component analysis, applied to 21 individual computed tomography scans. To test the ability of the statistical model to generate realistic, unique, finite element (FE) femur models, it was used as a source of 1000 femurs to drive a study on femoral neck fracture risk. The study simulated the impact of an oblique fall to the side, a scenario known to account for a large proportion of hip fractures in the elderly and to have a lower fracture load than alternative loading approaches. FE model generation, application of subject specific loading and boundary conditions, FE processing and post-processing of the solutions were completed automatically. The generated models were within the bounds of the training data used to create the statistical model, with a high mesh quality, and could be used directly by the FE solver without remeshing. The results indicated that 28 of the 1000 femurs were at highest risk of fracture. Closer analysis revealed the percentage of cortical bone in the proximal femur to be a crucial differentiator between the failed and non-failed groups. The likely fracture location was indicated to be intertrochanteric. Comparison to previous computational, clinical and experimental work revealed support for these findings.
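
    A compact sketch of the PCA machinery such a statistical model rests on, with random vectors standing in for the 21 segmented femur geometries; sampled mode weights are kept within ±3 standard deviations so generated shapes stay inside the training bounds, mirroring the abstract:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical training set: 21 "femur shapes", each flattened to a vector
# of landmark coordinates (real shapes would come from segmented CT scans).
n_shapes, n_coords = 21, 300
training = rng.standard_normal((n_shapes, n_coords))

mean_shape = training.mean(axis=0)
centered = training - mean_shape

# Principal component analysis via SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
eigvals = s**2 / (n_shapes - 1)       # variance explained by each mode
modes = Vt                            # rows are the shape modes

def sample_shape(rng, n_modes=10, limit=3.0):
    """Draw a new shape: mean plus a weighted sum of modes, with mode
    weights clipped to +/- limit * sqrt(eigenvalue) so generated shapes
    stay within the bounds of the training data."""
    sd = np.sqrt(eigvals[:n_modes])
    b = np.clip(rng.standard_normal(n_modes) * sd, -limit * sd, limit * sd)
    return mean_shape + b @ modes[:n_modes]

new_femur = sample_shape(rng)
print(new_femur.shape)
```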

  10. Statistical Compression of Wind Speed Data

    NASA Astrophysics Data System (ADS)

    Tagle, F.; Castruccio, S.; Crippa, P.; Genton, M.

    2017-12-01

    In this work we introduce a lossy compression approach that utilizes a stochastic wind generator based on a non-Gaussian distribution to reproduce the internal climate variability of daily wind speed as represented by the CESM Large Ensemble over Saudi Arabia. Stochastic wind generators, and stochastic weather generators more generally, are statistical models that aim to match certain statistical properties of the data on which they are trained. They have been used extensively in applications ranging from agricultural models to climate impact studies. In this novel context, the parameters of the fitted model can be interpreted as encoding the information contained in the original uncompressed data. The statistical model is fit to only 3 of the 30 ensemble members, and it adequately captures the variability of the ensemble in terms of seasonal and interannual variability of daily wind speed. To deal with such a large spatial domain, it is partitioned into 9 regions, and the model is fit independently to each of these. We further discuss a recent refinement of the model, which relaxes this assumption of regional independence by introducing a large-scale component that interacts with the fine-scale regional effects.

  11. TH-CD-202-07: A Methodology for Generating Numerical Phantoms for Radiation Therapy Using Geometric Attribute Distribution Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dolly, S; Chen, H; Mutic, S

    Purpose: A persistent challenge for the quality assessment of radiation therapy treatments (e.g. contouring accuracy) is the absence of the known, ground truth for patient data. Moreover, assessment results are often patient-dependent. Computer simulation studies utilizing numerical phantoms can be performed for quality assessment with a known ground truth. However, previously reported numerical phantoms do not include the statistical properties of inter-patient variations, as their models are based on only one patient. In addition, these models do not incorporate tumor data. In this study, a methodology was developed for generating numerical phantoms which encapsulate the statistical variations of patients within radiation therapy, including tumors. Methods: Based on previous work in contouring assessment, geometric attribute distribution (GAD) models were employed to model both the deterministic and stochastic properties of individual organs via principal component analysis. Using pre-existing radiation therapy contour data, the GAD models are trained to model the shape and centroid distributions of each organ. Then, organs with different shapes and positions can be generated by assigning statistically sound weights to the GAD model parameters. Organ contour data from 20 retrospective prostate patient cases were manually extracted and utilized to train the GAD models. As a demonstration, computer-simulated CT images of generated numerical phantoms were calculated and assessed subjectively and objectively for realism. Results: A cohort of numerical phantoms of the male human pelvis was generated. CT images were deemed realistic both subjectively and objectively in terms of image noise power spectrum. Conclusion: A methodology has been developed to generate realistic numerical anthropomorphic phantoms using pre-existing radiation therapy data. The GAD models guarantee that generated organs span the statistical distribution of observed radiation therapy patients, according to the training dataset. The methodology enables radiation therapy treatment assessment with multi-modality imaging and a known ground truth, and without patient-dependent bias.

  12. A Comparison of Forecast Error Generators for Modeling Wind and Load Uncertainty

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Ning; Diao, Ruisheng; Hafen, Ryan P.

    2013-07-25

    This paper presents four algorithms to generate random forecast error time series, and compares their performance. The error time series are used to create real-time (RT), hour-ahead (HA), and day-ahead (DA) wind and load forecast time series that statistically match historically observed forecasting data sets used in power grid operation, to study the net load balancing need in variable generation integration studies. The four algorithms are truncated-normal distribution models, state-space based Markov models, seasonal autoregressive moving average (ARMA) models, and a stochastic-optimization based approach. The comparison is made using historical DA load forecast and actual load values to generate new sets of DA forecasts with similar statistical forecast error characteristics (i.e., mean, standard deviation, autocorrelation, and cross-correlation). The results show that all methods generate satisfactory results. One method may preserve one or two required statistical characteristics better than the other methods, but may not preserve other statistical characteristics as well as the other methods. Because the wind and load forecast error generators are used in wind integration studies to produce wind and load forecast time series for stochastic planning processes, it is sometimes critical to use multiple methods to generate the error time series to obtain a statistically robust result. Therefore, this paper discusses and compares the capabilities of each algorithm to preserve the characteristics of the historical forecast data sets.
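
    A minimal sketch of the simplest of the four generators, the truncated-normal model: match the mean and standard deviation of historical forecast errors while truncating at the observed extremes. The "historical" errors below are simulated placeholders; note that this preserves only the marginal distribution, not the autocorrelation the paper's Markov and ARMA variants target:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Hypothetical historical day-ahead forecast errors (MW); in the paper these
# statistics come from observed load forecast/actual pairs.
errors = rng.normal(loc=10.0, scale=150.0, size=5000)

mu, sigma = errors.mean(), errors.std(ddof=1)
lo, hi = errors.min(), errors.max()           # truncate at observed extremes

# Truncated-normal generator: match mean/std while never producing errors
# outside the historically observed range.
a, b = (lo - mu) / sigma, (hi - mu) / sigma   # standardized truncation bounds
synthetic = stats.truncnorm.rvs(a, b, loc=mu, scale=sigma,
                                size=8760, random_state=rng)

print(f"synthetic mean={synthetic.mean():.1f}  std={synthetic.std():.1f}")
```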

  13. A statistical shape model of the human second cervical vertebra.

    PubMed

    Clogenson, Marine; Duff, John M; Luethi, Marcel; Levivier, Marc; Meuli, Reto; Baur, Charles; Henein, Simon

    2015-07-01

    Statistical shape and appearance models play an important role in reducing the segmentation processing time of a vertebra and in improving results for 3D model development. Here, we describe the different steps in generating a statistical shape model (SSM) of the second cervical vertebra (C2) and provide the shape model for general use by the scientific community. The main difficulties in its construction are the morphological complexity of the C2 and its variability in the population. The input dataset is composed of manually segmented anonymized patient computerized tomography (CT) scans. The alignment of the different datasets is done with Procrustes alignment on surface models, and the registration is then cast as a model-fitting problem using a Gaussian process. A principal component analysis (PCA)-based model is generated which includes the variability of the C2. The SSM was generated using 92 CT scans. The resulting SSM was evaluated for specificity, compactness and generalization ability. The SSM of the C2 is freely available to the scientific community in Slicer (an open source software for image analysis and scientific visualization) with a module created to visualize the SSM using Statismo, a framework for statistical shape modeling. The SSM of the vertebra allows the shape variability of the C2 to be represented. Moreover, the SSM will enable semi-automatic segmentation and 3D model generation of the vertebra, which would greatly benefit surgery planning.

  14. The power and robustness of maximum LOD score statistics.

    PubMed

    Yoo, Y J; Mendell, N R

    2008-07-01

    The maximum LOD score statistic is extremely powerful for gene mapping when calculated using the correct genetic parameter value. When the mode of genetic transmission is unknown, the maximum of the LOD scores obtained using several genetic parameter values is reported. This latter statistic requires a higher critical value than the maximum LOD score statistic calculated from a single genetic parameter value. In this paper, we compare the power of maximum LOD scores based on three fixed sets of genetic parameter values with the power of the LOD score obtained after maximizing over the entire range of genetic parameter values. We simulate family data under nine generating models. For generating models with non-zero phenocopy rates, LOD scores maximized over the entire range of genetic parameters yielded greater power than maximum LOD scores for fixed sets of parameter values with zero phenocopy rates. No maximum LOD score was consistently more powerful than the others for generating models with a zero phenocopy rate. The power loss of the LOD score maximized over the entire range of genetic parameters, relative to the maximum LOD score calculated using the correct genetic parameter value, appeared to be robust to the generating models.

  15. An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern-predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.

  16. An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

    DOE PAGES

    Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin; ...

    2017-05-15

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern-predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.

  17. Modelling the effect of structural QSAR parameters on skin penetration using genetic programming

    NASA Astrophysics Data System (ADS)

    Chung, K. K.; Do, D. Q.

    2010-09-01

    In order to model relationships between chemical structures and biological effects in quantitative structure-activity relationship (QSAR) data, an alternative artificial intelligence technique—genetic programming (GP)—was investigated and compared to the traditional statistical method. GP, whose primary advantage is that it generates mathematical equations, was employed to model QSAR data and to identify the most important molecular descriptors in the data. The models predicted by GP agreed with the statistical results, and the most predictive GP models were significantly improved compared to the statistical models using ANOVA. Recently, artificial intelligence techniques have been applied widely to analyse QSAR data. With the capability of generating mathematical equations, GP can be considered an effective and efficient method for modelling QSAR data.

  18. An open-access CMIP5 pattern library for temperature and precipitation: description and methodology

    NASA Astrophysics Data System (ADS)

    Lynch, Cary; Hartin, Corinne; Bond-Lamberty, Ben; Kravitz, Ben

    2017-05-01

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90° N/S). Bias and mean errors between modeled and pattern-predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5 °C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. This paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns. The dataset and netCDF data generation code are available at doi:10.5281/zenodo.495632.
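
    A toy illustration of the two pattern-generation methods compared in this record (and in the two preceding records for the same paper): a least squares regression pattern fit per grid cell against global mean temperature, and a delta-method pattern from epoch-mean differences. The model output below is synthetic, not CMIP5 data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy stand-ins for model output: global mean temperature anomaly per year,
# and a (year, lat, lon) local temperature anomaly field.
years, nlat, nlon = 95, 12, 24
t_global = np.linspace(0.0, 4.0, years) + rng.normal(0, 0.1, years)
true_pattern = 1.0 + rng.normal(0, 0.3, (nlat, nlon))   # local scaling factors
field = (true_pattern[None] * t_global[:, None, None]
         + rng.normal(0, 0.5, (years, nlat, nlon)))     # internal variability

# Least squares regression pattern: slope of local anomaly vs. global mean
# temperature at every grid cell, fit in one vectorized pass.
X = np.column_stack([np.ones(years), t_global])
coef, *_ = np.linalg.lstsq(X, field.reshape(years, -1), rcond=None)
regression_pattern = coef[1].reshape(nlat, nlon)        # degC per degC global

# Delta-method pattern for comparison: epoch-mean difference divided by the
# corresponding change in global mean temperature.
late, early = slice(-20, None), slice(0, 20)
delta_pattern = ((field[late].mean(0) - field[early].mean(0))
                 / (t_global[late].mean() - t_global[early].mean()))

print("max |regression - delta|:",
      np.abs(regression_pattern - delta_pattern).max())
```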

  19. Bootstrap data methodology for sequential hybrid model building

    NASA Technical Reports Server (NTRS)

    Volponi, Allan J. (Inventor); Brotherton, Thomas (Inventor)

    2007-01-01

    A method for modeling engine operation comprising the steps of: 1. collecting a first plurality of sensory data, 2. partitioning a flight envelope into a plurality of sub-regions, 3. assigning the first plurality of sensory data into the plurality of sub-regions, 4. generating an empirical model of at least one of the plurality of sub-regions, 5. generating a statistical summary model for at least one of the plurality of sub-regions, 6. collecting an additional plurality of sensory data, 7. partitioning the second plurality of sensory data into the plurality of sub-regions, 8. generating a plurality of pseudo-data using the empirical model, and 9. concatenating the plurality of pseudo-data and the additional plurality of sensory data to generate an updated empirical model and an updated statistical summary model for at least one of the plurality of sub-regions.

  20. Teaching Engineering Statistics with Technology, Group Learning, Contextual Projects, Simulation Models and Student Presentations

    ERIC Educational Resources Information Center

    Romeu, Jorge Luis

    2008-01-01

    This article discusses our teaching approach in graduate level Engineering Statistics. It is based on the use of modern technology, learning groups, contextual projects, simulation models, and statistical and simulation software to entice student motivation. The use of technology to facilitate group projects and presentations, and to generate,…

  1. A Three-Dimensional Statistical Average Skull: Application of Biometric Morphing in Generating Missing Anatomy.

    PubMed

    Teshima, Tara Lynn; Patel, Vaibhav; Mainprize, James G; Edwards, Glenn; Antonyshyn, Oleh M

    2015-07-01

    The utilization of three-dimensional modeling technology in craniomaxillofacial surgery has grown exponentially during the last decade. Future development, however, is hindered by the lack of a normative three-dimensional anatomic dataset and a statistical mean three-dimensional virtual model. The purpose of this study is to develop and validate a protocol to generate a statistical three-dimensional virtual model based on a normative dataset of adult skulls. Two hundred adult skull CT images were reviewed. The average three-dimensional skull was computed by processing each CT image in the series using a thin-plate spline geometric morphometric protocol. Our statistical average three-dimensional skull was validated by reconstructing patient-specific topography in cranial defects. The experiment was repeated 4 times. In each case, computer-generated cranioplasties were compared directly to the original intact skull. The errors describing the difference between the prediction and the original were calculated. A normative database of 33 adult human skulls was collected. Using 21 anthropometric landmark points, a protocol for three-dimensional skull landmarking and data reduction was developed and a statistical average three-dimensional skull was generated. Our results show the root mean square error (RMSE) for restoration of a known defect using the native best match skull, our statistical average skull, and the worst match skull was 0.58, 0.74, and 4.4 mm, respectively. The ability to statistically average craniofacial surface topography will be a valuable instrument for deriving missing anatomy in complex craniofacial defects and deficiencies as well as in evaluating morphologic results of surgery.

  2. Statistical Analysis of Complexity Generators for Cost Estimation

    NASA Technical Reports Server (NTRS)

    Rowell, Ginger Holmes

    1999-01-01

    Predicting the cost of cutting edge new technologies involved with spacecraft hardware can be quite complicated. A new feature of the NASA Air Force Cost Model (NAFCOM), called the Complexity Generator, is being developed to model the complexity factors that drive the cost of space hardware. This parametric approach is also designed to account for the differences in cost, based on factors that are unique to each system and subsystem. The cost driver categories included in this model are weight, inheritance from previous missions, technical complexity, and management factors. This paper explains the Complexity Generator framework, the statistical methods used to select the best model within this framework, and the procedures used to find the region of predictability and the prediction intervals for the cost of a mission.

  3. Assessment of the scale effect on statistical downscaling quality at a station scale using a weather generator-based model

    USDA-ARS?s Scientific Manuscript database

    The resolution of General Circulation Models (GCMs) is too coarse to assess the fine scale or site-specific impacts of climate change. Downscaling approaches including dynamical and statistical downscaling have been developed to meet this requirement. As the resolution of climate model increases, it...

  4. Deep Generative Models of Galaxy Images for the Calibration of the Next Generation of Weak Lensing Surveys

    NASA Astrophysics Data System (ADS)

    Lanusse, Francois; Ravanbakhsh, Siamak; Mandelbaum, Rachel; Schneider, Jeff; Poczos, Barnabas

    2017-01-01

    Weak gravitational lensing has long been identified as one of the most powerful probes to investigate the nature of dark energy. As such, weak lensing is at the heart of the next generation of cosmological surveys such as LSST, Euclid or WFIRST. One particularly critical source of systematic errors in these surveys comes from the shape measurement algorithms tasked with estimating galaxy shapes. GREAT3, the last community challenge to assess the quality of state-of-the-art shape measurement algorithms, demonstrated in particular that all current methods are biased to various degrees and, more importantly, that these biases depend on the details of the galaxy morphologies. These biases can be measured and calibrated by generating mock observations where a known lensing signal has been introduced and comparing the resulting measurements to the ground truth. Producing these mock observations however requires input galaxy images of higher resolution and S/N than the simulated survey, which typically implies acquiring extremely expensive space-based observations. The goal of this work is to train a deep generative model on already available Hubble Space Telescope data which can then be used to sample new galaxy images conditioned on parameters such as magnitude, size or redshift and exhibiting complex morphologies. Such a model allows us to inexpensively produce large sets of realistic images for calibration purposes. We implement a conditional generative model based on state-of-the-art deep learning methods and fit it to deep galaxy images from the COSMOS survey. The quality of the model is assessed by computing an extensive set of galaxy morphology statistics on the generated images. Beyond simple second moment statistics such as size and ellipticity, we apply more complex statistics specifically designed to be sensitive to disturbed galaxy morphologies. We find excellent agreement between the morphologies of real and model-generated galaxies. Our results suggest that such deep generative models represent a reliable alternative to the acquisition of expensive high quality observations for generating the calibration data needed by the next generation of weak lensing surveys.
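
    A small example of the second-moment morphology statistics the abstract mentions (size and ellipticity), computed here from unweighted image moments on a synthetic elliptical gaussian "galaxy"; production shape pipelines use adaptive weighting, so this is illustrative only:

```python
import numpy as np

def moments_shape(image):
    """Size and ellipticity from second moments of a galaxy image, the kind
    of baseline morphology statistic mentioned in the abstract (unweighted
    moments; real pipelines use adaptive weighting)."""
    ny, nx = image.shape
    y, x = np.mgrid[0:ny, 0:nx]
    flux = image.sum()
    xc, yc = (x * image).sum() / flux, (y * image).sum() / flux
    qxx = ((x - xc) ** 2 * image).sum() / flux
    qyy = ((y - yc) ** 2 * image).sum() / flux
    qxy = ((x - xc) * (y - yc) * image).sum() / flux
    size = np.sqrt(qxx + qyy)
    e1 = (qxx - qyy) / (qxx + qyy)       # elongation along x vs y
    e2 = 2.0 * qxy / (qxx + qyy)         # elongation along the diagonals
    return size, e1, e2

# Synthetic elliptical gaussian "galaxy" standing in for a model sample.
ny = nx = 64
y, x = np.mgrid[0:ny, 0:nx] - 32.0
img = np.exp(-0.5 * ((x / 6.0) ** 2 + (y / 3.0) ** 2))
print(moments_shape(img))   # e1 > 0: elongated along the x axis
```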

  5. A comparative study of two statistical approaches for the analysis of real seismicity sequences and synthetic seismicity generated by a stick-slip experimental model

    NASA Astrophysics Data System (ADS)

    Flores-Marquez, Leticia Elsa; Ramirez Rojaz, Alejandro; Telesca, Luciano

    2015-04-01

    Two statistical approaches are analyzed for two different types of data sets: the seismicity generated by the subduction processes that occurred at the south Pacific coast of Mexico between 2005 and 2012, and the synthetic seismic data generated by a stick-slip experimental model. The statistical methods used for the present study are the visibility graph, in order to investigate the time dynamics of the series, and the scaled probability density function in the natural time domain, to investigate the critical order of the system. This comparison has the purpose of showing the similarities between the dynamical behaviors of both types of data sets from the point of view of critical systems. The observed behaviors allow us to conclude that the experimental setup globally reproduces the behavior observed in the statistical approaches used to analyze the seismicity of the subduction zone. The present study was supported by the Bilateral Project Italy-Mexico "Experimental stick-slip models of tectonic faults: innovative statistical approaches applied to synthetic seismic sequences", jointly funded by MAECI (Italy) and AMEXCID (Mexico) in the framework of the Bilateral Agreement for Scientific and Technological Cooperation PE 2014-2016.
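
    A reference implementation of the natural visibility graph used in this study, following the standard Lacasa et al. construction; the magnitude sequence below is a random stand-in for the real and synthetic seismic catalogs:

```python
import numpy as np

def visibility_graph(series):
    """Natural visibility graph: nodes are samples (t, y_t); two nodes are
    linked when the straight line between them clears every sample in
    between. O(n^2) reference implementation."""
    n = len(series)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[j]
                + (series[i] - series[j]) * (j - k) / (j - i)
                for k in range(i + 1, j))
            if visible:
                edges.append((i, j))
    return edges

# Toy "magnitude sequence"; a real analysis would use earthquake magnitudes
# or the stick-slip experiment's synthetic events.
rng = np.random.default_rng(6)
mags = rng.exponential(scale=1.0, size=200)

edges = visibility_graph(mags)
degree = np.bincount(np.array(edges).ravel(), minlength=len(mags))
print("mean degree:", degree.mean())   # the degree distribution is the usual
                                       # statistic compared across sequences
```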

  6. Automated finite element modeling of the lumbar spine: Using a statistical shape model to generate a virtual population of models.

    PubMed

    Campbell, J Q; Petrella, A J

    2016-09-06

    Population-based modeling of the lumbar spine has the potential to be a powerful clinical tool. However, developing a fully parameterized model of the lumbar spine with accurate geometry has remained a challenge. The current study used automated methods for landmark identification to create a statistical shape model of the lumbar spine. The shape model was evaluated using compactness, generalization ability, and specificity. The primary shape modes were analyzed visually, quantitatively, and biomechanically. The biomechanical analysis was performed by using the statistical shape model with an automated method for finite element model generation to create a fully parameterized finite element model of the lumbar spine. Functional finite element models of the mean shape and the extreme shapes (±3 standard deviations) of all 17 shape modes were created demonstrating the robust nature of the methods. This study represents an advancement in finite element modeling of the lumbar spine and will allow population-based modeling in the future. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Generating synthetic daily precipitation realizations for seasonal precipitation forecasts

    USDA-ARS?s Scientific Manuscript database

    Synthetic weather generation models that depend on statistics of past weather observations are often limited in their applications to issues that depend upon historical weather characteristics. Enhancing these models to take advantage of increasingly available and skillful seasonal climate outlook p...

  8. The Two-Dimensional Gabor Function Adapted to Natural Image Statistics: A Model of Simple-Cell Receptive Fields and Sparse Structure in Images.

    PubMed

    Loxley, P N

    2017-10-01

    The two-dimensional Gabor function is adapted to natural image statistics, leading to a tractable probabilistic generative model that can be used to model simple cell receptive field profiles, or generate basis functions for sparse coding applications. Learning is found to be most pronounced in three Gabor function parameters representing the size and spatial frequency of the two-dimensional Gabor function and characterized by a nonuniform probability distribution with heavy tails. All three parameters are found to be strongly correlated, resulting in a basis of multiscale Gabor functions with similar aspect ratios and size-dependent spatial frequencies. A key finding is that the distribution of receptive-field sizes is scale invariant over a wide range of values, so there is no characteristic receptive field size selected by natural image statistics. The Gabor function aspect ratio is found to be approximately conserved by the learning rules and is therefore not well determined by natural image statistics. This allows for three distinct solutions: a basis of Gabor functions with sharp orientation resolution at the expense of spatial-frequency resolution, a basis of Gabor functions with sharp spatial-frequency resolution at the expense of orientation resolution, or a basis with unit aspect ratio. Arbitrary mixtures of all three cases are also possible. Two parameters controlling the shape of the marginal distributions in a probabilistic generative model fully account for all three solutions. The best-performing probabilistic generative model for sparse coding applications is found to be a gaussian copula with Pareto marginal probability density functions.
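
    A short sketch of the two-dimensional Gabor function itself, the building block of this model: a sinusoidal carrier under an elongated gaussian envelope, assembled here into a small multiscale basis with size-dependent spatial frequency and a fixed aspect ratio, as the abstract describes. Parameter values are illustrative, not learned from natural images:

```python
import numpy as np

def gabor_2d(size, wavelength, theta, sigma_x, sigma_y, phase=0.0):
    """Sample a 2D Gabor function: a cosine carrier at angle `theta` under
    an elongated gaussian envelope. `sigma_y / sigma_x` sets the aspect
    ratio of the receptive-field model."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the carrier runs along the x' axis.
    xp = x * np.cos(theta) + y * np.sin(theta)
    yp = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * ((xp / sigma_x) ** 2 + (yp / sigma_y) ** 2))
    carrier = np.cos(2.0 * np.pi * xp / wavelength + phase)
    return envelope * carrier

# A small multiscale "basis": spatial frequency tied to envelope size and a
# fixed aspect ratio, echoing the learned solutions in the abstract.
basis = [gabor_2d(size=33, wavelength=2.0 * s, theta=np.pi / 4,
                  sigma_x=s, sigma_y=2.0 * s)
         for s in (2.0, 4.0, 8.0)]
print([g.shape for g in basis])
```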

  9. Waste generated in high-rise buildings construction: a quantification model based on statistical multiple regression.

    PubMed

    Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana

    2015-05-01

    Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of design process and production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data of eighteen residential buildings. The resulting statistical model produced dependent (i.e. amount of waste generated) and independent variables associated with the design and the production system used. The best regression model obtained from the sample data resulted in an adjusted R² value of 0.694, which means that it predicts approximately 69% of the factors involved in the generation of waste in similar constructions. Most independent variables showed a low determination coefficient when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable. Copyright © 2015 Elsevier Ltd. All rights reserved.
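
    A minimal sketch of the multiple-regression-plus-adjusted-R² computation behind the reported 0.694, with invented predictors for the eighteen buildings (the paper's actual design and production-system variables are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical predictors for 18 buildings (e.g. floor area, number of
# floors, a design-process score); stand-ins for the paper's variables.
n, p = 18, 3
X = rng.normal(size=(n, p))
waste = 2.0 + X @ np.array([1.5, -0.8, 0.6]) + rng.normal(0, 1.0, n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, waste, rcond=None)
resid = waste - A @ beta

# Adjusted R^2 penalizes each added predictor, which matters with n = 18.
ss_res = (resid ** 2).sum()
ss_tot = ((waste - waste.mean()) ** 2).sum()
r2 = 1.0 - ss_res / ss_tot
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
print(f"R^2={r2:.3f}  adjusted R^2={r2_adj:.3f}")
```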

  10. Analysis of Longitudinal Outcome Data with Missing Values in Total Knee Arthroplasty.

    PubMed

    Kang, Yeon Gwi; Lee, Jang Taek; Kang, Jong Yeal; Kim, Ga Hye; Kim, Tae Kyun

    2016-01-01

    We sought to determine the influence of missing data on statistical results, and to determine which statistical method is most appropriate for the analysis of longitudinal outcome data of TKA with missing values, among repeated measures ANOVA, generalized estimating equations (GEE) and mixed effects model repeated measures (MMRM). Data sets with missing values were generated with different proportions of missing data, sample sizes and missing-data generation mechanisms. Each data set was analyzed with the three statistical methods. The influence of missing data was greater with a higher proportion of missing data and a smaller sample size. MMRM tended to show the least change in the statistics. When missing values were generated by a 'missing not at random' mechanism, no statistical method could fully avoid deviations in the results. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. A neural network model of metaphor understanding with dynamic interaction based on a statistical language analysis: targeting a human-like model.

    PubMed

    Terai, Asuka; Nakagawa, Masanori

    2007-08-01

    The purpose of this paper is to construct a model that represents the human process of understanding metaphors, focusing specifically on similes of the form "an A like B". Generally speaking, human beings are able to generate and understand many sorts of metaphors. This study constructs the model based on a probabilistic knowledge structure for concepts which is computed from a statistical analysis of a large-scale corpus. Consequently, this model is able to cover the many kinds of metaphors that human beings can generate. Moreover, the model implements the dynamic process of metaphor understanding by using a neural network with dynamic interactions. Finally, the validity of the model is confirmed by comparing model simulations with the results from a psychological experiment.

  12. Are Statisticians Cold-Blooded Bosses? A New Perspective on the "Old" Concept of Statistical Population

    ERIC Educational Resources Information Center

    Lu, Yonggang; Henning, Kevin S. S.

    2013-01-01

    Spurred by recent writings regarding statistical pragmatism, we propose a simple, practical approach to introducing students to a new style of statistical thinking that models nature through the lens of data-generating processes, not populations. (Contains 5 figures.)

  13. Statistical error model for a solar electric propulsion thrust subsystem

    NASA Technical Reports Server (NTRS)

    Bantell, M. H.

    1973-01-01

    The solar electric propulsion thrust subsystem statistical error model was developed as a tool for investigating the effects of thrust subsystem parameter uncertainties on navigation accuracy. The model is currently being used to evaluate the impact of electric engine parameter uncertainties on navigation system performance for a baseline mission to Encke's Comet in the 1980s. The data given represent the next generation in statistical error modeling for low-thrust applications. Principal improvements include the representation of thrust uncertainties and random process modeling in terms of random parametric variations in the thrust vector process for a multi-engine configuration.

  14. Predicting network modules of cell cycle regulators using relative protein abundance statistics.

    PubMed

    Oguz, Cihan; Watson, Layne T; Baumann, William T; Tyson, John J

    2017-02-28

    Parameter estimation in systems biology is typically done by enforcing experimental observations through an objective function as the parameter space of a model is explored by numerical simulations. Past studies have shown that one usually finds a set of "feasible" parameter vectors that fit the available experimental data equally well, and that these alternative vectors can make different predictions under novel experimental conditions. In this study, we characterize the feasible region of a complex model of the budding yeast cell cycle under a large set of discrete experimental constraints in order to test whether the statistical features of relative protein abundance predictions are influenced by the topology of the cell cycle regulatory network. Using differential evolution, we generate an ensemble of feasible parameter vectors that reproduce the phenotypes (viable or inviable) of wild-type yeast cells and 110 mutant strains. We use this ensemble to predict the phenotypes of 129 mutant strains for which experimental data is not available. We identify 86 novel mutants that are predicted to be viable and then rank the cell cycle proteins in terms of their contributions to cumulative variability of relative protein abundance predictions. Proteins involved in "regulation of cell size" and "regulation of G1/S transition" contribute most to predictive variability, whereas proteins involved in "positive regulation of transcription involved in exit from mitosis," "mitotic spindle assembly checkpoint" and "negative regulation of cyclin-dependent protein kinase by cyclin degradation" contribute the least. These results suggest that the statistics of these predictions may be generating patterns specific to individual network modules (START, S/G2/M, and EXIT). To test this hypothesis, we develop random forest models for predicting the network modules of cell cycle regulators using relative abundance statistics as model inputs. Predictive performance is assessed by the areas under receiver operating characteristics curves (AUC). Our models generate an AUC range of 0.83-0.87 as opposed to randomized models with AUC values around 0.50. By using differential evolution and random forest modeling, we show that the model prediction statistics generate distinct network module-specific patterns within the cell cycle network.
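
    A compact sketch of the final modeling step described above, assuming stand-in features: a random forest predicting a module label from per-protein abundance statistics, scored by cross-validated AUC, with a shuffled-label baseline near 0.5 as in the abstract:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(8)

# Hypothetical stand-in data: per-protein relative abundance statistics
# (e.g. mean and variance across the feasible ensemble) as features, and a
# binary network-module label; the real features come from the ensemble of
# feasible parameter vectors described in the abstract.
n_proteins, n_features = 60, 5
X = rng.normal(size=(n_proteins, n_features))
module = (X[:, 0] + 0.5 * X[:, 1]
          + rng.normal(0, 1.0, n_proteins) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
scores = cross_val_predict(clf, X, module, cv=5, method="predict_proba")[:, 1]
print(f"cross-validated AUC: {roc_auc_score(module, scores):.2f}")

# Randomized baseline, as in the abstract: shuffled labels give AUC near 0.5.
shuffled = rng.permutation(module)
scores0 = cross_val_predict(clf, X, shuffled, cv=5,
                            method="predict_proba")[:, 1]
print(f"shuffled-label AUC:  {roc_auc_score(shuffled, scores0):.2f}")
```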

  15. Generating survival times to simulate Cox proportional hazards models with time-varying covariates.

    PubMed

    Austin, Peter C

    2012-12-20

    Simulations and Monte Carlo methods serve an important role in modern statistical research. They allow for an examination of the performance of statistical procedures in settings in which analytic and mathematical derivations may not be feasible. A key element in any statistical simulation is the existence of an appropriate data-generating process: one must be able to simulate data from a specified statistical model. We describe data-generating processes for the Cox proportional hazards model with time-varying covariates when event times follow an exponential, Weibull, or Gompertz distribution. We consider three types of time-varying covariates: first, a dichotomous time-varying covariate that can change at most once from untreated to treated (e.g., organ transplant); second, a continuous time-varying covariate such as cumulative exposure at a constant dose to radiation or to a pharmaceutical agent used for a chronic condition; third, a dichotomous time-varying covariate with a subject being able to move repeatedly between treatment states (e.g., current compliance or use of a medication). In each setting, we derive closed-form expressions that allow one to simulate survival times so that survival times are related to a vector of fixed or time-invariant covariates and to a single time-varying covariate. We illustrate the utility of our closed-form expressions for simulating event times by using Monte Carlo simulations to estimate the statistical power to detect as statistically significant the effect of different types of binary time-varying covariates. This is compared with the statistical power to detect as statistically significant a binary time-invariant covariate. Copyright © 2012 John Wiley & Sons, Ltd.
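
    A minimal sketch of the closed-form, inverse-cumulative-hazard construction for the first covariate type (a binary covariate that switches once from untreated to treated), assuming an exponential baseline hazard and, for simplicity, a common switch time across subjects:

```python
import numpy as np

rng = np.random.default_rng(9)

def sim_cox_tv_binary(n, lam, beta, t_switch, rng):
    """Simulate survival times under an exponential baseline hazard `lam`
    and a binary covariate that switches 0 -> 1 at time `t_switch`
    (e.g. organ transplant), with log hazard ratio `beta`. Inverts the
    piecewise cumulative hazard, in the spirit of the closed-form
    expressions in the abstract:
        H(t) = lam*t                                      for t <  t_switch
        H(t) = lam*t_switch + lam*exp(beta)*(t - t_switch) for t >= t_switch
    """
    e = rng.exponential(size=n)              # E = -log(U) ~ Exp(1)
    pre = e < lam * t_switch                 # event before the switch
    t = np.empty(n)
    t[pre] = e[pre] / lam
    t[~pre] = t_switch + (e[~pre] - lam * t_switch) / (lam * np.exp(beta))
    return t

times = sim_cox_tv_binary(n=10_000, lam=0.1, beta=np.log(2.0),
                          t_switch=5.0, rng=rng)
# Sanity check: a hazard ratio of 2 after the switch shortens times past
# t_switch relative to the beta = 0 case.
print(f"median survival time: {np.median(times):.2f}")
```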

  16. Generating action descriptions from statistically integrated representations of human motions and sentences.

    PubMed

    Takano, Wataru; Kusajima, Ikuo; Nakamura, Yoshihiko

    2016-08-01

    It is desirable for robots to be able to linguistically understand human actions during human-robot interactions. Previous research has developed frameworks for encoding human full body motion into model parameters and for classifying motion into specific categories. For full understanding, the motion categories need to be connected to the natural language such that the robots can interpret human motions as linguistic expressions. This paper proposes a novel framework for integrating observation of human motion with that of natural language. This framework consists of two models; the first model statistically learns the relations between motions and their relevant words, and the second statistically learns sentence structures as word n-grams. Integration of these two models allows robots to generate sentences from human motions by searching for words relevant to the motion using the first model and then arranging these words in appropriate order using the second model. This allows making sentences that are the most likely to be generated from the motion. The proposed framework was tested on human full body motion measured by an optical motion capture system. In this, descriptive sentences were manually attached to the motions, and the validity of the system was demonstrated. Copyright © 2016 Elsevier Ltd. All rights reserved.
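
    A toy sketch of the two-model integration described above: model 1 scores words by relevance to a recognized motion, model 2 is a word-bigram model, and a greedy search orders the words by the product of the two scores. Vocabulary and probabilities are invented for illustration:

```python
# Toy version of the two-model idea from the abstract.
motion_words = {"wave": {"person": 0.9, "waves": 0.8, "hand": 0.7, "the": 0.6}}

bigram = {                      # P(next | current), learned from a corpus
    "<s>": {"person": 0.7, "the": 0.3},
    "person": {"waves": 0.8, "hand": 0.2},
    "waves": {"hand": 0.6, "</s>": 0.4},
    "hand": {"</s>": 1.0},
    "the": {"person": 1.0},
}

def describe(motion, max_len=6):
    """Greedy search: at each step pick the word maximizing the product of
    motion relevance (model 1) and bigram probability (model 2)."""
    relevance = motion_words[motion]
    word, sentence = "<s>", []
    for _ in range(max_len):
        candidates = bigram.get(word, {})
        if not candidates:
            break
        word = max(candidates,
                   key=lambda w: candidates[w] * relevance.get(w, 0.05))
        if word == "</s>":
            break
        sentence.append(word)
    return " ".join(sentence)

print(describe("wave"))   # -> "person waves hand"
```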

  17. GPCC - A weather generator-based statistical downscaling tool for site-specific assessment of climate change impacts

    USDA-ARS?s Scientific Manuscript database

    The resolution of climate model outputs is too coarse for them to be used as direct inputs to impact models for assessing climate change impacts on agricultural production, water resources, and eco-system services at local or site-specific scales. Statistical downscaling approaches are usually used to bridge th...

  18. A simple rapid approach using coupled multivariate statistical methods, GIS and trajectory models to delineate areas of common oil spill risk

    NASA Astrophysics Data System (ADS)

    Guillen, George; Rainey, Gail; Morin, Michelle

    2004-04-01

    Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers, statistical and GIS software programmers to closely collaborate to produce a more seamless integration of these technologies and approaches to analyzing data. They are complementary methods that strengthen the overall assessment of spill risks.
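
    A small sketch of the post-processing step described here: hierarchical (Ward) cluster analysis applied to rows of a launch-point-by-target contact-probability matrix, with random probabilities standing in for OSRAM output; the resulting labels would then be mapped in a GIS:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(10)

# Hypothetical trajectory-model output: rows are spill launch points,
# columns are landfall targets, entries are contact probabilities (rows of
# a matrix like the one OSRAM produces).
n_points, n_targets = 40, 15
risk = rng.dirichlet(alpha=np.ones(n_targets), size=n_points)

# Ward hierarchical clustering groups launch points whose landfall-
# probability profiles are similar.
Z = linkage(risk, method="ward")
labels = fcluster(Z, t=4, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])
```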

  19. Applying quantitative adiposity feature analysis models to predict benefit of bevacizumab-based chemotherapy in ovarian cancer patients

    NASA Astrophysics Data System (ADS)

    Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; More, Kathleen; Ding, Kai; Liu, Hong; Zheng, Bin

    2016-03-01

    How to rationally identify epithelial ovarian cancer (EOC) patients who will benefit from bevacizumab or other antiangiogenic therapies is a critical issue in EOC treatments. The motivation of this study is to quantitatively measure adiposity features from CT images and investigate the feasibility of predicting the potential benefit of EOC patients with or without receiving bevacizumab-based chemotherapy treatment using multivariate statistical models built on quantitative adiposity image features. A dataset involving CT images from 59 advanced EOC patients was included. Among them, 32 patients received maintenance bevacizumab after primary chemotherapy and the remaining 27 patients did not. We developed a computer-aided detection (CAD) scheme to automatically segment subcutaneous fat areas (SFA) and visceral fat areas (VFA) and then extracted 7 adiposity-related quantitative features. Three multivariate data analysis models (linear regression, logistic regression and Cox proportional hazards regression) were applied respectively to investigate the potential association between the model-generated prediction results and the patients' progression-free survival (PFS) and overall survival (OS). The results show that with all 3 statistical models, a statistically significant association was detected between the model-generated results and both clinical outcomes in the group of patients receiving maintenance bevacizumab (p<0.01), while there was no significant association for either PFS or OS in the group of patients not receiving maintenance bevacizumab. Therefore, this study demonstrated the feasibility of using statistical prediction models based on quantitative adiposity-related CT image features to generate a new clinical marker and predict the clinical outcome of EOC patients receiving maintenance bevacizumab-based chemotherapy.

  20. Statistical Signal Models and Algorithms for Image Analysis

    DTIC Science & Technology

    1984-10-25

    In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction.

  1. Quantifying the indirect impacts of climate on agriculture: an inter-method comparison

    DOE PAGES

    Calvin, Kate; Fisher-Vanden, Karen

    2017-10-27

    Climate change and increases in CO2 concentration affect the productivity of land, with implications for land use, land cover, and agricultural production. Much of the literature on the effect of climate on agriculture has focused on linking projections of changes in climate to process-based or statistical crop models. However, the changes in productivity have broader economic implications that cannot be quantified in crop models alone. How important are these socio-economic feedbacks to a comprehensive assessment of the impacts of climate change on agriculture? In this paper, we attempt to measure the importance of these interaction effects through an inter-method comparison between process models, statistical models, and integrated assessment models (IAMs). We find the impacts on crop yields vary widely between these three modeling approaches. Yield impacts generated by the IAMs are 20%-40% higher than the yield impacts generated by process-based or statistical crop models, with indirect climate effects adjusting yields by between -12% and +15% (e.g. input substitution and crop switching). The remaining effects are due to technological change.

  2. Quantifying the indirect impacts of climate on agriculture: an inter-method comparison

    NASA Astrophysics Data System (ADS)

    Calvin, Kate; Fisher-Vanden, Karen

    2017-11-01

    Climate change and increases in CO2 concentration affect the productivity of land, with implications for land use, land cover, and agricultural production. Much of the literature on the effect of climate on agriculture has focused on linking projections of changes in climate to process-based or statistical crop models. However, the changes in productivity have broader economic implications that cannot be quantified in crop models alone. How important are these socio-economic feedbacks to a comprehensive assessment of the impacts of climate change on agriculture? In this paper, we attempt to measure the importance of these interaction effects through an inter-method comparison between process models, statistical models, and integrated assessment models (IAMs). We find the impacts on crop yields vary widely between these three modeling approaches. Yield impacts generated by the IAMs are 20%-40% higher than the yield impacts generated by process-based or statistical crop models, with indirect climate effects adjusting yields by between -12% and +15% (e.g. input substitution and crop switching). The remaining effects are due to technological change.

  3. Quantifying the indirect impacts of climate on agriculture: an inter-method comparison

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Calvin, Kate; Fisher-Vanden, Karen

    Climate change and increases in CO2 concentration affect the productivity of land, with implications for land use, land cover, and agricultural production. Much of the literature on the effect of climate on agriculture has focused on linking projections of changes in climate to process-based or statistical crop models. However, the changes in productivity have broader economic implications that cannot be quantified in crop models alone. How important are these socio-economic feedbacks to a comprehensive assessment of the impacts of climate change on agriculture? In this paper, we attempt to measure the importance of these interaction effects through an inter-method comparison between process models, statistical models, and integrated assessment models (IAMs). We find the impacts on crop yields vary widely between these three modeling approaches. Yield impacts generated by the IAMs are 20%-40% higher than the yield impacts generated by process-based or statistical crop models, with indirect climate effects adjusting yields by between -12% and +15% (e.g. input substitution and crop switching). The remaining effects are due to technological change.

  4. A Statistical Method to Distinguish Functional Brain Networks

    PubMed Central

    Fujita, André; Vidal, Maciel C.; Takahashi, Daniel Y.

    2017-01-01

    One major problem in neuroscience is the comparison of functional brain networks of different populations, e.g., distinguishing the networks of controls and patients. Traditional algorithms are based on searching for isomorphism between networks, assuming that they are deterministic. However, biological networks present randomness that cannot be well modeled by those algorithms. For instance, functional brain networks of distinct subjects of the same population can differ due to individual characteristics. Moreover, networks of subjects from different populations can be generated through the same stochastic process. Thus, a better hypothesis is that networks are generated by random processes. In this case, subjects from the same group are samples from the same random process, whereas subjects from different groups are generated by distinct processes. Using this idea, we developed a statistical test called ANOGVA to test whether two or more populations of graphs are generated by the same random graph model. Our simulation results demonstrate that we can precisely control the rate of false positives and that the test is powerful in discriminating random graphs generated by different models and parameters. The method also proved robust to unbalanced data. As an example, we applied ANOGVA to an fMRI dataset composed of controls and patients diagnosed with autism or Asperger syndrome. ANOGVA identified the cerebellar functional sub-network as statistically different between controls and patients with autism (p < 0.001). PMID:28261045

  5. A Statistical Method to Distinguish Functional Brain Networks.

    PubMed

    Fujita, André; Vidal, Maciel C; Takahashi, Daniel Y

    2017-01-01

    One major problem in neuroscience is the comparison of functional brain networks of different populations, e.g., distinguishing the networks of controls and patients. Traditional algorithms are based on searching for isomorphism between networks, assuming that they are deterministic. However, biological networks present randomness that cannot be well modeled by those algorithms. For instance, functional brain networks of distinct subjects of the same population can differ due to individual characteristics. Moreover, networks of subjects from different populations can be generated through the same stochastic process. Thus, a better hypothesis is that networks are generated by random processes. In this case, subjects from the same group are samples from the same random process, whereas subjects from different groups are generated by distinct processes. Using this idea, we developed a statistical test called ANOGVA to test whether two or more populations of graphs are generated by the same random graph model. Our simulation results demonstrate that we can precisely control the rate of false positives and that the test is powerful in discriminating random graphs generated by different models and parameters. The method also proved robust to unbalanced data. As an example, we applied ANOGVA to an fMRI dataset composed of controls and patients diagnosed with autism or Asperger syndrome. ANOGVA identified the cerebellar functional sub-network as statistically different between controls and patients with autism (p < 0.001).

  6. Developing a cosmic ray muon sampling capability for muon tomography and monitoring applications

    NASA Astrophysics Data System (ADS)

    Chatzidakis, S.; Chrysikopoulou, S.; Tsoukalas, L. H.

    2015-12-01

    In this study, a cosmic ray muon sampling capability using a phenomenological model that captures the main characteristics of the experimentally measured spectrum coupled with a set of statistical algorithms is developed. The "muon generator" produces muons with zenith angles in the range 0-90° and energies in the range 1-100 GeV and is suitable for Monte Carlo simulations with emphasis on muon tomographic and monitoring applications. The muon energy distribution is described by the Smith and Duller (1959) [35] phenomenological model. Statistical algorithms are then employed for generating random samples. The inverse transform provides a means to generate samples from the muon angular distribution, whereas the Acceptance-Rejection and Metropolis-Hastings algorithms are employed to provide the energy component. The predictions for muon energies 1-60 GeV and zenith angles 0-90° are validated with a series of actual spectrum measurements and with estimates from the software library CRY. The results confirm the validity of the phenomenological model and the applicability of the statistical algorithms to generate polyenergetic-polydirectional muons. The response of the algorithms and the impact of critical parameters on computation time and computed results were investigated. Final output from the proposed "muon generator" is a look-up table that contains the sampled muon angles and energies and can be easily integrated into Monte Carlo particle simulation codes such as Geant4 and MCNP.
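
    A minimal sketch of the sampling chain described above, assuming a cos^2(theta) zenith density and a simple E^-2.7 power-law stand-in for the Smith and Duller energy spectrum (the paper's actual phenomenological form is more detailed): inverse-transform sampling supplies the angles and acceptance-rejection supplies the energies.

        import numpy as np

        rng = np.random.default_rng(1)

        def sample_zenith(n):
            # Inverse-transform sampling for an assumed cos^2(theta) zenith
            # density on [0, pi/2]; the CDF is tabulated on a grid and
            # inverted by interpolation.
            t = np.linspace(0.0, np.pi / 2, 2001)
            cdf = np.cumsum(np.cos(t) ** 2)
            cdf /= cdf[-1]
            return np.interp(rng.random(n), cdf, t)

        def sample_energy(n, lo=1.0, hi=100.0):
            # Acceptance-rejection for a placeholder ~E^-2.7 spectrum (GeV),
            # standing in for the Smith and Duller phenomenological model.
            pdf = lambda e: e ** -2.7
            m = pdf(lo)  # maximum of the decreasing density on [lo, hi]
            out = []
            while len(out) < n:
                e = rng.uniform(lo, hi, n)
                keep = rng.random(n) * m < pdf(e)
                out.extend(e[keep][: n - len(out)])
            return np.array(out)

        angles, energies = sample_zenith(10000), sample_energy(10000)
        print(angles.mean(), energies.mean())  # columns of a look-up table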

  7. Context-Aware Generative Adversarial Privacy

    NASA Astrophysics Data System (ADS)

    Huang, Chong; Kairouz, Peter; Chen, Xiao; Sankar, Lalitha; Rajagopal, Ram

    2017-12-01

    Preserving the utility of published datasets while simultaneously providing provable privacy guarantees is a well-known challenge. On the one hand, context-free privacy solutions, such as differential privacy, provide strong privacy guarantees, but often lead to a significant reduction in utility. On the other hand, context-aware privacy solutions, such as information theoretic privacy, achieve an improved privacy-utility tradeoff, but assume that the data holder has access to dataset statistics. We circumvent these limitations by introducing a novel context-aware privacy framework called generative adversarial privacy (GAP). GAP leverages recent advancements in generative adversarial networks (GANs) to allow the data holder to learn privatization schemes from the dataset itself. Under GAP, learning the privacy mechanism is formulated as a constrained minimax game between two players: a privatizer that sanitizes the dataset in a way that limits the risk of inference attacks on the individuals' private variables, and an adversary that tries to infer the private variables from the sanitized dataset. To evaluate GAP's performance, we investigate two simple (yet canonical) statistical dataset models: (a) the binary data model, and (b) the binary Gaussian mixture model. For both models, we derive game-theoretically optimal minimax privacy mechanisms, and show that the privacy mechanisms learned from data (in a generative adversarial fashion) match the theoretically optimal ones. This demonstrates that our framework can be easily applied in practice, even in the absence of dataset statistics.

  8. The impacts of renewable energy policies on renewable energy sources for electricity generating capacity

    NASA Astrophysics Data System (ADS)

    Koo, Bryan Bonsuk

    Electricity generation from non-hydro renewable sources has increased rapidly in the last decade. For example, Renewable Energy Sources for Electricity (RES-E) generating capacity in the U.S. almost doubled in the three years from 2009 to 2012. Multiple papers point out that RES-E policies implemented by state governments play a crucial role in increasing RES-E generation or capacity. This study examines the effects of state RES-E policies on state RES-E generating capacity using a fixed effects model. The research employs panel data from the 50 states and the District of Columbia for the period 1990 to 2011, and uses a two-stage approach to control for the endogeneity embedded in the policies adopted by state governments, and a Prais-Winsten estimator to correct for autocorrelation in the panel data. The analysis finds that Renewable Portfolio Standards (RPS) and Net-metering are significantly and positively associated with RES-E generating capacity, but neither Public Benefit Funds nor the Mandatory Green Power Option has a statistically significant relation to RES-E generating capacity. Results of the two-stage model are quite different from those of models which do not employ predicted policy variables. Analysis using non-predicted variables finds that RPS and Net-metering policies are statistically insignificant and negatively associated with RES-E generating capacity. On the other hand, Green Energy Purchasing policy is insignificant in the two-stage model, but significant in the model without predicted values.
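
    As a hedged illustration of the core estimator (not the study's full two-stage design), a two-way fixed effects regression can be run by demeaning within states and years; all data below are fabricated.

        import numpy as np
        import pandas as pd

        # Illustrative panel: state-year observations of RES-E capacity and
        # a binary RPS policy indicator (values are made up).
        rng = np.random.default_rng(2)
        states, years = [f"S{i}" for i in range(5)], list(range(1990, 2012))
        df = pd.DataFrame([(s, y) for s in states for y in years],
                          columns=["state", "year"])
        df["rps"] = rng.integers(0, 2, len(df))
        df["capacity"] = 1.5 * df["rps"] + rng.normal(0, 1, len(df))

        # Within-transformation: demean by state and by year to absorb the
        # two-way fixed effects, then run OLS on the demeaned variables.
        for col in ["rps", "capacity"]:
            df[col + "_d"] = (df[col]
                              - df.groupby("state")[col].transform("mean")
                              - df.groupby("year")[col].transform("mean")
                              + df[col].mean())

        beta = np.linalg.lstsq(df[["rps_d"]].to_numpy(),
                               df["capacity_d"].to_numpy(), rcond=None)[0]
        print("FE estimate of the RPS effect:", beta[0])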

  9. Multiple-Point statistics for stochastic modeling of aquifers, where do we stand?

    NASA Astrophysics Data System (ADS)

    Renard, P.; Julien, S.

    2017-12-01

    In the last 20 years, multiple-point statistics have been a focus of much research, with both successes and disappointments. The aim of this geostatistical approach is to integrate geological information into stochastic models of aquifer heterogeneity to better represent the connectivity of high- or low-permeability structures in the subsurface. Many different algorithms (ENESIM, SNESIM, SIMPAT, CCSIM, QUILTING, IMPALA, DEESSE, FILTERSIM, HYPPS, etc.) have been and are still being proposed. They are all based on the concept of a training data set from which spatial statistics are derived and then used to generate conditional realizations. Some of these algorithms evaluate the statistics of the spatial patterns for every pixel; other techniques consider the statistics at the scale of a patch or a tile. While the method has clearly succeeded in enabling modelers to generate realistic models, several issues are still the topic of debate from both a practical and a theoretical point of view, and some issues, such as training data set availability, often hinder the application of the method in practical situations. In this talk, the aim is to present a review of the status of these approaches from both a theoretical and a practical point of view, using several examples at different scales (from pore networks to regional aquifers).

  10. Stochastic Hourly Weather Generator HOWGH: Validation and its Use in Pest Modelling under Present and Future Climates

    NASA Astrophysics Data System (ADS)

    Dubrovsky, M.; Hirschi, M.; Spirig, C.

    2014-12-01

    To quantify the impact of climate change on a specific pest (or any weather-dependent process) at a specific site, we may use a site-calibrated pest (or other) model and compare its outputs obtained with site-specific weather data representing present vs. perturbed climates. The input weather data may be produced by a stochastic weather generator. Apart from the quality of the pest model, the reliability of the results obtained in such an experiment depends on the ability of the generator to represent the statistical structure of real-world weather series, and on the sensitivity of the pest model to possible imperfections of the generator. This contribution deals with the multivariate HOWGH weather generator, which is based on a combination of parametric and non-parametric statistical methods. Here, HOWGH is used to generate synthetic hourly series of three weather variables (solar radiation, temperature and precipitation) required by the dynamic pest model SOPRA to simulate the development of codling moth. The contribution presents results of the direct and indirect validation of HOWGH. In the direct validation, the synthetic series generated by HOWGH (under various settings of its underlying model) are validated in terms of multiple climatic characteristics, focusing on sub-daily wet/dry and hot/cold spells. In the indirect validation, we assess the generator in terms of characteristics derived from the outputs of the SOPRA model fed by the observed vs. synthetic series. The weather generator may be used to produce weather series representing present and future climates. In the latter case, the parameters of the generator may be modified by climate change scenarios based on global or regional climate models. To demonstrate this feature, the results of codling moth simulations for a future climate will be shown. Acknowledgements: The weather generator is developed and validated within the frame of the projects WG4VALUE (project LD12029 sponsored by the Ministry of Education, Youth and Sports of CR) and VALUE (COST ES1102 action).

  11. Assessing the Impact of Climate Change on Stream Temperatures in the Methow River Basin, Washington

    NASA Astrophysics Data System (ADS)

    Gangopadhyay, S.; Caldwell, R. J.; Lai, Y.; Bountry, J.

    2011-12-01

    The Methow River in Washington offers prime spawning habitat for salmon and other cold-water fishes. During the summer months, low streamflows on the Methow result in cutoff side channels that limit the habitat available to these fishes. Future climate scenarios of increasing air temperature and decreasing precipitation suggest the potential for increasing loss of habitat and fish mortality as stream temperatures rise in response to lower flows and additional heating. To assess the impacts of climate change on stream temperature in the Methow River, the US Bureau of Reclamation is developing an hourly time-step, two-dimensional hydraulic model of the confluence of the Methow and Chewuch Rivers above Winthrop. The model will be coupled with a physical stream temperature model to generate spatial representations of stream conditions conducive to fish habitat. In this study, we develop a statistical framework for generating stream temperature time series from global climate model (GCM) and hydrologic model outputs. Regional observations of stream temperature and hydrometeorological conditions are used to develop statistical models of daily mean stream temperature for the Methow River at Winthrop, WA. Temperature and precipitation projections from 10 GCMs are coupled with the streamflow generated using the University of Washington Variable Infiltration Capacity (VIC) model. The projections serve as input to the statistical models to generate daily time series of mean daily stream temperature. Since the output from the GCM, VIC, and statistical models offers only daily data, a k-nearest neighbor (k-nn) resampling technique is employed to select appropriate proportion vectors for disaggregating the Winthrop daily flow and temperature to an upstream location on each of the rivers above the confluence. Hourly proportion vectors are then used to disaggregate the daily flow and temperature to hourly values to be used in the hydraulic model. Historical meteorological variables are also selected using the k-nn method. We present the statistical modeling framework using Generalized Linear Models (GLMs), along with diagnostics and measures of skill. We will also provide a comparison of the stream temperature projections for the future years 2020, 2040, and 2080 and discuss the potential implications for fish habitat in the Methow River. Future integration of the hourly climate scenarios in the hydraulic model will provide the ability to assess the spatial extent of habitat impacts and allow the USBR to evaluate the effectiveness of various river restoration projects in maintaining or improving habitat in a changing climate.

  12. Statistical properties of superimposed stationary spike trains.

    PubMed

    Deger, Moritz; Helias, Moritz; Boucsein, Clemens; Rotter, Stefan

    2012-06-01

    The Poisson process is an often-employed model for the activity of neuronal populations. It is known, though, that superpositions of realistic, non-Poisson spike trains are not in general Poisson processes, not even for large numbers of superimposed processes. Here we construct superimposed spike trains from intracellular in vivo recordings from rat neocortex neurons and compare their statistics to specific point process models. The constructed superimposed spike trains reveal strong deviations from the Poisson model. We find that superpositions of model spike trains that take the effective refractoriness of the neurons into account yield a much better description. A minimal model of this kind is the Poisson process with dead-time (PPD). For this process, and for superpositions thereof, we obtain analytical expressions for some second-order statistical quantities, such as the count variability, inter-spike interval (ISI) variability and ISI correlations, and demonstrate the match with the in vivo data. We conclude that effective refractoriness is the key property that shapes the statistical properties of the superposition spike trains. We present new, efficient algorithms to generate superpositions of PPDs and of gamma processes that can be used to provide more realistic background input in simulations of networks of spiking neurons. Using these generators, we show in simulations that neurons which receive superimposed spike trains as input are highly sensitive to the statistical effects induced by neuronal refractoriness.
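
    A minimal sketch of generating and superposing PPD spike trains, assuming the standard construction in which each inter-spike interval is a dead time plus an exponential waiting time (the paper's algorithms are more efficient than this):

        import numpy as np

        rng = np.random.default_rng(3)

        def ppd_spike_train(rate, dead_time, t_max):
            # Poisson process with dead-time (PPD): each inter-spike interval
            # is the refractory period plus an exponential waiting time whose
            # rate is adjusted so the effective output rate equals `rate`.
            lam = rate / (1.0 - rate * dead_time)  # hazard after the dead time
            isi = dead_time + rng.exponential(1.0 / lam,
                                              int(2 * rate * t_max) + 10)
            times = np.cumsum(isi)
            return times[times < t_max]

        def superposition(n_trains, rate, dead_time, t_max):
            # Merge independent PPD trains; the pooled train is not Poisson.
            merged = np.concatenate([ppd_spike_train(rate, dead_time, t_max)
                                     for _ in range(n_trains)])
            return np.sort(merged)

        pooled = superposition(n_trains=20, rate=5.0, dead_time=0.002,
                               t_max=100.0)
        isi = np.diff(pooled)
        print("CV of pooled ISIs:", isi.std() / isi.mean())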

  13. Clustering, randomness and regularity in cloud fields. I - Theoretical considerations. II - Cumulus cloud fields

    NASA Technical Reports Server (NTRS)

    Weger, R. C.; Lee, J.; Zhu, Tianri; Welch, R. M.

    1992-01-01

    The current controversy over regularity vs. clustering in cloud fields is examined by means of analysis and simulation studies based upon nearest-neighbor cumulative distribution statistics. It is shown that the Poisson representation of random point processes is superior to pseudorandom-number-generated models and that pseudorandom-number-generated models bias the observed nearest-neighbor statistics towards regularity. The interpretation of these nearest-neighbor statistics is discussed for many cases of superpositions of clustering, randomness, and regularity. A detailed analysis of cumulus cloud field spatial distributions is carried out based upon Landsat, AVHRR, and Skylab data, showing that, when both large and small clouds are included in the cloud field distributions, the cloud field always has a strong clustering signal.
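
    For illustration, the empirical nearest-neighbor cumulative distribution of a point pattern can be compared against its closed-form Poisson counterpart; a clustered field lies above the Poisson curve and a regular field below it. A minimal sketch on synthetic data in the unit square:

        import numpy as np
        from scipy.spatial import cKDTree

        rng = np.random.default_rng(4)

        # A Poisson (completely random) point pattern, standing in for
        # cloud positions in a scene.
        n = 500
        pts = rng.random((n, 2))

        # Distance from each point to its nearest neighbour (k=2 because
        # the first neighbour is the point itself at distance 0).
        d, _ = cKDTree(pts).query(pts, k=2)
        nn = d[:, 1]

        # For a homogeneous Poisson process of intensity lam, the NN
        # distance CDF is G(r) = 1 - exp(-lam * pi * r^2).
        lam = n  # intensity in the unit square
        r = np.sort(nn)
        g_emp = np.arange(1, n + 1) / n
        g_pois = 1 - np.exp(-lam * np.pi * r ** 2)
        print("max deviation from Poisson G(r):",
              np.abs(g_emp - g_pois).max())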

  14. Statistical ecology comes of age.

    PubMed

    Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-12-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.

  15. Statistical ecology comes of age

    PubMed Central

    Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-01-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151

  16. Binomial test statistics using Psi functions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bowman, Kimiko O.

    2007-01-01

    For the negative binomial model (probability generating function (1 + p - pt)^(-k)), a logarithmic derivative is the psi-function difference ψ(k + x) - ψ(k); this and its derivatives lead to a test statistic to decide on the validity of a specified model. The test statistic uses a data base, so a comparison between theory and application is available. Note that the test function is not dominated by outliers. Applications to (i) Fisher's tick data, (ii) accidents data, and (iii) Weldon's dice data are included.
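
    A hedged reconstruction of the psi-function ingredient (not Bowman's exact statistic): for this parametrization the score in k contains the difference ψ(k + x) - ψ(k), which scipy evaluates directly; the count data below are made up.

        import numpy as np
        from scipy.special import digamma

        def nb_score_k(x, k):
            # Derivative of the negative binomial log-likelihood in k, with
            # p profiled out at its conditional MLE p = mean(x)/k. The term
            # digamma(k + x) - digamma(k) is the psi-function difference
            # highlighted in the abstract; this is an illustrative
            # reconstruction, not the published statistic.
            x = np.asarray(x, dtype=float)
            p = x.mean() / k
            return (digamma(k + x) - digamma(k)).sum() - len(x) * np.log1p(p)

        # The root of the score in k is the MLE; its magnitude at a
        # hypothesized k gauges the fit of that model.
        counts = np.array([0, 0, 1, 1, 2, 2, 3, 5, 8])
        for k in (0.5, 1.0, 2.0, 4.0):
            print(k, nb_score_k(counts, k))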

  17. Appropriate Domain Size for Groundwater Flow Modeling with a Discrete Fracture Network Model.

    PubMed

    Ji, Sung-Hoon; Koh, Yong-Kwon

    2017-01-01

    When a discrete fracture network (DFN) is constructed from statistical conceptualization, uncertainty in simulating the hydraulic characteristics of a fracture network can arise due to the domain size. In this study, the appropriate domain size, where less significant uncertainty in the stochastic DFN model is expected, was suggested for the Korea Atomic Energy Research Institute Underground Research Tunnel (KURT) site. The stochastic DFN model for the site was established, and the appropriate domain size was determined with the density of the percolating cluster and the percolation probability using the stochastically generated DFNs for various domain sizes. The applicability of the appropriate domain size to our study site was evaluated by comparing the statistical properties of stochastically generated fractures of varying domain sizes and estimating the uncertainty in the equivalent permeability of the generated DFNs. Our results show that the uncertainty of the stochastic DFN model is acceptable when the modeling domain is larger than the determined appropriate domain size, and the appropriate domain size concept is applicable to our study site. © 2016, National Ground Water Association.

  18. Parametric vs. non-parametric daily weather generator: validation and comparison

    NASA Astrophysics Data System (ADS)

    Dubrovsky, Martin

    2016-04-01

    As climate models (GCMs and RCMs) fail to satisfactorily reproduce the real-world surface weather regime, various statistical methods are applied to downscale GCM/RCM outputs into site-specific weather series. Stochastic weather generators are among the most favoured downscaling methods, capable of producing realistic (observed-like) meteorological inputs for agrological, hydrological and other impact models used in assessing the sensitivity of various ecosystems to climate change/variability. To name their advantages, the generators may (i) produce arbitrarily long multivariate synthetic weather series representing both present and changed climates (in the latter case, the generators are commonly modified by GCM/RCM-based climate change scenarios), (ii) be run at various time steps and for multiple weather variables (the generators reproduce the correlations among variables), and (iii) be interpolated (and run also for sites where no weather data are available to calibrate the generator). This contribution will compare two stochastic daily weather generators in terms of their ability to reproduce various features of daily weather series. M&Rfi is a parametric generator: a Markov chain model is used to model precipitation occurrence, precipitation amount is modelled by the Gamma distribution, and a first-order autoregressive model is used to generate non-precipitation surface weather variables; a minimal sketch of this structure follows this record. The non-parametric GoMeZ generator is based on the nearest-neighbours resampling technique, making no assumptions about the distribution of the variables being generated. Various settings of both weather generators will be assumed in the present validation tests. The generators will be validated in terms of (a) extreme temperature and precipitation characteristics (annual and 30-year extremes and maxima of the duration of hot/cold/dry/wet spells); (b) selected validation statistics developed within the frame of the VALUE project. The tests will be based on observational weather series from several European stations available from the ECA&D database.
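
    The sketch below shows the M&Rfi-style parametric structure named above, with illustrative parameter values (the transition probabilities and the Gamma and AR(1) parameters are not taken from the paper):

        import numpy as np

        rng = np.random.default_rng(5)

        def generate_days(n, p_wd=0.4, p_ww=0.7, shape=0.8, scale=6.0,
                          t_mean=10.0, t_rho=0.7, t_sigma=3.0):
            # A first-order Markov chain for precipitation occurrence,
            # Gamma-distributed amounts on wet days, and an AR(1) process
            # for daily temperature.
            wet = np.zeros(n, dtype=bool)
            precip = np.zeros(n)
            temp = np.full(n, t_mean)
            for d in range(1, n):
                p = p_ww if wet[d - 1] else p_wd  # transition probabilities
                wet[d] = rng.random() < p
                if wet[d]:
                    precip[d] = rng.gamma(shape, scale)
                temp[d] = (t_mean + t_rho * (temp[d - 1] - t_mean)
                           + rng.normal(0.0, t_sigma))
            return wet, precip, temp

        wet, precip, temp = generate_days(365)
        print(wet.mean(), precip.sum(), temp.mean())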

  19. A canonical neural mechanism for behavioral variability

    NASA Astrophysics Data System (ADS)

    Darshan, Ran; Wood, William E.; Peters, Susan; Leblois, Arthur; Hansel, David

    2017-05-01

    The ability to generate variable movements is essential for learning and adjusting complex behaviours. This variability has been linked to the temporal irregularity of neuronal activity in the central nervous system. However, how neuronal irregularity actually translates into behavioural variability is unclear. Here we combine modelling, electrophysiological and behavioural studies to address this issue. We demonstrate that a model circuit comprising topographically organized and strongly recurrent neural networks can autonomously generate irregular motor behaviours. Simultaneous recordings of neurons in singing finches reveal that neural correlations increase across the circuit driving song variability, in agreement with the model predictions. Analysing behavioural data, we find remarkable similarities in the babbling statistics of 5-6-month-old human infants and juveniles from three songbird species and show that our model naturally accounts for these `universal' statistics.

  20. Validation of non-stationary precipitation series for site-specific impact assessment: comparison of two statistical downscaling techniques

    NASA Astrophysics Data System (ADS)

    Mullan, Donal; Chen, Jie; Zhang, Xunchang John

    2016-02-01

    Statistical downscaling (SD) methods have become a popular, low-cost and accessible means of bridging the gap between the coarse spatial resolution at which climate models output climate scenarios and the finer spatial scale at which impact modellers require these scenarios, with various SD techniques used for a wide range of applications across the world. This paper compares the Generator for Point Climate Change (GPCC) model and the Statistical DownScaling Model (SDSM)—two contrasting SD methods—in terms of their ability to generate precipitation series under non-stationary conditions across ten contrasting global climates. The mean, maximum and a selection of distribution statistics, as well as the cumulative frequencies of dry and wet spells, for four different temporal resolutions were compared between the models and the observed series for a validation period. Results indicate that both methods can generate daily precipitation series that generally closely mirror observed series for a wide range of non-stationary climates. However, GPCC tends to overestimate higher precipitation amounts, whilst SDSM tends to underestimate these. This implies that GPCC is more likely to overestimate the effects of precipitation on a given impact sector, whilst SDSM is likely to underestimate the effects. GPCC performs better than SDSM in reproducing wet and dry day frequency, which is a key advantage for many impact sectors. Overall, the mixed performance of the two methods illustrates the importance of users performing a thorough validation in order to determine the influence of simulated precipitation on their chosen impact sector.

  1. Principal component analysis and neurocomputing-based models for total ozone concentration over different urban regions of India

    NASA Astrophysics Data System (ADS)

    Chattopadhyay, Goutami; Chattopadhyay, Surajit; Chakraborthy, Parthasarathi

    2012-07-01

    The present study deals with daily total ozone concentration time series over four metro cities of India, namely Kolkata, Mumbai, Chennai, and New Delhi, in a multivariate setting. Using the Kaiser-Meyer-Olkin measure, it is established that the data set under consideration is suitable for principal component analysis. Subsequently, by introducing a rotated component matrix for the principal components, the predictors suitable for generating an artificial neural network (ANN) for daily total ozone prediction are identified; multicollinearity is removed in this way. ANN models in the form of multilayer perceptrons trained through backpropagation learning are generated for all of the study zones, and the model outcomes are assessed statistically. Measuring various statistics such as Pearson correlation coefficients, Willmott's indices, percentage errors of prediction, and mean absolute errors, it is observed that for Mumbai and Kolkata the proposed ANN model generates very good predictions. The results are supported by the linearly distributed coordinates in the scatterplots.
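
    A minimal sketch of the PCA-then-ANN pipeline on synthetic data (the varimax-style rotation of the components and the actual station records are omitted; all values are illustrative):

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(11)

        # Stand-in data: correlated meteorological predictors and a
        # total-ozone target.
        X = rng.standard_normal((500, 8))
        X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(500)  # multicollinearity
        y = X[:, 0] - 0.5 * X[:, 2] + 0.2 * rng.standard_normal(500)

        # PCA decorrelates the predictors; the component scores then feed
        # a multilayer perceptron trained by backpropagation.
        scores = PCA(n_components=4).fit_transform(X)
        model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                             random_state=0).fit(scores[:400], y[:400])
        print("R^2 on held-out data:", model.score(scores[400:], y[400:]))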

  2. Dependent Personality Disorder: Comparing an Expert Generated and Empirically Derived Five-Factor Model Personality Disorder Count

    ERIC Educational Resources Information Center

    Miller, Joshua D.; Lynam, Donald R.

    2008-01-01

    Assessment of the "Diagnostic and Statistical Manual of Mental Disorders" (4th Ed.; "DSM-IV") personality disorders (PDs) using five-factor model (FFM) prototypes and counts has shown substantial promise, with a few exceptions. Miller, Reynolds, and Pilkonis suggested that the expert-generated FFM dependent prototype might be misspecified in…

  3. Generation of future potential scenarios in an Alpine Catchment by applying bias-correction techniques, delta-change approaches and stochastic Weather Generators at different spatial scale. Analysis of their influence on basic and drought statistics.

    NASA Astrophysics Data System (ADS)

    Collados-Lara, Antonio-Juan; Pulido-Velazquez, David; Pardo-Iguzquiza, Eulogio

    2017-04-01

    Assessing the impacts of potential future climate change scenarios on precipitation and temperature is essential to designing adaptive strategies in water resources systems. The objective of this work is to analyze the possibilities of different statistical downscaling methods to generate future potential scenarios in an Alpine catchment from historical data and the available climate model simulations performed in the frame of the EU CORDEX project. The initial information employed to define these downscaling approaches is the historical climatic data (taken from the Spain02 project for the period 1971-2000, with a spatial resolution of 12.5 km) and the future series provided by climate models for the horizon period 2071-2100. We have used information coming from nine climate model simulations (five different regional climate models (RCMs) nested within four different global climate models (GCMs)) from the European CORDEX project. In our application we have focused on the Representative Concentration Pathway (RCP) 8.5 emissions scenario, which is the most unfavorable scenario considered in the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC). For each RCM we have generated future climate series for the period 2071-2100 by applying two different approaches, bias correction and delta change, and five different transformation techniques (first moment correction; first and second moment correction; regression functions; quantile mapping using a distribution-derived transformation; and quantile mapping using empirical quantiles) for both of them (a sketch of empirical quantile mapping follows this record). Ensembles of the obtained series are proposed to obtain more representative potential future climate scenarios to be employed in studying potential impacts. In this work we propose a non-equally-weighted combination of the future series, giving more weight to those coming from models (delta change approaches) or combinations of models and techniques that provide a better approximation to the basic and drought statistics of the historical data. A multi-objective analysis using basic statistics (mean, standard deviation and skewness coefficient) and drought statistics (duration, magnitude and intensity) has been performed to identify which models are better in terms of goodness of fit in reproducing the historical series. The drought statistics have been obtained from the Standardized Precipitation Index (SPI) series using the theory of runs. This analysis allows us to discriminate the best RCM and the best combination of model and correction technique in the bias correction method. We have also analyzed the possibilities of using different stochastic weather generators to approximate the basic and drought statistics of the historical series. These analyses have been performed for our case study in both a lumped and a distributed way in order to assess sensitivity to the spatial scale. The statistics of the future temperature series obtained with the different ensemble options are quite homogeneous, but the precipitation shows a higher sensitivity to the adopted method and spatial scale. The global increments in the mean temperature values are 31.79%, 31.79%, 31.03% and 31.74% for the distributed bias-correction, distributed delta-change, lumped bias-correction and lumped delta-change ensembles respectively, and in the precipitation they are -25.48%, -28.49%, -26.42% and -27.35% respectively.
    Acknowledgments: This research work has been partially supported by the GESINHIMPADAPT project (CGL2013-48424-C2-2-R) with Spanish MINECO funds. We would also like to thank the Spain02 and CORDEX projects for the data provided for this study, and the authors of the R package qmap.
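
    One of the five transformation techniques listed above, quantile mapping using empirical quantiles, can be sketched as follows (synthetic data; the Spain02/CORDEX series and the R package qmap used in the study are not reproduced here):

        import numpy as np

        def quantile_map(obs_hist, mod_hist, mod_fut, n_q=99):
            # Empirical quantile mapping: the correction learned between
            # observed and modelled quantiles over the control period is
            # applied to the future model series.
            q = np.linspace(1, 99, n_q)
            obs_q = np.percentile(obs_hist, q)
            mod_q = np.percentile(mod_hist, q)
            # Map each future value through the model CDF, then through
            # the observed quantiles.
            return np.interp(np.interp(mod_fut, mod_q, q), q, obs_q)

        rng = np.random.default_rng(6)
        obs = rng.gamma(2.0, 3.0, 10000)   # "observed" control climate
        ctl = rng.gamma(2.0, 4.0, 10000)   # biased model, control period
        fut = rng.gamma(2.0, 4.5, 10000)   # model, future period
        corrected = quantile_map(obs, ctl, fut)
        print(fut.mean(), corrected.mean())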

  4. Ensemble engineering and statistical modeling for parameter calibration towards optimal design of microbial fuel cells

    NASA Astrophysics Data System (ADS)

    Sun, Hongyue; Luo, Shuai; Jin, Ran; He, Zhen

    2017-07-01

    Mathematical modeling is an important tool for investigating the performance of microbial fuel cells (MFCs) towards their optimized design. To overcome the shortcomings of traditional MFC models, an ensemble model is developed in this study by integrating an engineering model with statistical analytics for extrapolation scenarios. Such an ensemble model reduces the labor of parameter calibration and requires fewer measurement data to achieve accuracy comparable to a traditional statistical model in both the normal and extreme operation regions. Based on different weightings of current generation and organic removal efficiency, the ensemble model can recommend input factor settings that achieve the best current generation and organic removal efficiency. The model predicts a set of optimal design factors for the present tubular MFCs, including an anode flow rate of 3.47 mL min-1, an organic concentration of 0.71 g L-1, and a catholyte pumping flow rate of 14.74 mL min-1, to achieve the peak current of 39.2 mA. To maintain 100% organic removal efficiency, the anode flow rate and organic concentration should be kept below 1.04 mL min-1 and 0.22 g L-1, respectively. The developed ensemble model can potentially be modified to model other types of MFCs or bioelectrochemical systems.

  5. Genetic Programming as Alternative for Predicting Development Effort of Individual Software Projects

    PubMed Central

    Chavoya, Arturo; Lopez-Martin, Cuauhtemoc; Andalon-Garcia, Irma R.; Meda-Campaña, M. E.

    2012-01-01

    Statistical and genetic programming techniques have been used to predict the software development effort of large software projects. In this paper, a genetic programming model was used for predicting the effort required in individually developed projects. Accuracy obtained from a genetic programming model was compared against one generated from the application of a statistical regression model. A sample of 219 projects developed by 71 practitioners was used for generating the two models, whereas another sample of 130 projects developed by 38 practitioners was used for validating them. The models used two kinds of lines of code as well as programming language experience as independent variables. Accuracy results from the model obtained with genetic programming suggest that it could be used to predict the software development effort of individual projects when these projects have been developed in a disciplined manner within a development-controlled environment. PMID:23226305

  6. Extracting and Converting Quantitative Data into Human Error Probabilities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tuan Q. Tran; Ronald L. Boring; Jeffrey C. Joe

    2007-08-01

    This paper discusses a proposed method using a combination of advanced statistical approaches (e.g., meta-analysis, regression, structural equation modeling) that will not only convert different empirical results into a common metric for scaling individual PSF effects, but will also examine the complex interrelationships among PSFs. Furthermore, the paper discusses how the derived statistical estimates (i.e., effect sizes) can be mapped onto an HRA method (e.g., SPAR-H) to generate HEPs that can then be used in probabilistic risk assessment (PRA). The paper concludes with a discussion of the benefits of using the academic literature to assist HRA analysts in generating sound HEPs and HRA developers in validating current HRA models and formulating new ones.

  7. Software for Data Analysis with Graphical Models

    NASA Technical Reports Server (NTRS)

    Buntine, Wray L.; Roy, H. Scott

    1994-01-01

    Probabilistic graphical models are being used widely in artificial intelligence and statistics, for instance, in diagnosis and expert systems, as a framework for representing and reasoning with probabilities and independencies. They come with corresponding algorithms for performing statistical inference. This offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper illustrates the framework with an example and then presents some basic techniques for the task: problem decomposition and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.

  8. Simulation and analysis of scalable non-Gaussian statistically anisotropic random functions

    NASA Astrophysics Data System (ADS)

    Riva, Monica; Panzeri, Marco; Guadagnini, Alberto; Neuman, Shlomo P.

    2015-12-01

    Many earth and environmental (as well as other) variables, Y, and their spatial or temporal increments, ΔY, exhibit non-Gaussian statistical scaling. Previously we were able to capture some key aspects of such scaling by treating Y or ΔY as standard sub-Gaussian random functions. We were however unable to reconcile two seemingly contradictory observations, namely that whereas sample frequency distributions of Y (or its logarithm) exhibit relatively mild non-Gaussian peaks and tails, those of ΔY display peaks that grow sharper and tails that become heavier with decreasing separation distance or lag. Recently we overcame this difficulty by developing a new generalized sub-Gaussian model which captures both behaviors in a unified and consistent manner, exploring it on synthetically generated random functions in one dimension (Riva et al., 2015). Here we extend our generalized sub-Gaussian model to multiple dimensions, present an algorithm to generate corresponding random realizations of statistically isotropic or anisotropic sub-Gaussian functions and illustrate it in two dimensions. We demonstrate the accuracy of our algorithm by comparing ensemble statistics of Y and ΔY (such as, mean, variance, variogram and probability density function) with those of Monte Carlo generated realizations. We end by exploring the feasibility of estimating all relevant parameters of our model by analyzing jointly spatial moments of Y and ΔY obtained from a single realization of Y.

  9. Parasol cell mosaics are unlikely to drive the formation of structured orientation maps in primary visual cortex.

    PubMed

    Hore, Victoria R A; Troy, John B; Eglen, Stephen J

    2012-11-01

    The receptive fields of on- and off-center parasol cell mosaics independently tile the retina to ensure efficient sampling of visual space. A recent theoretical model represented the on- and off-center mosaics by noisy hexagonal lattices of slightly different density. When the two lattices are overlaid, long-range Moiré interference patterns are generated. These Moiré interference patterns have been suggested to drive the formation of highly structured orientation maps in visual cortex. Here, we show that noisy hexagonal lattices do not capture the spatial statistics of parasol cell mosaics. An alternative model based upon local exclusion zones, termed the pairwise interaction point process (PIPP) model, generates patterns that are statistically indistinguishable from parasol cell mosaics. A key difference between the PIPP model and the hexagonal lattice model is that the PIPP model does not generate Moiré interference patterns, and hence simulated orientation maps do not show any hexagonal structure. Finally, we estimate the spatial extent of spatial correlations in parasol cell mosaics to be only 200-350 μm, far less than that required to generate Moiré interference. We conclude that parasol cell mosaics are too disordered to drive the formation of highly structured orientation maps in visual cortex.
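
    To illustrate the idea of local exclusion zones, the sketch below uses simple sequential inhibition, a hard-core stand-in for the softer pairwise interactions that the PIPP model actually fits; the exclusion radius and field size are hypothetical.

        import numpy as np

        rng = np.random.default_rng(7)

        def sequential_inhibition(n, exclusion, size=1000.0,
                                  max_tries=100000):
            # Candidate points are accepted only if they fall outside the
            # exclusion zone of every previously accepted point.
            pts = []
            tries = 0
            while len(pts) < n and tries < max_tries:
                tries += 1
                c = rng.random(2) * size
                if all(np.hypot(*(c - p)) > exclusion for p in pts):
                    pts.append(c)
            return np.array(pts)

        # Exclusion radius on the order of the 200-350 micrometre
        # correlation range reported above (units are micrometres).
        mosaic = sequential_inhibition(n=200, exclusion=50.0)
        print(len(mosaic), "cells placed")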

  10. Next generation of weather generators on web service framework

    NASA Astrophysics Data System (ADS)

    Chinnachodteeranun, R.; Hung, N. D.; Honda, K.; Ines, A. V. M.

    2016-12-01

    A weather generator is a statistical model that synthesizes possible realizations of long-term historical weather for the future. It generates tens to hundreds of realizations stochastically based on statistical analysis. Realizations are essential inputs for crop models simulating crop growth and yield. Moreover, they can contribute to analyzing the uncertainty that weather imposes on crop development stages and to decision support systems for, e.g., water management and fertilizer management. Performing crop modeling requires multidisciplinary skills, which limits the usage of a weather generator to the research group that developed it and poses a barrier for newcomers. To improve the procedures for running weather generators, as well as the methodology for acquiring realizations in a standard way, we implemented a framework providing weather generators as web services, which supports service interoperability. Legacy weather generator programs were wrapped in the web service framework. The service interfaces were implemented based on an international standard, the Sensor Observation Service (SOS) defined by the Open Geospatial Consortium (OGC). Clients can request realizations generated by the model through the SOS web service. The hierarchical data preparation processes required by the weather generator are also implemented as web services and seamlessly wired together. Analysts and applications can invoke the services over a network easily. The services facilitate the development of agricultural applications, reduce the workload of analysts in iterative data preparation, and handle the legacy weather generator programs. This architectural design and implementation can serve as a prototype for constructing further services on top of an interoperable sensor network system. The framework opens an opportunity for other sectors, such as application developers and scientists in other fields, to utilize weather generators.

  11. Electromagnetic sinc Schell-model beams and their statistical properties.

    PubMed

    Mei, Zhangrong; Mao, Yonghua

    2014-09-22

    A class of electromagnetic sources with sinc Schell-model correlations is introduced. The conditions on source parameters guaranteeing that the source generates a physical beam are derived. The evolution of the statistical properties of the electromagnetic stochastic beams generated by this new source, on propagation in free space and in atmospheric turbulence, is investigated with the help of the weighted superposition method and by numerical simulations. It is demonstrated that the intensity distributions of such beams exhibit unique features on propagation in free space and produce a double-layer flat-top profile that is shape-invariant in the far field. This feature makes the new beam particularly suitable for some special laser processing applications. The influence of atmospheric turbulence with a non-Kolmogorov power spectrum on the statistical properties of the new beams is analyzed in detail.

  12. A cortical integrate-and-fire neural network model for blind decoding of visual prosthetic stimulation.

    PubMed

    Eiber, Calvin D; Morley, John W; Lovell, Nigel H; Suaning, Gregg J

    2014-01-01

    We present a computational model of the optic pathway which has been adapted to simulate cortical responses to visual-prosthetic stimulation. This model reproduces the statistically observed distributions of spikes in cortical recordings for sham and maximum-intensity stimuli, while simultaneously generating cellular receptive fields consistent with those observed using traditional visual neuroscience methods. By inverting this model to generate candidate phosphenes that could produce the responses observed under novel stimulation strategies, we hope to aid the development of such strategies in vivo before they are deployed in clinical settings.
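
    The paper's full cortical network model is not reproduced here, but the integrate-and-fire building block it rests on can be sketched in a few lines: a single leaky integrate-and-fire neuron with illustrative, not fitted, parameters.

        import numpy as np

        rng = np.random.default_rng(8)

        def lif(i_input, dt=1e-4, tau=0.02, v_rest=-0.070,
                v_thresh=-0.050, v_reset=-0.065, r_m=1e8):
            # Leaky integrate-and-fire: the membrane voltage integrates the
            # input current and emits a spike on crossing threshold.
            v, spikes = v_rest, []
            for step, i_t in enumerate(i_input):
                v += dt / tau * (v_rest - v + r_m * i_t)
                if v >= v_thresh:
                    spikes.append(step * dt)
                    v = v_reset
            return spikes

        # Noisy current standing in for prosthetic stimulation pulses.
        current = 2.2e-10 + 5e-11 * rng.standard_normal(10000)
        print(len(lif(current)), "spikes in 1 s")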

  13. The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping.

    PubMed

    Bahlmann, Claus; Burkhardt, Hans

    2004-03-01

    In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system frog on hand. The focus of this work concerns the presentation of the classification/training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized, sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this sequential type of data, e.g., speech recognition, genome processing, robotics, etc. Contrary to previous attempts, clustering and statistical sequence modeling are embedded in a single feature space and use a closely related distance measure. We show character recognition experiments of frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.
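
    CSDTW extends dynamic time warping with cluster analysis and statistical sequence modeling; the underlying alignment machinery, classic DTW between two variable-length feature sequences, can be sketched as follows (the statistical and clustering extensions are omitted):

        import numpy as np

        def dtw(a, b):
            # Classic dynamic time warping distance between two sequences
            # of feature vectors, filled in by dynamic programming.
            n, m = len(a), len(b)
            d = np.full((n + 1, m + 1), np.inf)
            d[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = np.linalg.norm(a[i - 1] - b[j - 1])
                    d[i, j] = cost + min(d[i - 1, j], d[i, j - 1],
                                         d[i - 1, j - 1])
            return d[n, m]

        # Two toy pen trajectories (x, y samples) of different lengths.
        s1 = np.array([[0, 0], [1, 1], [2, 2], [3, 2]], dtype=float)
        s2 = np.array([[0, 0], [1, 1], [3, 2]], dtype=float)
        print(dtw(s1, s2))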

  14. Statistical Hypothesis Testing in Intraspecific Phylogeography: NCPA versus ABC

    PubMed Central

    Templeton, Alan R.

    2009-01-01

    Nested clade phylogeographic analysis (NCPA) and approximate Bayesian computation (ABC) have been used to test phylogeographic hypotheses. Multilocus NCPA tests null hypotheses, whereas ABC discriminates among a finite set of alternatives. The interpretive criteria of NCPA are explicit and allow complex models to be built from simple components. The interpretive criteria of ABC are ad hoc and require the specification of a complete phylogeographic model. The conclusions from ABC are often influenced by implicit assumptions arising from the many parameters needed to specify a complex model. These complex models confound many assumptions so that biological interpretations are difficult. Sampling error is accounted for in NCPA, but ABC ignores important sources of sampling error that creates pseudo-statistical power. NCPA generates the full sampling distribution of its statistics, but ABC only yields local probabilities, which in turn make it impossible to distinguish between a good fitting model, a non-informative model, and an over-determined model. Both NCPA and ABC use approximations, but convergences of the approximations used in NCPA are well defined whereas those in ABC are not. NCPA can analyze a large number of locations, but ABC cannot. Finally, the dimensionality of tested hypothesis is known in NCPA, but not for ABC. As a consequence, the “probabilities” generated by ABC are not true probabilities and are statistically non-interpretable. Accordingly, ABC should not be used for hypothesis testing, but simulation approaches are valuable when used in conjunction with NCPA or other methods that do not rely on highly parameterized models. PMID:19192182
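
    For readers unfamiliar with the ABC machinery under discussion, a toy rejection-sampling ABC on a deliberately simple (non-phylogeographic) model looks like this; everything below is illustrative.

        import numpy as np

        rng = np.random.default_rng(9)

        def abc_rejection(observed, simulate, prior_draw,
                          n_sims=20000, eps=0.05):
            # Basic ABC rejection: draw parameters from the prior, simulate
            # data, and keep parameters whose summary statistic lands
            # within eps of the observed one.
            s_obs = observed.mean()
            kept = []
            for _ in range(n_sims):
                theta = prior_draw()
                if abs(simulate(theta).mean() - s_obs) < eps:
                    kept.append(theta)
            return np.array(kept)

        data = rng.normal(1.0, 1.0, 100)  # "observed" data
        post = abc_rejection(data,
                             simulate=lambda th: rng.normal(th, 1.0, 100),
                             prior_draw=lambda: rng.uniform(-5.0, 5.0))
        print(len(post), "accepted;", post.mean(), "posterior mean estimate")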

  15. Statistical characteristic in time-domain of direct current corona-generated audible noise from conductor in corona cage

    NASA Astrophysics Data System (ADS)

    Li, Xuebao; Cui, Xiang; Lu, Tiebing; Ma, Wenzuo; Bian, Xingming; Wang, Donglai; Hiziroglu, Huseyin

    2016-03-01

    Corona-generated audible noise (AN) has become one of the decisive factors in the design of high voltage direct current (HVDC) transmission lines. The AN from transmission lines can be attributed to sound pressure pulses which are generated by the multiple corona sources formed on the conductors, i.e., the transmission lines. In this paper, the detailed time-domain characteristics of the sound pressure pulses generated by DC corona discharges formed on the surfaces of stranded conductors are investigated systematically in a laboratory setting using a corona cage structure. The amplitudes of the sound pressure pulses and their time intervals are extracted by observing a direct correlation between corona current pulses and corona-generated sound pressure pulses. Based on the statistical characteristics, a stochastic model is presented for simulating the sound pressure pulses due to DC corona discharges occurring on conductors. The proposed stochastic model is validated by comparing the calculated and measured A-weighted sound pressure level (SPL). The proposed model is then used to analyze the influence of the pulse amplitudes and pulse rate on the SPL. Furthermore, a mathematical relationship is found between the SPL and conductor diameter, electric field, and radial distance.

  16. Autoregressive statistical pattern recognition algorithms for damage detection in civil structures

    NASA Astrophysics Data System (ADS)

    Yao, Ruigen; Pakzad, Shamim N.

    2012-08-01

    Statistical pattern recognition has recently emerged as a promising set of complementary methods to system identification for automatic structural damage assessment. Its essence is to use well-known concepts in statistics to define the boundaries of different pattern classes, such as those for damaged and undamaged structures. In this paper, several statistical pattern recognition algorithms using autoregressive models, including statistical control charts and hypothesis testing, are reviewed as potentially competitive damage detection techniques. To enhance the performance of the statistical methods, new feature extraction techniques using model spectra and residual autocorrelation, together with resampling-based threshold construction methods, are proposed. Subsequently, simulated acceleration data from a multi-degree-of-freedom system are generated to test and compare the efficiency of the existing and proposed algorithms. Data from laboratory experiments conducted on a truss and a large-scale bridge slab model are then used to further validate the damage detection methods and demonstrate the superior performance of the proposed algorithms.
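
    A minimal sketch of the autoregressive residual approach described above: an AR model fitted to baseline data supplies residuals whose control limits flag departures in test data (the model order, limits and simulated records are illustrative, not the paper's settings).

        import numpy as np

        rng = np.random.default_rng(10)

        def fit_ar(x, order=4):
            # Least-squares AR(order) fit: regress x[t] on its lags.
            rows = np.column_stack([x[order - k - 1:len(x) - k - 1]
                                    for k in range(order)])
            coef, *_ = np.linalg.lstsq(rows, x[order:], rcond=None)
            return coef

        def residuals(x, coef):
            order = len(coef)
            rows = np.column_stack([x[order - k - 1:len(x) - k - 1]
                                    for k in range(order)])
            return x[order:] - rows @ coef

        # Baseline (undamaged) response trains the AR model and sets the
        # control limits; a shift in residual spread flags possible damage.
        baseline = rng.standard_normal(2000)
        coef = fit_ar(baseline)
        limit = 3 * residuals(baseline, coef).std()
        test = rng.standard_normal(2000) * 1.5   # stand-in "damaged" data
        alarms = np.mean(np.abs(residuals(test, coef)) > limit)
        print(f"{alarms:.1%} of test residuals outside the 3-sigma limits")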

  17. Statistical appearance models based on probabilistic correspondences.

    PubMed

    Krüger, Julia; Ehrhardt, Jan; Handels, Heinz

    2017-04-01

    Model-based image analysis is indispensable in medical image processing. One key aspect of building statistical shape and appearance models is the determination of one-to-one correspondences in the training data set. At the same time, the identification of these correspondences is the most challenging part of such methods. In our earlier work, we developed an alternative method using correspondence probabilities instead of exact one-to-one correspondences for a statistical shape model (Hufnagel et al., 2008). In this work, a new approach for statistical appearance models without one-to-one correspondences is proposed. A sparse image representation is used to build a model that combines point position and appearance information at the same time. Probabilistic correspondences between the derived multi-dimensional feature vectors are used to omit the need for extensive preprocessing to find landmarks and correspondences, as well as to reduce the dependence of the generated model on the landmark positions. Model generation and model fitting can now be expressed by optimizing a single global criterion, derived from a maximum a posteriori (MAP) approach, with respect to model parameters that directly affect both shape and appearance of the considered objects inside the images. The proposed approach describes statistical appearance modeling in a concise and flexible mathematical framework. Besides eliminating the demand for costly correspondence determination, the method allows for additional constraints such as topological regularity in the modeling process. In the evaluation, the model was applied for segmentation and landmark identification in hand X-ray images. The results demonstrate the feasibility of the model to detect hand contours as well as the positions of the joints between finger bones for unseen test images. Further, we evaluated the model on brain data of stroke patients to show its ability to handle partially corrupted data and to demonstrate a possible employment of the correspondence probabilities to indicate these corrupted/pathological areas. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. A canonical neural mechanism for behavioral variability

    PubMed Central

    Darshan, Ran; Wood, William E.; Peters, Susan; Leblois, Arthur; Hansel, David

    2017-01-01

    The ability to generate variable movements is essential for learning and adjusting complex behaviours. This variability has been linked to the temporal irregularity of neuronal activity in the central nervous system. However, how neuronal irregularity actually translates into behavioural variability is unclear. Here we combine modelling, electrophysiological and behavioural studies to address this issue. We demonstrate that a model circuit comprising topographically organized and strongly recurrent neural networks can autonomously generate irregular motor behaviours. Simultaneous recordings of neurons in singing finches reveal that neural correlations increase across the circuit driving song variability, in agreement with the model predictions. Analysing behavioural data, we find remarkable similarities in the babbling statistics of 5–6-month-old human infants and juveniles from three songbird species and show that our model naturally accounts for these ‘universal' statistics. PMID:28530225

  19. New Methodology for Estimating Fuel Economy by Vehicle Class

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chin, Shih-Miao; Dabbs, Kathryn; Hwang, Ho-Ling

    2011-01-01

    Office of Highway Policy Information to develop a new methodology to generate annual estimates of average fuel efficiency and number of motor vehicles registered by vehicle class for Table VM-1 of the Highway Statistics annual publication. This paper describes the new methodology developed under this effort and compares the results of the existing manual method and the new systematic approach. The methodology developed under this study takes a two-step approach. First, preliminary fuel efficiency rates are estimated based on vehicle stock models for different classes of vehicles. Then, a reconciliation model is used to adjust the initial fuel consumption rates from the vehicle stock models and match the VMT information for each vehicle class and the reported total fuel consumption. This reconciliation model utilizes a systematic approach that produces documentable and reproducible results. The basic framework utilizes a mathematical programming formulation to minimize the deviations between the fuel economy estimates published in the previous year's Highway Statistics and the results from the vehicle stock models, subject to the constraint that fuel consumption for the different vehicle classes must sum to the total fuel consumption estimate published in Table MF-21 of the current year's Highway Statistics. The results generated from this new approach provide a smoother time series for the fuel economies by vehicle class. It also utilizes the most up-to-date and best available data with sound econometric models to generate MPG estimates by vehicle class.
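
    The reconciliation step can be pictured as a small constrained optimization; here is a hedged sketch with made-up numbers (three vehicle classes, hypothetical VMT and fuel totals), not the actual Highway Statistics data:

        import numpy as np
        from scipy.optimize import minimize

        # Hypothetical inputs: initial MPG from vehicle stock models, VMT by class
        # (miles), and a published total fuel consumption (gallons).
        mpg0 = np.array([24.0, 17.5, 7.1])        # car, light truck, combination truck
        vmt = np.array([2.1e12, 6.5e11, 1.8e11])
        total_fuel = 1.35e11

        def objective(mpg):
            # stay close to the stock-model estimates (relative deviations)
            return np.sum(((mpg - mpg0) / mpg0) ** 2)

        def fuel_balance(mpg):
            # class fuel uses must sum to the published total (Table MF-21 analogue)
            return np.sum(vmt / mpg) - total_fuel

        res = minimize(objective, mpg0, constraints=[{"type": "eq", "fun": fuel_balance}])
        print(res.x)   # reconciled MPG by vehicle class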

  20. A Comparison of the Performance of Advanced Statistical Techniques for the Refinement of Day-ahead and Longer NWP-based Wind Power Forecasts

    NASA Astrophysics Data System (ADS)

    Zack, J. W.

    2015-12-01

    Predictions from Numerical Weather Prediction (NWP) models are the foundation for wind power forecasts for day-ahead and longer forecast horizons. The NWP models directly produce three-dimensional wind forecasts on their respective computational grids. These can be interpolated to the location and time of interest. However, these direct predictions typically contain significant systematic errors ("biases"). This is due to a variety of factors including the limited space-time resolution of the NWP models and shortcomings in the models' representation of physical processes. It has become common practice to attempt to improve the raw NWP forecasts by statistically adjusting them through a procedure that is widely known as Model Output Statistics (MOS). The challenge is to identify complex patterns of systematic errors and then use this knowledge to adjust the NWP predictions. The MOS-based improvements are the basis for much of the value added by commercial wind power forecast providers. There are an enormous number of statistical approaches that can be used to generate the MOS adjustments to the raw NWP forecasts. In order to obtain insight into the potential value of some of the newer and more sophisticated statistical techniques, often referred to as "machine learning methods", a MOS-method comparison experiment has been performed for wind power generation facilities in 6 wind resource areas of California. The underlying NWP models that provided the raw forecasts were the two primary operational models of the US National Weather Service: the GFS and NAM models. The focus was on 1- and 2-day-ahead forecasts of the hourly wind-based generation. The statistical methods evaluated included: (1) screening multiple linear regression, which served as a baseline method, (2) artificial neural networks, (3) a decision-tree approach called random forests, (4) gradient boosted regression based upon a decision-tree algorithm, (5) support vector regression and (6) analog ensemble, which is a case-matching scheme. The presentation will provide (1) an overview of each method and the experimental design, (2) performance comparisons based on standard metrics such as bias, MAE and RMSE, (3) a summary of the performance characteristics of each approach and (4) a preview of further experiments to be conducted.
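
    A compact sketch of such a MOS-method comparison using scikit-learn (synthetic predictors and "observed" generation stand in for the NWP output and wind power data; the analog ensemble is omitted since it is not a standard library estimator):

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
        from sklearn.svm import SVR
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import mean_absolute_error

        rng = np.random.default_rng(2)
        X = rng.normal(size=(2000, 4))   # stand-ins for raw NWP predictors
        y = 3 * X[:, 0] + np.sin(X[:, 1]) + 0.5 * rng.normal(size=2000)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        models = {
            "linear (baseline MOS)": LinearRegression(),
            "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
            "gradient boosting": GradientBoostingRegressor(random_state=0),
            "support vector regr.": SVR(C=10.0),
        }
        for name, m in models.items():
            m.fit(X_tr, y_tr)
            print(f"{name:22s} MAE = {mean_absolute_error(y_te, m.predict(X_te)):.3f}")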

  1. Comparative evaluation of statistical and mechanistic models of Escherichia coli at beaches in southern Lake Michigan

    USGS Publications Warehouse

    Safaie, Ammar; Wendzel, Aaron; Ge, Zhongfu; Nevers, Meredith; Whitman, Richard L.; Corsi, Steven R.; Phanikumar, Mantha S.

    2016-01-01

    Statistical and mechanistic models are popular tools for predicting the levels of indicator bacteria at recreational beaches. Researchers tend to use one class of model or the other, and it is difficult to generalize statements about their relative performance due to differences in how the models are developed, tested, and used. We describe a cooperative modeling approach for freshwater beaches impacted by point sources in which insights derived from mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models which were further improved using probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and the use of unstructured-grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions resulting in more accurate estimates of beach closures.

  2. Statistical modeling of 4D respiratory lung motion using diffeomorphic image registration.

    PubMed

    Ehrhardt, Jan; Werner, René; Schmidt-Richberg, Alexander; Handels, Heinz

    2011-02-01

    Modeling of respiratory motion has become increasingly important in various applications of medical imaging (e.g., radiation therapy of lung cancer). Current modeling approaches are usually confined to intra-patient registration of 3D image data representing the individual patient's anatomy at different breathing phases. We propose an approach to generate a mean motion model of the lung based on thoracic 4D computed tomography (CT) data of different patients to extend the motion modeling capabilities. Our modeling process consists of three steps: an intra-subject registration to generate subject-specific motion models, the generation of an average shape and intensity atlas of the lung as anatomical reference frame, and the registration of the subject-specific motion models to the atlas in order to build a statistical 4D mean motion model (4D-MMM). Furthermore, we present methods to adapt the 4D mean motion model to a patient-specific lung geometry. In all steps, a symmetric diffeomorphic nonlinear intensity-based registration method was employed. The Log-Euclidean framework was used to compute statistics on the diffeomorphic transformations. The presented methods are then used to build a mean motion model of respiratory lung motion using thoracic 4D CT data sets of 17 patients. We evaluate the model by applying it for estimating respiratory motion of ten lung cancer patients. The prediction is evaluated with respect to landmark and tumor motion, and the quantitative analysis results in a mean target registration error (TRE) of 3.3 ±1.6 mm if lung dynamics are not impaired by large lung tumors or other lung disorders (e.g., emphysema). With regard to lung tumor motion, we show that prediction accuracy is independent of tumor size and tumor motion amplitude in the considered data set. However, tumors adhering to non-lung structures degrade local lung dynamics significantly and the model-based prediction accuracy is lower in these cases. The statistical respiratory motion model is capable of providing valuable prior knowledge in many fields of applications. We present two examples of possible applications in radiation therapy and image guided diagnosis.

  3. Endogenous time-varying risk aversion and asset returns.

    PubMed

    Berardi, Michele

    2016-01-01

    Stylized facts about statistical properties for short horizon returns in financial markets have been identified in the literature, but a satisfactory understanding for their manifestation is yet to be achieved. In this work, we show that a simple asset pricing model with representative agent is able to generate time series of returns that replicate such stylized facts if the risk aversion coefficient is allowed to change endogenously over time in response to unexpected excess returns under evolutionary forces. The same model, under constant risk aversion, would instead generate returns that are essentially Gaussian. We conclude that an endogenous time-varying risk aversion represents a very parsimonious way to make the model match real data on key statistical properties, and therefore deserves careful consideration from economists and practitioners alike.

  4. Application of SDSM and LARS-WG for simulating and downscaling of rainfall and temperature

    NASA Astrophysics Data System (ADS)

    Hassan, Zulkarnain; Shamsudin, Supiah; Harun, Sobri

    2014-04-01

    Climate change is believed to have significant impacts on water basins and regions, such as on runoff and hydrological systems. However, impact studies at the basin and regional scales are difficult, since general circulation models (GCMs), which are widely used to simulate future climate scenarios, do not provide reliable daily series of rainfall and temperature for hydrological modeling. Downscaling techniques can derive reliable daily series of rainfall and temperature under climate scenarios from GCM output. In this study, statistical downscaling models are used to generate possible future values of local meteorological variables such as rainfall and temperature at selected stations in Peninsular Malaysia. The models are: (1) the statistical downscaling model (SDSM), which utilizes regression models and stochastic weather generators, and (2) the Long Ashton Research Station weather generator (LARS-WG), which utilizes only stochastic weather generators. The LARS-WG and SDSM models are clearly feasible tools for quantifying the effects of climate change at a local scale. SDSM yields a better performance than LARS-WG, although it slightly underestimates wet and dry spell lengths. Although both models do not provide identical results, the time series generated by both methods indicate a general increasing trend in mean daily temperature values. Meanwhile, the trends of daily rainfall differ, with SDSM giving a relatively higher change in annual rainfall than LARS-WG.

  5. Toward statistical modeling of saccadic eye-movement and visual saliency.

    PubMed

    Sun, Xiaoshuai; Yao, Hongxun; Ji, Rongrong; Liu, Xian-Ming

    2014-11-01

    In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. These observations inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGCs using projection pursuit, and generates eye movements by selecting the location with maximum SGC response. Besides simulating human saccadic behavior, we also demonstrate superior effectiveness and robustness over the state of the art through extensive experiments on synthetic patterns and human eye fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences and the effects of scale and blur, are explored in this paper. Based on extensive qualitative and quantitative experimental results, we show the promising potential of statistical approaches for human behavior research.

  6. Validation of two (parametric vs non-parametric) daily weather generators

    NASA Astrophysics Data System (ADS)

    Dubrovsky, M.; Skalak, P.

    2015-12-01

    As the climate models (GCMs and RCMs) fail to satisfactorily reproduce the real-world surface weather regime, various statistical methods are applied to downscale GCM/RCM outputs into site-specific weather series. Stochastic weather generators are among the most popular downscaling methods, capable of producing realistic (observed-like) meteorological inputs for agricultural, hydrological and other impact models used in assessing the sensitivity of various ecosystems to climate change/variability. To name their advantages, the generators may (i) produce arbitrarily long multi-variate synthetic weather series representing both present and changed climates (in the latter case, the generators are commonly modified by GCM/RCM-based climate change scenarios), (ii) be run at various time steps and for multiple weather variables (the generators reproduce the correlations among variables), and (iii) be interpolated (and run also for sites where no weather data are available to calibrate the generator). This contribution will compare two stochastic daily weather generators in terms of their ability to reproduce various features of daily weather series. M&Rfi is a parametric generator: a Markov chain model is used to model precipitation occurrence, precipitation amount is modelled by the Gamma distribution, and a first-order autoregressive model is used to generate non-precipitation surface weather variables. The non-parametric GoMeZ generator is based on the nearest-neighbours resampling technique, making no assumption on the distribution of the variables being generated. Various settings of both weather generators will be assumed in the present validation tests. The generators will be validated in terms of (a) extreme temperature and precipitation characteristics (annual and 30-year extremes and maxima of duration of hot/cold/dry/wet spells); and (b) selected validation statistics developed within the frame of the VALUE project. The tests will be based on observational weather series from several European stations available from the ECA&D database. Acknowledgements: The weather generator is developed and validated within the frame of the projects WG4VALUE (sponsored by the Ministry of Education, Youth and Sports of CR) and VALUE (COST ES1102 action).
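
    The parametric (M&Rfi-style) scheme can be sketched in a few lines: a two-state Markov chain for precipitation occurrence, Gamma-distributed wet-day amounts, and an AR(1) process for a non-precipitation variable. All parameter values below are illustrative, not calibrated:

        import numpy as np

        rng = np.random.default_rng(3)

        p_wd, p_ww = 0.25, 0.65              # P(wet | dry), P(wet | wet)
        shape, scale = 0.8, 6.0              # Gamma parameters for wet-day amounts (mm)
        t_mean, t_std, rho = 12.0, 4.0, 0.7  # AR(1) parameters for daily temperature

        n = 365
        wet = np.zeros(n, dtype=bool)
        precip, temp = np.zeros(n), np.full(n, t_mean)

        for d in range(1, n):
            # first-order Markov chain for precipitation occurrence
            wet[d] = rng.random() < (p_ww if wet[d - 1] else p_wd)
            if wet[d]:
                precip[d] = rng.gamma(shape, scale)
            # AR(1) for temperature, innovations scaled to preserve the variance
            temp[d] = t_mean + rho * (temp[d - 1] - t_mean) \
                      + t_std * np.sqrt(1 - rho ** 2) * rng.standard_normal()

        print(f"wet days: {wet.sum()}, total precip: {precip.sum():.0f} mm")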

  7. Modeling Cell Size Regulation: From Single-Cell-Level Statistics to Molecular Mechanisms and Population-Level Effects.

    PubMed

    Ho, Po-Yi; Lin, Jie; Amir, Ariel

    2018-05-20

    Most microorganisms regulate their cell size. In this article, we review some of the mathematical formulations of the problem of cell size regulation. We focus on coarse-grained stochastic models and the statistics that they generate. We review the biologically relevant insights obtained from these models. We then describe cell cycle regulation and its molecular implementations, protein number regulation, and population growth, all in relation to size regulation. Finally, we discuss several future directions for developing understanding beyond phenomenological models of cell size regulation.

  8. Specious causal attributions in the social sciences: the reformulated stepping-stone theory of heroin use as exemplar.

    PubMed

    Baumrind, D

    1983-12-01

    The claims based on causal models employing either statistical or experimental controls are examined and found to be excessive when applied to social or behavioral science data. An exemplary case, in which strong causal claims are made on the basis of a weak version of the regularity model of cause, is critiqued. O'Donnell and Clayton claim that in order to establish that marijuana use is a cause of heroin use (their "reformulated stepping-stone" hypothesis), it is necessary and sufficient to demonstrate that marijuana use precedes heroin use and that the statistically significant association between the two does not vanish when the effects of other variables deemed to be prior to both of them are removed. I argue that O'Donnell and Clayton's version of the regularity model is not sufficient to establish cause and that the planning of social interventions both presumes and requires a generative rather than a regularity causal model. Causal modeling using statistical controls is of value when it compels the investigator to make explicit and to justify a causal explanation but not when it is offered as a substitute for a generative analysis of causal connection.

  9. Combination of statistical and physically based methods to assess shallow slide susceptibility at the basin scale

    NASA Astrophysics Data System (ADS)

    Oliveira, Sérgio C.; Zêzere, José L.; Lajas, Sara; Melo, Raquel

    2017-07-01

    Approaches used to assess shallow slide susceptibility at the basin scale are conceptually different depending on the use of statistical or physically based methods. The former are based on the assumption that the same causes are more likely to produce the same effects, whereas the latter are based on the comparison between forces which tend to promote movement along the slope and the counteracting forces that are resistant to motion. Within this general framework, this work tests two hypotheses: (i) although conceptually and methodologically distinct, the statistical and deterministic methods generate similar shallow slide susceptibility results regarding the model's predictive capacity and spatial agreement; and (ii) the combination of shallow slide susceptibility maps obtained with statistical and physically based methods, for the same study area, generates a more reliable susceptibility model for shallow slide occurrence. These hypotheses were tested at a small test site (13.9 km²) located north of Lisbon (Portugal), using a statistical method (the information value method, IV) and a physically based method (the infinite slope method, IS). The landslide susceptibility maps produced with the statistical and deterministic methods were combined into a new landslide susceptibility map. The latter was based on a set of integration rules defined by the cross tabulation of the susceptibility classes of both maps and analysis of the corresponding contingency tables. The results demonstrate a higher predictive capacity of the new shallow slide susceptibility map, which combines the independent results obtained with the statistical and physically based models. Moreover, the combination of the two models allowed the identification of areas where the results of the information value and the infinite slope methods are contradictory. These areas were classified as uncertain and deserve additional investigation at a more detailed scale.

  10. Statistical metrology—measurement and modeling of variation for advanced process development and design rule generation

    NASA Astrophysics Data System (ADS)

    Boning, Duane S.; Chung, James E.

    1998-11-01

    Advanced process technology will require more detailed understanding and tighter control of variation in devices and interconnects. The purpose of statistical metrology is to provide methods to measure and characterize variation, to model systematic and random components of that variation, and to understand the impact of variation on both yield and performance of advanced circuits. Of particular concern are spatial or pattern-dependencies within individual chips; such systematic variation within the chip can have a much larger impact on performance than wafer-level random variation. Statistical metrology methods will play an important role in the creation of design rules for advanced technologies. For example, a key issue in multilayer interconnect is the uniformity of interlevel dielectric (ILD) thickness within the chip. For the case of ILD thickness, we describe phases of statistical metrology development and application to understanding and modeling thickness variation arising from chemical-mechanical polishing (CMP). These phases include screening experiments including design of test structures and test masks to gather electrical or optical data, techniques for statistical decomposition and analysis of the data, and approaches to calibrating empirical and physical variation models. These models can be integrated with circuit CAD tools to evaluate different process integration or design rule strategies. One focus for the generation of interconnect design rules is guidelines for the use of "dummy fill" or "metal fill" to improve the uniformity of underlying metal density and thus improve the uniformity of oxide thickness within the die. Trade-offs that can be evaluated via statistical metrology include the improvements to uniformity possible versus the effect of increased capacitance due to additional metal.

  11. Tropical geometry of statistical models.

    PubMed

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. Here, we address the question of how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. The Newton polytope of a statistical model plays a key role. Our results are applied to the hidden Markov model and the general Markov model on a binary tree.

  12. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
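
    For the single-detection-limit case, the core of ROS can be sketched as follows (in Python rather than the paper's S language, and simplified: the library described in the paper handles multiple detection limits and the extra bookkeeping that full ROS requires):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(4)

        true = rng.lognormal(0.0, 1.0, 60)    # synthetic concentrations
        dl = 0.5                              # single detection limit
        detected = np.sort(true[true >= dl])
        n, n_cens = len(true), int(np.sum(true < dl))

        # Weibull plotting positions: nondetects occupy the lowest ranks
        pp_det = np.arange(n_cens + 1, n + 1) / (n + 1)
        pp_cens = np.arange(1, n_cens + 1) / (n + 1)

        # Regress log-concentration on normal quantiles (the "ROS line")
        slope, intercept, *_ = stats.linregress(stats.norm.ppf(pp_det), np.log(detected))

        # Impute nondetects from the fitted line, then compute summary statistics
        imputed = np.exp(intercept + slope * stats.norm.ppf(pp_cens))
        combined = np.concatenate([imputed, detected])
        print(f"ROS mean = {combined.mean():.3f}, std = {combined.std(ddof=1):.3f}")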

  13. A Geostatistical Scaling Approach for the Generation of Non Gaussian Random Variables and Increments

    NASA Astrophysics Data System (ADS)

    Guadagnini, Alberto; Neuman, Shlomo P.; Riva, Monica; Panzeri, Marco

    2016-04-01

    We address manifestations of non-Gaussian statistical scaling displayed by many variables, Y, and their (spatial or temporal) increments. Evidence of such behavior includes symmetry of increment distributions at all separation distances (or lags), with sharp peaks and heavy tails which tend to decay asymptotically as the lag increases. Variables reported to exhibit such distributions include quantities of direct relevance to the hydrogeological sciences, e.g. porosity, log permeability, electrical resistivity, soil and sediment texture, sediment transport rate, rainfall, and measured and simulated turbulent fluid velocity, among others. No model known to us captures all of the documented statistical scaling behaviors in a unique and consistent manner. We recently proposed a generalized sub-Gaussian model (GSG) which reconciles within a unique theoretical framework the probability distributions of a target variable and its increments. We presented an algorithm to generate unconditional random realizations of statistically isotropic or anisotropic GSG functions and illustrated it in two dimensions. In this context, we demonstrated the feasibility of estimating all key parameters of a GSG model underlying a single realization of Y by jointly analyzing spatial moments of Y data and corresponding increments. Here, we extend our GSG model to account for noisy measurements of Y at a discrete set of points in space (or time), present an algorithm to generate conditional realizations of the corresponding isotropic or anisotropic random field, and explore them on one- and two-dimensional synthetic test cases.

  14. Statistical Evaluation of Utilization of the ISS

    NASA Technical Reports Server (NTRS)

    Andrews, Ross; Andrews, Alida

    2006-01-01

    PayLoad Utilization Modeler (PLUM) is a statistical-modeling computer program used to evaluate the effectiveness of utilization of the International Space Station (ISS) in terms of the number of research facilities that can be operated within a specified interval of time. PLUM is designed to balance the requirements of research facilities aboard the ISS against the resources available on the ISS. PLUM comprises three parts: an interface for the entry of data on constraints and on required and available resources, a database that stores these data as well as the program output, and a modeler. The modeler comprises two subparts: one that generates tens of thousands of random combinations of research facilities and another that calculates the usage of resources for each of those combinations. The results of these calculations are used to generate graphical and tabular reports to determine which facilities are most likely to be operable on the ISS, to identify which ISS resources are inadequate to satisfy the demands upon them, and to generate other data useful in allocation of and planning of resources.
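
    The modeler's core loop can be imagined as follows (facility demands, resource limits, and the inclusion probability are all hypothetical; PLUM's actual data model and constraint set are richer):

        import numpy as np

        rng = np.random.default_rng(5)

        # Hypothetical facility demands: columns = power (kW), crew time (h/week)
        demands = np.array([[3.1, 10], [1.2, 4], [2.5, 8], [0.8, 2], [1.9, 6], [2.2, 5]])
        available = np.array([8.0, 20.0])   # hypothetical available power and crew time

        feasible = []
        for _ in range(10000):
            subset = rng.random(len(demands)) < 0.5   # random facility combination
            if (demands[subset].sum(axis=0) <= available).all():
                feasible.append(subset.sum())

        print("max facilities simultaneously operable:", max(feasible))
        print("mean over feasible combinations:", round(float(np.mean(feasible)), 2))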

  15. Design and Test of Pseudorandom Number Generator Using a Star Network of Lorenz Oscillators

    NASA Astrophysics Data System (ADS)

    Cho, Kenichiro; Miyano, Takaya

    We have recently developed a chaos-based stream cipher based on augmented Lorenz equations as a star network of Lorenz subsystems. In our method, the augmented Lorenz equations are used as a pseudorandom number generator. In this study, we propose a new method based on the augmented Lorenz equations for generating binary pseudorandom numbers and evaluate its security using the statistical tests of SP800-22 published by the National Institute of Standards and Technology, in comparison with the performances of other chaotic dynamical models used as binary pseudorandom number generators. We further propose a faster version of the proposed method and evaluate its security using the statistical tests of TestU01 published by L’Ecuyer and Simard.
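
    The flavor of a chaos-based bit generator can be shown with the classical Lorenz system (the paper's augmented star-network equations are not reproduced here, and the crude thresholding below is only a stand-in for the actual bit-extraction scheme):

        import numpy as np

        def lorenz_step(s, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
            """One fourth-order Runge-Kutta step of the classical Lorenz system."""
            def f(v):
                x, y, z = v
                return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])
            k1 = f(s); k2 = f(s + dt / 2 * k1); k3 = f(s + dt / 2 * k2); k4 = f(s + dt * k3)
            return s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

        state = np.array([1.0, 1.0, 1.0])
        bits = []
        for _ in range(10000):
            state = lorenz_step(state)
            bits.append(int(state[0] > 0.0))   # threshold the x variable to a bit

        # A first sanity check; real evaluation would run SP800-22 or TestU01
        print("fraction of ones:", np.mean(bits))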

  16. A Comparison of Forecast Error Generators for Modeling Wind and Load Uncertainty

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Ning; Diao, Ruisheng; Hafen, Ryan P.

    2013-12-18

    This paper presents four algorithms to generate random forecast error time series, including a truncated-normal distribution model, a state-space based Markov model, a seasonal autoregressive moving average (ARMA) model, and a stochastic-optimization based model. The error time series are used to create real-time (RT), hour-ahead (HA), and day-ahead (DA) wind and load forecast time series that statistically match historically observed forecasting data sets, used for variable generation integration studies. A comparison is made using historical DA load forecasts and actual load values to generate new sets of DA forecasts with similar statistical forecast error characteristics. This paper discusses and compares the capabilities of each algorithm to preserve the characteristics of the historical forecast data sets.
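
    A minimal version of the ARMA-based generator (the coefficients below are invented for illustration; in the study they would be fitted to historical forecast-minus-actual series):

        import numpy as np

        rng = np.random.default_rng(6)

        def arma_errors(n, ar, ma, sigma):
            """Generate a synthetic forecast-error series from an ARMA model."""
            e = np.zeros(n)
            w = rng.normal(0.0, sigma, n)
            for t in range(n):
                ar_part = sum(a * e[t - 1 - i] for i, a in enumerate(ar) if t - 1 - i >= 0)
                ma_part = sum(b * w[t - 1 - j] for j, b in enumerate(ma) if t - 1 - j >= 0)
                e[t] = ar_part + ma_part + w[t]
            return e

        errors = arma_errors(n=8760, ar=[0.85], ma=[0.3], sigma=50.0)   # MW, hourly
        actual_load = 5000 + 1000 * np.sin(np.arange(8760) * 2 * np.pi / 24)
        synthetic_da_forecast = actual_load + errors
        print("error std (MW):", errors.std().round(1))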

  17. Statistical Compression for Climate Model Output

    NASA Astrophysics Data System (ADS)

    Hammerling, D.; Guinness, J.; Soh, Y. J.

    2017-12-01

    Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus it is important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. We decompress the data by computing conditional expectations and conditional simulations from the model given the summary statistics. Conditional expectations represent our best estimate of the original data but are subject to oversmoothing in space and time. Conditional simulations introduce realistic small-scale noise so that the decompressed fields are neither too smooth nor too rough compared with the original data. Considerable attention is paid to accurately modeling the original dataset (one year of daily mean temperature data), particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured, while allowing for fast decompression and conditional emulation on modest computers.

  18. Einstein-Podolsky-Rosen correlations and Bell correlations in the simplest scenario

    NASA Astrophysics Data System (ADS)

    Quan, Quan; Zhu, Huangjun; Fan, Heng; Yang, Wen-Li

    2017-06-01

    Einstein-Podolsky-Rosen (EPR) steering is an intermediate type of quantum nonlocality which sits between entanglement and Bell nonlocality. A set of correlations is Bell nonlocal if it does not admit a local hidden variable (LHV) model, while it is EPR nonlocal if it does not admit a local hidden variable-local hidden state (LHV-LHS) model. It is interesting to know what states can generate EPR-nonlocal correlations in the simplest nontrivial scenario, that is, two projective measurements for each party sharing a two-qubit state. Here we show that a two-qubit state can generate EPR-nonlocal full correlations (excluding marginal statistics) in this scenario if and only if it can generate Bell-nonlocal correlations. If full statistics (including marginal statistics) is taken into account, surprisingly, the same scenario can manifest the simplest one-way steering and the strongest hierarchy between steering and Bell nonlocality. To illustrate these intriguing phenomena in simple setups, several concrete examples are discussed in detail, which facilitates experimental demonstration. In the course of study, we introduce the concept of restricted LHS models and thereby derive a necessary and sufficient semidefinite-programming criterion to determine the steerability of any bipartite state under given measurements. Analytical criteria are further derived in several scenarios of strong theoretical and experimental interest.

  19. Simulation of an ensemble of future climate time series with an hourly weather generator

    NASA Astrophysics Data System (ADS)

    Caporali, E.; Fatichi, S.; Ivanov, V. Y.; Kim, J.

    2010-12-01

    There is evidence that climate change is occurring in many regions of the world. Climate change predictions at the local scale and at fine temporal resolution are thus needed for hydrological, ecological, geomorphological, and agricultural applications that can provide thematic insights into the corresponding impacts. Numerous downscaling techniques have been proposed to bridge the gap between the spatial scales adopted in General Circulation Models (GCMs) and regional analyses. Nevertheless, the time and spatial resolutions obtained, as well as the type of meteorological variables, may not be sufficient for detailed studies of climate change effects at local scales. In this context, this study presents a stochastic downscaling technique that makes use of an hourly weather generator to simulate time series of predicted future climate. Using a Bayesian approach, the downscaling procedure derives distributions of factors of change for several climate statistics from a multi-model ensemble of GCMs. Factors of change are sampled from their distributions using a Monte Carlo technique to fully account for the probabilistic information obtained with the Bayesian multi-model ensemble. The factors of change are subsequently applied to the statistics derived from observations to re-evaluate the parameters of the weather generator. The weather generator can reproduce a wide set of climate variables and statistics over a range of temporal scales, from extremes to low-frequency inter-annual variability. The final result of this procedure is an ensemble of hourly time series of meteorological variables that can be considered representative of future climate, as inferred from GCMs. The generated ensemble of scenarios also accounts for the uncertainty derived from the multiple GCMs used in downscaling. Applications of the procedure in reproducing present and future climates are presented for different locations world-wide: Tucson (AZ), Detroit (MI), and Firenze (Italy). The stochastic downscaling is carried out with eight GCMs from the CMIP3 multi-model dataset (IPCC 4AR, A1B scenario).

  20. Adversarial Threshold Neural Computer for Molecular de Novo Design.

    PubMed

    Putin, Evgeny; Asadulaev, Arip; Vanhaelen, Quentin; Ivanenkov, Yan; Aladinskaya, Anastasia V; Aliper, Alex; Zhavoronkov, Alex

    2018-03-30

    In this article, we propose the deep neural network Adversarial Threshold Neural Computer (ATNC). The ATNC model is intended for the de novo design of novel small-molecule organic structures. The model is based on generative adversarial network architecture and reinforcement learning. ATNC uses a Differentiable Neural Computer as a generator and has a new specific block, called adversarial threshold (AT). AT acts as a filter between the agent (generator) and the environment (discriminator + objective reward functions). Furthermore, to generate more diverse molecules we introduce a new objective reward function named Internal Diversity Clustering (IDC). In this work, ATNC is tested and compared with the ORGANIC model. Both models were trained on the SMILES string representation of the molecules, using four objective functions (internal similarity, Muegge druglikeness filter, presence or absence of sp3-rich fragments, and IDC). The SMILES representations of 15K druglike molecules from the ChemDiv collection were used as a training data set. For the different functions, ATNC outperforms ORGANIC. Combined with the IDC, ATNC generates 72% valid and 77% unique SMILES strings, while ORGANIC generates only 7% valid and 86% unique SMILES strings. For each set of molecules generated by ATNC and ORGANIC, we analyzed distributions of four molecular descriptors (number of atoms, molecular weight, logP, and TPSA) and calculated five chemical statistical features (internal diversity, number of unique heterocycles, number of clusters, number of singletons, and number of compounds that have not passed medicinal chemistry filters). Analysis of key molecular descriptors and chemical statistical features demonstrated that the molecules generated by ATNC elicited better druglikeness properties. We also performed in vitro validation of the molecules generated by ATNC; results indicated that ATNC is an effective method for producing hit compounds.

  1. Development of a statistical model for cervical cancer cell death with irreversible electroporation in vitro.

    PubMed

    Yang, Yongji; Moser, Michael A J; Zhang, Edwin; Zhang, Wenjun; Zhang, Bing

    2018-01-01

    The aim of this study was to develop a statistical model for cell death by irreversible electroporation (IRE) and to show that the statistical model is more accurate than the electric field threshold model in the literature, using cervical cancer cells in vitro. The HeLa cell line was cultured and treated with different IRE protocols in order to obtain data for modeling the statistical relationship between cell death and pulse-setting parameters. In total, 340 in vitro experiments were performed with a commercial IRE pulse system, including a pulse generator and an electric cuvette. The trypan blue staining technique was used to evaluate cell death after 4 hours of incubation following IRE treatment. The Peleg-Fermi model was used in the study to build the statistical relationship using the cell viability data obtained from the in vitro experiments. A finite element model of IRE for the electric field distribution was also built. Comparison of ablation zones between the statistical model and the electric threshold model (drawn from the finite element model) was used to show the accuracy of the proposed statistical model in the description of the ablation zone and its applicability for different pulse-setting parameters. The statistical models describing the relationships between HeLa cell death and pulse length and the number of pulses, respectively, were built. The values of the curve-fitting parameters were obtained using the Peleg-Fermi model for the treatment of cervical cancer with IRE. The difference in the ablation zone between the statistical model and the electric threshold model was also illustrated to show the accuracy of the proposed statistical model in the representation of the ablation zone in IRE. This study concluded that: (1) the proposed statistical model accurately described the ablation zone of IRE with cervical cancer cells, and was more accurate compared with the electric field model; (2) the proposed statistical model was able to estimate the value of the electric field threshold for the computer simulation of IRE in the treatment of cervical cancer; and (3) the proposed statistical model was able to express the change in ablation zone with the change in pulse-setting parameters.
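
    The Peleg-Fermi viability curve, and how its parameters might be fitted to data, can be sketched briefly (the viability numbers below are fabricated for illustration, not the paper's measurements):

        import numpy as np
        from scipy.optimize import curve_fit

        def peleg_fermi(E, Ec, A):
            """Peleg-Fermi survival curve: viability vs electric field strength E,
            with critical field Ec and kinetic parameter A (in the full model both
            depend on the number of pulses)."""
            return 1.0 / (1.0 + np.exp((E - Ec) / A))

        # Hypothetical data: field strength (V/cm) vs fraction of surviving cells
        E = np.array([500.0, 750.0, 1000.0, 1250.0, 1500.0, 1750.0, 2000.0])
        viability = np.array([0.97, 0.90, 0.62, 0.35, 0.12, 0.05, 0.02])

        popt, _ = curve_fit(peleg_fermi, E, viability, p0=[1200.0, 200.0])
        print(f"Ec = {popt[0]:.0f} V/cm, A = {popt[1]:.0f} V/cm")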

  2. The discrimination of sea ice types using SAR backscatter statistics

    NASA Technical Reports Server (NTRS)

    Shuchman, Robert A.; Wackerman, Christopher C.; Maffett, Andrew L.; Onstott, Robert G.; Sutherland, Laura L.

    1989-01-01

    X-band (HH) synthetic aperture radar (SAR) data of sea ice collected during the Marginal Ice Zone Experiment in March and April of 1987 were statistically analyzed with respect to discriminating open water, first-year ice, multiyear ice, and Odden. Odden are large expanses of nilas ice that rapidly form in the Greenland Sea and transform into pancake ice. A first-order statistical analysis indicated that mean versus variance can segment out open water and first-year ice, and skewness versus modified skewness can segment the Odden and multiyear categories. In addition to first-order statistics, a model has been generated for the distribution function of the SAR ice data. Segmentation of ice types was also attempted using textural measurements. In this case, the general co-occurrence matrix was evaluated. The textural method did not generate better results than the first-order statistical approach.

  3. Application of a Fuzzy Verification Technique for Assessment of the Weather Running Estimate-Nowcast (WRE-N) Model

    DTIC Science & Technology

    2016-10-01

    [Fragmentary abstract excerpt:] …comes when considering numerous scores and statistics during a preliminary evaluation of the applicability of the fuzzy-verification minimum coverage… The selection of thresholds with which to generate categorical-verification scores and statistics from the application of both traditional and… of statistically significant numbers of cases; the latter presents a challenge of limited application for assessment of the forecast models’ ability

  4. Random generation of the turbulence slopes of a Shack-Hartmann wavefront sensor.

    PubMed

    Conan, Rodolphe

    2014-03-15

    A method to generate the turbulence measurements of a Shack-Hartmann wavefront sensor is presented. Numerical simulations demonstrate that the spatial and temporal statistical properties of the slopes are respected, allowing us to generate the turbulence wavefront gradient corresponding to both natural and laser guide stars, as well as time series in accordance with the frozen flow model.

  5. A methodology for the stochastic generation of hourly synthetic direct normal irradiation time series

    NASA Astrophysics Data System (ADS)

    Larrañeta, M.; Moreno-Tejera, S.; Lillo-Bravo, I.; Silva-Pérez, M. A.

    2018-02-01

    Many of the available solar radiation databases only provide global horizontal irradiance (GHI), while there is a growing need for extensive databases of direct normal irradiance (DNI), mainly for the development of concentrated solar power and concentrated photovoltaic technologies. In the present work, we propose a methodology for the generation of synthetic hourly DNI data from hourly average GHI values by dividing the irradiance into a deterministic and a stochastic component, intending to emulate the dynamics of solar radiation. The deterministic component is modeled through a simple classical model. The stochastic component is fitted to measured data in order to maintain the consistency of the synthetic data with the state of the sky, generating statistically significant DNI data with a cumulative frequency distribution very similar to that of the measured data. The adaptation and application of the model to the location of Seville shows significant improvements in terms of frequency distribution over classical models. The proposed methodology, applied to other locations with different climatological characteristics, yields better results than the classical models in terms of frequency distribution, reaching a 50% reduction in the Finkelstein-Schafer (FS) and Kolmogorov-Smirnov test integral (KSI) statistics.
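
    A toy version of the deterministic-plus-stochastic decomposition (the clear-sky curve and the autocorrelated attenuation factor are both invented here; the paper fits the stochastic component to measured data):

        import numpy as np

        rng = np.random.default_rng(7)

        hours = np.arange(24)
        cos_zen = np.clip(np.cos((hours - 12) / 12 * np.pi), 0.0, None)  # crude geometry

        # Deterministic component: a simple clear-sky DNI curve (W/m^2)
        dni_clear = 900.0 * np.sqrt(cos_zen)

        # Stochastic component: lag-1 autocorrelated attenuation factor in [0, 1]
        kb = np.empty(24)
        kb[0] = 0.7
        for h in range(1, 24):
            kb[h] = np.clip(0.7 + 0.8 * (kb[h - 1] - 0.7)
                            + 0.15 * rng.standard_normal(), 0.0, 1.0)

        print(np.round(dni_clear * kb, 0))   # synthetic hourly DNI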

  6. Statistical characteristic in time-domain of direct current corona-generated audible noise from conductor in corona cage

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Xuebao, E-mail: lxb08357x@ncepu.edu.cn; Cui, Xiang, E-mail: x.cui@ncepu.edu.cn; Ma, Wenzuo

    The corona-generated audible noise (AN) has become one of the decisive factors in the design of high voltage direct current (HVDC) transmission lines. The AN from transmission lines can be attributed to sound pressure pulses generated by the multiple corona sources formed on the conductors. In this paper, the time-domain characteristics of the sound pressure pulses generated by DC corona discharges formed over the surface of a stranded conductor are investigated systematically in a laboratory setting using a corona cage structure. The amplitudes of the sound pressure pulses and their time intervals are extracted by observing a direct correlation between corona current pulses and corona-generated sound pressure pulses. Based on these statistical characteristics, a stochastic model is presented for simulating the sound pressure pulses due to DC corona discharges occurring on conductors. The proposed stochastic model is validated by comparing the calculated and measured A-weighted sound pressure level (SPL). The model is then used to analyze the influence of the pulse amplitudes and pulse rate on the SPL. Furthermore, a mathematical relationship is found between the SPL and conductor diameter, electric field, and radial distance.

  7. Machine Learning Predictions of a Multiresolution Climate Model Ensemble

    NASA Astrophysics Data System (ADS)

    Anderson, Gemma J.; Lucas, Donald D.

    2018-05-01

    Statistical models of high-resolution climate models are useful for many purposes, including sensitivity and uncertainty analyses, but building them can be computationally prohibitive. We generated a unique multiresolution perturbed parameter ensemble of a global climate model. We use a novel application of a machine learning technique known as random forests to train a statistical model on the ensemble to make high-resolution model predictions of two important quantities: global mean top-of-atmosphere energy flux and precipitation. The random forests leverage cheaper low-resolution simulations, greatly reducing the number of high-resolution simulations required to train the statistical model. We demonstrate that high-resolution predictions of these quantities can be obtained by training on an ensemble that includes only a small number of high-resolution simulations. We also find that global annually averaged precipitation is more sensitive to resolution changes than to any of the model parameters considered.
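
    A schematic of the approach with scikit-learn's random forest, where a resolution flag is just another input feature and high-resolution runs are deliberately scarce (the ensemble here is synthetic; the real study used a perturbed-parameter climate model ensemble):

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(8)

        n_low, n_high = 400, 25          # cheap low-res runs, scarce high-res runs
        X_low = np.column_stack([rng.uniform(size=(n_low, 5)), np.zeros(n_low)])
        X_high = np.column_stack([rng.uniform(size=(n_high, 5)), np.ones(n_high)])

        def toa_flux(X):
            """Synthetic stand-in for global-mean top-of-atmosphere energy flux."""
            return 2.0 * X[:, 0] - 1.5 * X[:, 1] ** 2 + X[:, 2] + 0.4 * X[:, 5]

        X = np.vstack([X_low, X_high])
        y = toa_flux(X) + 0.05 * rng.standard_normal(len(X))

        rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

        # Predict the high-resolution response at unseen parameter settings
        X_new = np.column_stack([rng.uniform(size=(3, 5)), np.ones(3)])
        print(rf.predict(X_new))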

  8. A Statistical Multimodel Ensemble Approach to Improving Long-Range Forecasting in Pakistan

    DTIC Science & Technology

    2012-03-01

    [Fragmentary abstract excerpt:] …are generated by multiple regression models that relate globally distributed oceanic and atmospheric predictors to local predictands. The predictands are…

  9. Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

    DTIC Science & Technology

    2009-01-01

    [Only table-of-contents fragments are available for this record, including a chapter on monolingually derived phrasal paraphrase generation for statistical machine translation, Spanish-English (S2E) results, gains from using larger monolingual corpora, and phrasal distributional profiles.]

  10. Statistical physics of interacting neural networks

    NASA Astrophysics Data System (ADS)

    Kinzel, Wolfgang; Metzler, Richard; Kanter, Ido

    2001-12-01

    Recent results on the statistical physics of time series generation and prediction are presented. A neural network is trained on quasi-periodic and chaotic sequences, and its overlap with the sequence generator as well as its prediction errors are calculated numerically. For each network there exists a sequence for which it completely fails to make predictions. Two interacting networks show a transition to perfect synchronization. A pool of interacting networks shows good coordination in the minority game, a model of competition in a closed market. Finally, as a demonstration, a perceptron predicts bit sequences produced by human beings.

  11. Dynamic Statistical Models for Pyroclastic Density Current Generation at Soufrière Hills Volcano

    NASA Astrophysics Data System (ADS)

    Wolpert, Robert L.; Spiller, Elaine T.; Calder, Eliza S.

    2018-05-01

    To mitigate volcanic hazards from pyroclastic density currents, volcanologists generate hazard maps that provide long-term forecasts of areas of potential impact. Several recent efforts in the field develop new statistical methods for the application of flow models to generate fully probabilistic hazard maps that both account for, and quantify, uncertainty. However, a limitation to the use of most statistical hazard models, and a key source of uncertainty within them, is the time-averaged nature of the datasets by which the volcanic activity is statistically characterized. Where the level, or directionality, of volcanic activity frequently changes, e.g. during protracted eruptive episodes, or at volcanoes that are classified as persistently active, it is not appropriate to make short-term forecasts based on longer time-averaged metrics of the activity. Thus, here we build, fit and explore dynamic statistical models for the generation of pyroclastic density currents from Soufrière Hills Volcano (SHV) on Montserrat, including their respective collapse directions and flow volumes, based on 1996-2008 flow datasets. The development of this approach allows short-term behavioral changes to be taken into account in probabilistic volcanic hazard assessments. We show that collapses from the SHV lava dome follow a clear pattern, and that a series of smaller flows in a given direction often culminates in a larger collapse, after which the directionality of the flows changes. Such models enable short-term forecasting (weeks to months) that can reflect evolving conditions such as dome and crater morphology changes and non-stationary eruptive behavior such as extrusion rate variations. For example, the probability of inundation of the Belham Valley in the first 180 days of a forecast period is about twice as high for lava domes facing northwest toward that valley as it is for domes pointing east toward the Tar River Valley. As rich multi-parametric volcano monitoring datasets become increasingly available, eruption forecasting is becoming an increasingly viable and important research field. We demonstrate an approach to utilize such data in order to appropriately 'tune' probabilistic hazard assessments for pyroclastic flows. Our broader objective with the development of this method is to help advance time-dependent volcanic hazard assessment, by bridging the

  12. Associative Algorithms for Computational Creativity

    ERIC Educational Resources Information Center

    Varshney, Lav R.; Wang, Jun; Varshney, Kush R.

    2016-01-01

    Computational creativity, the generation of new, unimagined ideas or artifacts by a machine that are deemed creative by people, can be applied in the culinary domain to create novel and flavorful dishes. In fact, we have done so successfully using a combinatorial algorithm for recipe generation combined with statistical models for recipe ranking…

  13. A Computer Program for the Generation of ARIMA Data

    ERIC Educational Resources Information Center

    Green, Samuel B.; Noles, Keith O.

    1977-01-01

    The autoregressive integrated moving average (ARIMA) model has been applied to time series data in psychological and educational research. A program is described that generates ARIMA data of a known order. The program enables researchers to explore statistical properties of ARIMA data and simulate systems producing time-dependent observations.…
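
    A small generator in this spirit (sketched in Python rather than the original program's language): simulate an ARMA(p, q) series, then integrate d times to obtain ARIMA(p, d, q) data of known order:

        import numpy as np

        rng = np.random.default_rng(9)

        def generate_arima(n, ar=(), ma=(), d=0, sigma=1.0, burn=100):
            """Simulate ARMA(p, q), then apply d cumulative sums to integrate."""
            p, q = len(ar), len(ma)
            w = rng.normal(0.0, sigma, n + burn)   # burn-in washes out start-up
            x = np.zeros(n + burn)
            for t in range(n + burn):
                x[t] = w[t] \
                    + sum(ar[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0) \
                    + sum(ma[j] * w[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
            x = x[burn:]
            for _ in range(d):
                x = np.cumsum(x)
            return x

        series = generate_arima(200, ar=(0.6,), ma=(0.4,), d=1)   # ARIMA(1,1,1)
        print(series[:5].round(2))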

  14. A high-fidelity weather time series generator using the Markov Chain process on a piecewise level

    NASA Astrophysics Data System (ADS)

    Hersvik, K.; Endrerud, O.-E. V.

    2017-12-01

    A method is developed for generating a set of unique weather time-series based on an existing weather series. The method allows statistically valid weather variations to take place within repeated simulations of offshore operations. The numerous generated time series need to share the same statistical qualities as the original time series. Statistical qualities here refer mainly to the distribution of weather windows available for work, including durations and frequencies of such weather windows, and seasonal characteristics. The method is based on the Markov chain process. The core new development lies in how the Markov Process is used, specifically by joining small pieces of random length time series together rather than joining individual weather states, each from a single time step, which is a common solution found in the literature. This new Markov model shows favorable characteristics with respect to the requirements set forth and all aspects of the validation performed.
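
    One way to picture the piecewise joining (the matching rule and all lengths below are simplified assumptions, not the paper's calibrated transition scheme):

        import numpy as np

        rng = np.random.default_rng(10)

        # Synthetic stand-in for an observed hourly series (e.g., wave height, m)
        observed = 1.5 + 0.8 * np.sin(np.arange(5000) / 200.0) \
                   + 0.3 * rng.standard_normal(5000)

        def synthesize(obs, n_out, min_len=24, max_len=120, tol=0.15):
            """Join random-length pieces of the observed series, each piece chosen
            so its first value roughly continues the series so far (a piecewise
            analogue of a Markov-chain transition between weather states)."""
            out = list(obs[:min_len])
            while len(out) < n_out:
                length = int(rng.integers(min_len, max_len))
                starts = np.where(np.abs(obs[:-max_len] - out[-1]) < tol)[0]
                s = rng.choice(starts) if len(starts) else rng.integers(0, len(obs) - max_len)
                out.extend(obs[s : s + length])
            return np.array(out[:n_out])

        synthetic = synthesize(observed, 5000)
        print(observed.mean().round(2), synthetic.mean().round(2))   # compare statistics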

  15. Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models

    NASA Technical Reports Server (NTRS)

    Mjolsness, Eric; Castano, Rebecca; Gray, Alexander

    1999-01-01

    We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.

  16. Classical subjective expected utility.

    PubMed

    Cerreia-Vioglio, Simone; Maccheroni, Fabio; Marinacci, Massimo; Montrucchio, Luigi

    2013-04-23

    We consider decision makers who know that payoff-relevant observations are generated by a process that belongs to a given class M, as postulated in Wald [Wald A (1950) Statistical Decision Functions (Wiley, New York)]. We incorporate this Waldean piece of objective information within an otherwise subjective setting à la Savage [Savage LJ (1954) The Foundations of Statistics (Wiley, New York)] and show that this leads to a two-stage subjective expected utility model that accounts for both state and model uncertainty.
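
    In symbols, such a two-stage representation can be written (our notation, a plausible reading rather than the authors' exact formulation) as

        V(f) = \int_{M} \Big( \int_{S} u\big(f(s)\big)\, \mathrm{d}m(s) \Big)\, \mathrm{d}\mu(m)

    where the inner integral handles state uncertainty under a fixed model m in the class M, and the outer integral averages over a subjective prior \mu on M.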

  17. A fragmentation model of earthquake-like behavior in internet access activity

    NASA Astrophysics Data System (ADS)

    Paguirigan, Antonino A.; Angco, Marc Jordan G.; Bantang, Johnrob Y.

    We present a fragmentation model that generates almost any inverse power-law size distribution, including dual-scaled versions, consistent with the underlying dynamics of systems with earthquake-like behavior. We apply the model to explain the dual-scaled power-law statistics observed in an Internet access dataset that covers more than 32 million requests. The non-Poissonian statistics of the requested data sizes m and the amount of time τ needed for complete processing are consistent with the Gutenberg-Richter law. Inter-event times δt between subsequent requests are also shown to exhibit power-law distributions consistent with the generalized Omori law. Thus, the dataset is similar to earthquake data except that two power-law regimes are observed. Using the proposed model, we are able to identify the underlying dynamics responsible for generating the observed dual power-law distributions. The model is general enough to apply to any physical or human dynamics that is limited by finite resources such as space, energy, time or opportunity.
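
    For intuition, inverse-transform sampling generates the kind of inverse power-law sizes such models produce; the exponent and cutoff below are arbitrary stand-ins, and the dual-scaled case would stitch two such regimes together at a crossover scale.

        import numpy as np

        def sample_power_law(alpha, x_min, size, rng=None):
            """Draw sizes with density p(x) ∝ x**(-alpha) for x >= x_min."""
            rng = rng or np.random.default_rng()
            u = rng.random(size)
            return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

        sizes = sample_power_law(alpha=2.0, x_min=1.0, size=100_000)
        # slope of the empirical survival function on log-log axes ≈ 1 - alpha
        print(np.quantile(sizes, [0.5, 0.9, 0.99]))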

  18. Statistical analysis of cascading failures in power grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chertkov, Michael; Pfitzner, Rene; Turitsyn, Konstantin

    2010-12-01

    We introduce a new microscopic model of cascading failures in transmission power grids. This model accounts for the automatic response of the grid to load fluctuations that take place on the scale of minutes, when optimum power flow adjustments and load shedding controls are unavailable. We describe extreme events, caused by load fluctuations, which cause cascading failures of loads, generators and lines. Our model is quasi-static in the causal, discrete-time and sequential resolution of individual failures. The model, in its simplest realization based on the Direct Current (DC) description of the power flow problem, is tested on three standard IEEE systems consisting of 30, 39 and 118 buses. Our statistical analysis suggests a straightforward classification of cascading and islanding phases in terms of the ratios between the average number of removed loads, generators and links. The analysis also demonstrates sensitivity to variations in line capacities. Future research challenges in modeling and control of cascading outages over real-world power networks are discussed.

  19. Multilevel Analysis of Structural Equation Models via the EM Algorithm.

    ERIC Educational Resources Information Center

    Jo, See-Heyon

    The question of how to analyze unbalanced hierarchical data generated from structural equation models has been a common problem for researchers and analysts. Among difficulties plaguing statistical modeling are estimation bias due to measurement error and the estimation of the effects of the individual's hierarchical social milieu. This paper…

  20. Alterations in choice behavior by manipulations of world model.

    PubMed

    Green, C S; Benson, C; Kersten, D; Schrater, P

    2010-09-14

    How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) "probability matching"-a consistent example of suboptimal choice behavior seen in humans-occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect beliefs about the generative process for outcomes and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest human decision making is rational and model based and not consistent with model-free learning.

  1. Alterations in choice behavior by manipulations of world model

    PubMed Central

    Green, C. S.; Benson, C.; Kersten, D.; Schrater, P.

    2010-01-01

    How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) “probability matching”—a consistent example of suboptimal choice behavior seen in humans—occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect beliefs about the generative process for outcomes and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest human decision making is rational and model based and not consistent with model-free learning. PMID:20805507

  2. CerebroMatic: A Versatile Toolbox for Spline-Based MRI Template Creation

    PubMed Central

    Wilke, Marko; Altaye, Mekibib; Holland, Scott K.

    2017-01-01

    Brain image spatial normalization and tissue segmentation rely on prior tissue probability maps. Appropriately selecting these tissue maps becomes particularly important when investigating “unusual” populations, such as young children or elderly subjects. When creating such priors, the disadvantage of applying more deformation must be weighed against the benefit of achieving a crisper image. We have previously suggested that statistically modeling demographic variables, instead of simply averaging images, is advantageous. Both aspects (more vs. less deformation and modeling vs. averaging) were explored here. We used imaging data from 1914 subjects, aged 13 months to 75 years, and employed multivariate adaptive regression splines to model the effects of age, field strength, gender, and data quality. Within the spm/cat12 framework, we compared an affine-only with a low- and a high-dimensional warping approach. As expected, more deformation on the individual level results in lower group dissimilarity. Consequently, effects of age in particular are less apparent in the resulting tissue maps when using a more extensive deformation scheme. Using statistically-described parameters, high-quality tissue probability maps could be generated for the whole age range; they are consistently closer to a gold standard than conventionally-generated priors based on 25, 50, or 100 subjects. Distinct effects of field strength, gender, and data quality were seen. We conclude that an extensive matching for generating tissue priors may model much of the variability inherent in the dataset which is then not contained in the resulting priors. Further, the statistical description of relevant parameters (using regression splines) allows for the generation of high-quality tissue probability maps while controlling for known confounds. The resulting CerebroMatic toolbox is available for download at http://irc.cchmc.org/software/cerebromatic.php. PMID:28275348

  3. CerebroMatic: A Versatile Toolbox for Spline-Based MRI Template Creation.

    PubMed

    Wilke, Marko; Altaye, Mekibib; Holland, Scott K

    2017-01-01

    Brain image spatial normalization and tissue segmentation rely on prior tissue probability maps. Appropriately selecting these tissue maps becomes particularly important when investigating "unusual" populations, such as young children or elderly subjects. When creating such priors, the disadvantage of applying more deformation must be weighed against the benefit of achieving a crisper image. We have previously suggested that statistically modeling demographic variables, instead of simply averaging images, is advantageous. Both aspects (more vs. less deformation and modeling vs. averaging) were explored here. We used imaging data from 1914 subjects, aged 13 months to 75 years, and employed multivariate adaptive regression splines to model the effects of age, field strength, gender, and data quality. Within the spm/cat12 framework, we compared an affine-only with a low- and a high-dimensional warping approach. As expected, more deformation on the individual level results in lower group dissimilarity. Consequently, effects of age in particular are less apparent in the resulting tissue maps when using a more extensive deformation scheme. Using statistically-described parameters, high-quality tissue probability maps could be generated for the whole age range; they are consistently closer to a gold standard than conventionally-generated priors based on 25, 50, or 100 subjects. Distinct effects of field strength, gender, and data quality were seen. We conclude that an extensive matching for generating tissue priors may model much of the variability inherent in the dataset which is then not contained in the resulting priors. Further, the statistical description of relevant parameters (using regression splines) allows for the generation of high-quality tissue probability maps while controlling for known confounds. The resulting CerebroMatic toolbox is available for download at http://irc.cchmc.org/software/cerebromatic.php.

  4. A Framework to Learn Physics from Atomically Resolved Images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vlcek, L.; Maksov, A.; Pan, M.

    Here, we present a generalized framework for extracting physics (i.e., knowledge) from atomically resolved images, and show its utility by applying it to a model system of segregation of chalcogen atoms in an FeSe0.45Te0.55 superconductor system. We emphasize that the framework can be used for any imaging data for which a generative physical model exists. Consider that a generative physical model can produce a very large number of configurations, not all of which are observable. By applying a microscope function to a subset of this generated data, we form a simulated dataset on which statistics can be computed.

  5. Statistics of SU(5) D-brane models on a type II orientifold

    NASA Astrophysics Data System (ADS)

    Gmeiner, Florian; Stein, Maren

    2006-06-01

    We perform a statistical analysis of models with SU(5) and flipped SU(5) gauge group in a type II orientifold setup. We investigate the distribution and correlation of properties of these models, including the number of generations and the hidden sector gauge group. Compared to the recent analysis [F. Gmeiner, R. Blumenhagen, G. Honecker, D. Lüst, and T. Weigand, J. High Energy Phys. 01 (2006) 004; F. Gmeiner, Fortschr. Phys. 54, 391 (2006)] of models with a standard model-like gauge group, we find very similar results.

  6. Statistical complexity measure of pseudorandom bit generators

    NASA Astrophysics Data System (ADS)

    González, C. M.; Larrondo, H. A.; Rosso, O. A.

    2005-08-01

    Pseudorandom number generators (PRNG) are extensively used in Monte Carlo simulations, gambling machines and cryptography as substitutes for ideal random number generators (RNG). Each application imposes different statistical requirements on PRNGs. As L’Ecuyer clearly states, “the main goal for Monte Carlo methods is to reproduce the statistical properties on which these methods are based whereas for gambling machines and cryptology, observing the sequence of output values for some time should provide no practical advantage for predicting the forthcoming numbers better than by just guessing at random”. In accordance with these different applications, several statistical test suites have been developed to analyze the sequences generated by PRNGs. In a recent paper a new statistical complexity measure [Phys. Lett. A 311 (2003) 126] was defined. Here we propose this measure as a randomness quantifier for PRNGs. The test is applied to three very well known and widely tested PRNGs available in the literature, all of them based on mathematical algorithms. A further PRNG, based on the 3D Lorenz chaotic dynamical system, is also analyzed. PRNGs based on chaos may be considered as models for physical noise sources, and important new results have recently been reported. All the design steps of this PRNG are described, and each stage increases the PRNG randomness using different strategies. It is shown that the MPR statistical complexity measure is capable of quantifying this randomness improvement. The PRNG based on the chaotic 3D Lorenz dynamical system is also evaluated using traditional digital signal processing tools for comparison.
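
    The MPR measure combines a normalized entropy with a Jensen-Shannon disequilibrium. The sketch below implements one plausible reading of that recipe using the widely used Bandt-Pompe ordinal-pattern distribution; the pattern length d is a free parameter, and this is a sketch rather than the authors' implementation.

        import math
        import random
        from collections import Counter
        from itertools import permutations

        def ordinal_distribution(x, d=4):
            """Bandt-Pompe distribution of ordinal patterns of length d."""
            pats = Counter(tuple(sorted(range(d), key=lambda k: x[i + k]))
                           for i in range(len(x) - d + 1))
            n = sum(pats.values())
            return [pats.get(p, 0) / n for p in permutations(range(d))]

        def mpr_complexity(x, d=4):
            P = ordinal_distribution(x, d)
            N = len(P)
            U = [1.0 / N] * N
            S = lambda q: -sum(p * math.log(p) for p in q if p > 0)
            H = S(P) / math.log(N)                 # normalized entropy
            M = [(p + u) / 2 for p, u in zip(P, U)]
            # Jensen-Shannon divergence to the uniform distribution,
            # normalized by its maximum attainable value.
            js = S(M) - S(P) / 2 - S(U) / 2
            js_max = -0.5 * (((N + 1) / N) * math.log(N + 1)
                             + math.log(N) - 2 * math.log(2 * N))
            return H * js / js_max                 # statistical complexity C

        # a good PRNG should give near-maximal entropy and near-zero complexity
        print(mpr_complexity([random.random() for _ in range(20000)]))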

  7. Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks

    PubMed Central

    2017-01-01

    In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active toward a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target. Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria), it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery. PMID:29392184
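
    A minimal PyTorch sketch of the core generative step — sampling molecule strings token by token from a recurrent model — is shown below. The architecture, vocabulary size, and temperature are placeholders, and training plus fine-tuning on target actives is omitted, so this is an illustrative skeleton rather than the published model.

        import torch
        import torch.nn as nn

        class SmilesRNN(nn.Module):
            def __init__(self, vocab_size, hidden=256):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, hidden)
                self.lstm = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
                self.head = nn.Linear(hidden, vocab_size)

            def forward(self, tokens, state=None):
                h, state = self.lstm(self.embed(tokens), state)
                return self.head(h), state

        @torch.no_grad()
        def sample(model, start_id, end_id, max_len=100, temperature=1.0):
            """Draw one token sequence autoregressively from the model."""
            tok = torch.tensor([[start_id]])
            state, out = None, []
            for _ in range(max_len):
                logits, state = model(tok, state)
                probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
                tok = torch.multinomial(probs, 1).view(1, 1)
                if tok.item() == end_id:
                    break
                out.append(tok.item())
            return out

        model = SmilesRNN(vocab_size=40)   # untrained: output is random tokens
        print(sample(model, start_id=0, end_id=1))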

  8. An empirical generative framework for computational modeling of language acquisition.

    PubMed

    Waterfall, Heidi R; Sandbank, Ben; Onnis, Luca; Edelman, Shimon

    2010-06-01

    This paper reports progress in developing a computer model of language acquisition in the form of (1) a generative grammar that is (2) algorithmically learnable from realistic corpus data, (3) viable in its large-scale quantitative performance and (4) psychologically real. First, we describe new algorithmic methods for unsupervised learning of generative grammars from raw CHILDES data and give an account of the generative performance of the acquired grammars. Next, we summarize findings from recent longitudinal and experimental work that suggests how certain statistically prominent structural properties of child-directed speech may facilitate language acquisition. We then present a series of new analyses of CHILDES data indicating that the desired properties are indeed present in realistic child-directed speech corpora. Finally, we suggest how our computational results, behavioral findings, and corpus-based insights can be integrated into a next-generation model aimed at meeting the four requirements of our modeling framework.

  9. Assessment of a stochastic downscaling methodology in generating an ensemble of hourly future climate time series

    NASA Astrophysics Data System (ADS)

    Fatichi, S.; Ivanov, V. Y.; Caporali, E.

    2013-04-01

    This study extends a stochastic downscaling methodology to the generation of an ensemble of hourly time series of meteorological variables that express possible future climate conditions at a point scale. The stochastic downscaling uses general circulation model (GCM) realizations and an hourly weather generator, the Advanced WEather GENerator (AWE-GEN). Marginal distributions of factors of change are computed for several climate statistics using a Bayesian methodology that can weight GCM realizations based on the model's relative performance with respect to a historical climate and the degree of disagreement in projecting future conditions. A Monte Carlo technique is used to sample the factors of change from their respective marginal distributions. As a comparison with traditional approaches, factors of change are also estimated by averaging GCM realizations. With either approach, the derived factors of change are applied to the climate statistics inferred from historical observations to re-evaluate the parameters of the weather generator. The re-parameterized generator yields hourly time series of meteorological variables that can be considered representative of future climate conditions. In this study, the time series are generated in an ensemble mode to fully reflect the uncertainty of GCM projections, climate stochasticity, and the uncertainties of the downscaling procedure. Applications of the methodology in reproducing future climate conditions for the periods 2000-2009, 2046-2065 and 2081-2100, using the period 1962-1992 as the historical baseline, are discussed for the location of Firenze (Italy). The inferences of the methodology for the period 2000-2009 are tested against observations to assess the reliability of the stochastic downscaling procedure in reproducing statistics of meteorological variables at different time scales.
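
    The weighting-and-sampling step can be sketched as follows; the skill scores, weights, and factor-of-change values are invented for illustration, whereas the real methodology derives them from GCM performance and inter-model agreement.

        import numpy as np

        rng = np.random.default_rng(1)

        # Hypothetical factors of change for one statistic (e.g., mean summer
        # precipitation) from five GCM realizations, with skill-based weights.
        factors = np.array([0.92, 1.05, 1.10, 0.88, 1.21])
        skill = np.array([0.9, 0.6, 0.8, 0.4, 0.5])      # higher = better
        weights = skill / skill.sum()

        # Monte Carlo sampling from the weighted (kernel-smoothed) distribution
        # of factors, then applying each sample to the observed statistic.
        draws = rng.choice(factors, size=1000, p=weights)
        draws = draws + rng.normal(0, 0.02, size=draws.size)  # smoothing kernel
        observed_stat = 75.0   # mm, hypothetical historical value
        future_ensemble = observed_stat * draws
        print(future_ensemble.mean(), np.percentile(future_ensemble, [5, 95]))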

  10. High-resolution stochastic generation of extreme rainfall intensity for urban drainage modelling applications

    NASA Astrophysics Data System (ADS)

    Peleg, Nadav; Blumensaat, Frank; Molnar, Peter; Fatichi, Simone; Burlando, Paolo

    2016-04-01

    Urban drainage response is highly dependent on the spatial and temporal structure of rainfall. Therefore, measuring and simulating rainfall at a high spatial and temporal resolution is a fundamental step to fully assess urban drainage system reliability and related uncertainties. This is even more relevant when considering extreme rainfall events. However, current space-time rainfall models have limitations in capturing extreme rainfall intensity statistics for short durations. Here, we use the STREAP (Space-Time Realizations of Areal Precipitation) model, which is a novel stochastic rainfall generator for simulating high-resolution rainfall fields that preserve the spatio-temporal structure of rainfall and its statistical characteristics. The model enables the generation of rain fields at 10² m and minute scales in a fast and computer-efficient way, matching the requirements for hydrological analysis of urban drainage systems. The STREAP model was applied successfully in the past to generate high-resolution extreme rainfall intensities over a small domain. A sub-catchment in the city of Luzern (Switzerland) was chosen as a case study to: (i) evaluate the ability of STREAP to disaggregate extreme rainfall intensities for urban drainage applications; (ii) assess the role of stochastic climate variability of rainfall in flow response and (iii) evaluate the degree of non-linearity between extreme rainfall intensity and system response (i.e. flow) for a small urban catchment. The channel flow at the catchment outlet is simulated by means of a calibrated hydrodynamic sewer model.

  11. “Plateau”-related summary statistics are uninformative for comparing working memory models

    PubMed Central

    van den Berg, Ronald; Ma, Wei Ji

    2014-01-01

    Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon. Zhang and Luck (2008) and Anderson, Vogel, and Awh (2011) noticed that as more items need to be remembered, “memory noise” seems to first increase and then reach a “stable plateau.” They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided, at most, 0.15% of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99% correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. At realistic numbers of trials, plateau-related summary statistics are completely unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (2011), we found that the evidence in the summary statistics was, at most, 0.12% of the evidence in the raw data and far too weak to warrant any conclusions. These findings call into question claims about working memory that are based on summary statistics. PMID:24719235

  12. Discriminative Random Field Models for Subsurface Contamination Uncertainty Quantification

    NASA Astrophysics Data System (ADS)

    Arshadi, M.; Abriola, L. M.; Miller, E. L.; De Paolis Kaluza, C.

    2017-12-01

    Application of flow and transport simulators for prediction of the release, entrapment, and persistence of dense non-aqueous phase liquids (DNAPLs) and associated contaminant plumes is a computationally intensive process that requires specification of a large number of material properties and hydrologic/chemical parameters. Given its computational burden, this direct simulation approach is particularly ill-suited for quantifying both the expected performance and uncertainty associated with candidate remediation strategies under real field conditions. Prediction uncertainties primarily arise from limited information about contaminant mass distributions, as well as the spatial distribution of subsurface hydrologic properties. Application of direct simulation to quantify uncertainty would, thus, typically require simulating multiphase flow and transport for a large number of permeability and release scenarios to collect statistics associated with remedial effectiveness, a computationally prohibitive process. The primary objective of this work is to develop and demonstrate a methodology that employs measured field data to produce equi-probable stochastic representations of a subsurface source zone that capture the spatial distribution and uncertainty associated with key features that control remediation performance (i.e., permeability and contamination mass). Here we employ probabilistic models known as discriminative random fields (DRFs) to synthesize stochastic realizations of initial mass distributions consistent with known, and typically limited, site characterization data. Using a limited number of full scale simulations as training data, a statistical model is developed for predicting the distribution of contaminant mass (e.g., DNAPL saturation and aqueous concentration) across a heterogeneous domain. Monte-Carlo sampling methods are then employed, in conjunction with the trained statistical model, to generate realizations conditioned on measured borehole data. Performance of the statistical model is illustrated through comparisons of generated realizations with the "true" numerical simulations. Finally, we demonstrate how these realizations can be used to determine statistically optimal locations for further interrogation of the subsurface.

  13. Inferring general relations between network characteristics from specific network ensembles.

    PubMed

    Cardanobile, Stefano; Pernice, Volker; Deger, Moritz; Rotter, Stefan

    2012-01-01

    Different network models have been suggested for the topology underlying complex interactions in natural systems. These models are aimed at replicating specific statistical features encountered in real-world networks. However, it is rarely considered to which degree the results obtained for one particular network class can be extrapolated to real-world networks. We address this issue by comparing different classical and more recently developed network models with respect to their ability to generate networks with large structural variability. In particular, we consider the statistical constraints which the respective construction scheme imposes on the generated networks. After having identified the most variable networks, we address the issue of which constraints are common to all network classes and are thus suitable candidates for being generic statistical laws of complex networks. In fact, we find that generic, not model-related dependencies between different network characteristics do exist. This makes it possible to infer global features from local ones using regression models trained on networks with high generalization power. Our results confirm and extend previous findings regarding the synchronization properties of neural networks. Our method seems especially relevant for large networks, which are difficult to map completely, like the neural networks in the brain. The structure of such large networks cannot be fully sampled with the present technology. Our approach provides a method to estimate global properties of under-sampled networks in good approximation. Finally, we demonstrate on three different data sets (C. elegans neuronal network, R. prowazekii metabolic network, and a network of synonyms extracted from Roget's Thesaurus) that real-world networks have statistical relations compatible with those obtained using regression models.

  14. Theory and generation of conditional, scalable sub-Gaussian random fields

    NASA Astrophysics Data System (ADS)

    Panzeri, M.; Riva, M.; Guadagnini, A.; Neuman, S. P.

    2016-03-01

    Many earth and environmental (as well as a host of other) variables, Y, and their spatial (or temporal) increments, ΔY, exhibit non-Gaussian statistical scaling. Previously we were able to capture key aspects of such non-Gaussian scaling by treating Y and/or ΔY as sub-Gaussian random fields (or processes). This however left unaddressed the empirical finding that whereas sample frequency distributions of Y tend to display relatively mild non-Gaussian peaks and tails, those of ΔY often reveal peaks that grow sharper and tails that become heavier with decreasing separation distance or lag. Recently we proposed a generalized sub-Gaussian model (GSG) which resolves this apparent inconsistency between the statistical scaling behaviors of observed variables and their increments. We presented an algorithm to generate unconditional random realizations of statistically isotropic or anisotropic GSG functions and illustrated it in two dimensions. Most importantly, we demonstrated the feasibility of estimating all parameters of a GSG model underlying a single realization of Y by analyzing jointly spatial moments of Y data and corresponding increments, ΔY. Here, we extend our GSG model to account for noisy measurements of Y at a discrete set of points in space (or time), present an algorithm to generate conditional realizations of corresponding isotropic or anisotropic random fields, introduce two approximate versions of this algorithm to reduce CPU time, and explore them on one and two-dimensional synthetic test cases.

  15. The importance of topographically corrected null models for analyzing ecological point processes.

    PubMed

    McDowall, Philip; Lynch, Heather J

    2017-07-01

    Analyses of point process patterns and related techniques (e.g., MaxEnt) make use of the expected number of occurrences per unit area and second-order statistics based on the distance between occurrences. Ecologists working with point process data often assume that points exist on a two-dimensional x-y plane or within a three-dimensional volume, when in fact many observed point patterns are generated on a two-dimensional surface existing within three-dimensional space. For many surfaces, however, such as the topography of landscapes, the projection from the surface to the x-y plane preserves neither area nor distance. As such, when these point patterns are implicitly projected to and analyzed in the x-y plane, our expectations of the point pattern's statistical properties may not be met. When used in hypothesis testing, we find that the failure to account for the topography of the generating surface may bias statistical tests that incorrectly identify clustering and, furthermore, may bias coefficients in inhomogeneous point process models that incorporate slope as a covariate. We demonstrate the circumstances under which this bias is significant, and present simple methods that allow point processes to be simulated with corrections for topography. These point patterns can then be used to generate "topographically corrected" null models against which observed point processes can be compared. © 2017 by the Ecological Society of America.
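
    One simple correction, assuming a gridded elevation model: weight each cell by its true surface area (which grows with slope) when simulating null points, so the homogeneous process is uniform on the surface rather than on the x-y plane. The sketch below makes that assumption explicit and is not the authors' code; the synthetic terrain is a toy example.

        import numpy as np

        def surface_area_weights(z, cell=1.0):
            """True area of each DEM cell ≈ planar area × sqrt(1 + |∇z|²)."""
            dzdy, dzdx = np.gradient(z, cell)
            return np.sqrt(1.0 + dzdx**2 + dzdy**2)

        def simulate_null_points(z, n, cell=1.0, rng=None):
            rng = rng or np.random.default_rng()
            w = surface_area_weights(z, cell).ravel()
            idx = rng.choice(w.size, size=n, p=w / w.sum())
            rows, cols = np.unravel_index(idx, z.shape)
            # jitter within cells so points are not grid-aligned
            return (cols + rng.random(n)) * cell, (rows + rng.random(n)) * cell

        z = np.fromfunction(lambda r, c: 20 * np.sin(r / 15.0), (100, 100))
        x, y = simulate_null_points(z, 500)
        print(x[:3], y[:3])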

  16. Combining Statistics and Physics to Improve Climate Downscaling

    NASA Astrophysics Data System (ADS)

    Gutmann, E. D.; Eidhammer, T.; Arnold, J.; Nowak, K.; Clark, M. P.

    2017-12-01

    Getting useful information from climate models is an ongoing problem that has plagued climate science and hydrologic prediction for decades. While it is possible to develop statistical corrections for climate models that mimic current climate almost perfectly, this does not necessarily guarantee that future changes are portrayed correctly. In contrast, convection permitting regional climate models (RCMs) have begun to provide an excellent representation of the regional climate system purely from first principles, providing greater confidence in their change signal. However, the computational cost of such RCMs prohibits the generation of ensembles of simulations or long time periods, thus limiting their applicability for hydrologic applications. Here we discuss a new approach combining statistical corrections with physical relationships for a modest computational cost. We have developed the Intermediate Complexity Atmospheric Research model (ICAR) to provide a climate and weather downscaling option that is based primarily on physics for a fraction of the computational requirements of a traditional regional climate model. ICAR also enables the incorporation of statistical adjustments directly within the model. We demonstrate that applying even simple corrections to precipitation while the model is running can improve the simulation of land atmosphere feedbacks in ICAR. For example, by incorporating statistical corrections earlier in the modeling chain, we permit the model physics to better represent the effect of mountain snowpack on air temperature changes.

  17. Properties of J^P = 1/2^+ baryon octets at low energy

    NASA Astrophysics Data System (ADS)

    Kaur, Amanpreet; Gupta, Pallavi; Upadhyay, Alka

    2017-06-01

    The statistical model in combination with the detailed balance principle is able to phenomenologically calculate and analyze spin- and flavor-dependent properties like magnetic moments (with effective masses, with effective charge, or with both effective mass and effective charge), quark spin polarization and distribution, the strangeness suppression factor, and the d̄-ū asymmetry incorporating the strange sea. The ss̄ pairs in the sea are said to be generated via the basic quark mechanism but suppressed by the strange quark mass factor m_s > m_{u,d}. The magnetic moments of the octet baryons are analyzed within the statistical model, by putting emphasis on the SU(3) symmetry-breaking effects generated by the mass difference between the strange and non-strange quarks. The work presented here assumes hadrons with a sea having an admixture of quark-gluon Fock states. The results obtained have been compared with theoretical models and experimental data.

  18. Developing models for the prediction of hospital healthcare waste generation rate.

    PubMed

    Tesfahun, Esubalew; Kumie, Abera; Beyene, Abebe

    2016-01-01

    An increase in the number of health institutions, along with frequent use of disposable medical products, has contributed to the increase of the healthcare waste generation rate. For proper handling of healthcare waste, it is crucial to predict the amount of waste generation beforehand. Predictive models can help to optimise healthcare waste management systems, set guidelines and evaluate the prevailing strategies for healthcare waste handling and disposal. However, there is no mathematical model developed for Ethiopian hospitals to predict the healthcare waste generation rate. Therefore, the objective of this research was to develop models for the prediction of the healthcare waste generation rate. A longitudinal study design was used to generate long-term data on solid healthcare waste composition and generation rate, and to develop predictive models. The results revealed that the healthcare waste generation rate has a strong linear correlation with the number of inpatients (R² = 0.965), and a weak one with the number of outpatients (R² = 0.424). Statistical analysis was carried out to develop models for the prediction of the quantity of waste generated at each hospital (public, teaching and private). In these models, the numbers of inpatients and outpatients were revealed to be significant factors for the quantity of waste generated. The influence of the numbers of inpatients and outpatients treated varies at different hospitals. Therefore, different models were developed based on the types of hospitals. © The Author(s) 2015.
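
    The kind of model described — waste mass regressed on inpatient and outpatient counts — can be fit in a few lines. The records below are made up, and the paper's hospital-specific coefficients are not reproduced here.

        import numpy as np

        # Hypothetical monthly records: [inpatients, outpatients] -> kg of waste
        X = np.array([[120, 900], [150, 1100], [90, 700], [200, 1500], [170, 1300]])
        y = np.array([310.0, 380.0, 240.0, 500.0, 430.0])

        # ordinary least squares with an intercept term
        A = np.column_stack([np.ones(len(X)), X])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        intercept, b_inpatient, b_outpatient = coef
        print(f"waste ≈ {intercept:.1f} + {b_inpatient:.2f}·inpatients "
              f"+ {b_outpatient:.3f}·outpatients")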

  19. National Centers for Environmental Prediction

    Science.gov Websites


  20. Statistical model for speckle pattern optimization.

    PubMed

    Su, Yong; Zhang, Qingchuan; Gao, Zeren

    2017-11-27

    Image registration is the key technique of optical metrologies such as digital image correlation (DIC), particle image velocimetry (PIV), and speckle metrology. Its performance depends critically on the quality of the image pattern, and thus pattern optimization attracts extensive attention. In this article, a statistical model is built to optimize speckle patterns that are composed of randomly positioned speckles. It is found that the process of speckle pattern generation is essentially a filtered Poisson process. The dependence of measurement errors (including systematic errors, random errors, and overall errors) upon speckle pattern generation parameters is characterized analytically. By minimizing the errors, formulas for the optimal speckle radius are presented. Although the primary motivation is from the field of DIC, we believe that scholars in other optical measurement communities, such as PIV and speckle metrology, will benefit from these discussions.
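
    The "filtered Poisson process" view translates directly into a generator: draw a Poisson number of speckle centers, place them uniformly, and sum a kernel (here Gaussian) at each. The density and radius defaults below are arbitrary, not the paper's optimal values.

        import numpy as np

        def speckle_pattern(size=256, density=0.02, radius=3.0, rng=None):
            """Render a speckle image as a filtered Poisson process."""
            rng = rng or np.random.default_rng()
            n = rng.poisson(density * size * size)     # Poisson speckle count
            cx, cy = rng.uniform(0, size, n), rng.uniform(0, size, n)
            yy, xx = np.mgrid[0:size, 0:size]
            img = np.zeros((size, size))
            for x0, y0 in zip(cx, cy):
                img += np.exp(-((xx - x0)**2 + (yy - y0)**2) / (2 * radius**2))
            return img / img.max()

        pattern = speckle_pattern(size=64, density=0.01)
        print(pattern.shape, pattern.mean())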

  1. Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models.

    PubMed

    Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi

    2016-05-01

    Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions of the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use a lattice protein (LP) model to benchmark these inverse statistical approaches. We build MSAs of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSAs. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of the native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonians for the design of new sequences are able to generate, with high probability, completely new sequences with the desired folds, which is not possible using independent-site models. These are remarkable results, as the effective LP Hamiltonians used to generate the MSAs are not simple pairwise models, due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.

  2. Bootstrap Estimation of Sample Statistic Bias in Structural Equation Modeling.

    ERIC Educational Resources Information Center

    Thompson, Bruce; Fan, Xitao

    This study empirically investigated bootstrap bias estimation in the area of structural equation modeling (SEM). Three correctly specified SEM models were used under four different sample size conditions. Monte Carlo experiments were carried out to generate the criteria against which bootstrap bias estimation should be judged. For SEM fit indices,…
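
    The bootstrap bias estimate itself is simple: the mean of the statistic over resamples minus the statistic on the original sample. A generic sketch follows (not the study's SEM-specific procedure; the lognormal sample and the choice of statistic are illustrative).

        import numpy as np

        def bootstrap_bias(data, statistic, n_boot=2000, rng=None):
            rng = rng or np.random.default_rng()
            theta_hat = statistic(data)
            boots = [statistic(rng.choice(data, size=len(data), replace=True))
                     for _ in range(n_boot)]
            return np.mean(boots) - theta_hat   # estimated bias of `statistic`

        sample = np.random.default_rng(3).lognormal(size=50)
        # np.std with ddof=0 is a biased estimator; the bootstrap should detect that
        print(bootstrap_bias(sample, lambda d: np.std(d)))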

  3. Development of LACIE CCEA-1 weather/wheat yield models. [regression analysis

    NASA Technical Reports Server (NTRS)

    Strommen, N. D.; Sakamoto, C. M.; Leduc, S. K.; Umberger, D. E. (Principal Investigator)

    1979-01-01

    The advantages and disadvantages of the causal (phenological, dynamic, physiological), statistical regression, and analog approaches to modeling grain yield are examined. Given LACIE's primary goal of estimating wheat production for the large areas of eight major wheat-growing regions, the statistical regression approach of correlating historical yield and climate data offered the Center for Climatic and Environmental Assessment the greatest potential return within the constraints of time and data sources. The basic equation for the first-generation wheat-yield model is given. Topics discussed include truncation, trend variables, selection of weather variables, episodic events, strata selection, operational data flow, weighting, and model results.

  4. National Centers for Environmental Prediction

    Science.gov Websites


  5. "Plateau"-related summary statistics are uninformative for comparing working memory models.

    PubMed

    van den Berg, Ronald; Ma, Wei Ji

    2014-10-01

    Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon (Ma, Husain, & Bays, Nature Neuroscience 17, 347-356, 2014). Zhang and Luck (Nature 453(7192), 233-235, 2008) and Anderson, Vogel, and Awh (Attention, Perception, & Psychophysics 74(5), 891-910, 2011) noticed that as more items need to be remembered, "memory noise" seems to first increase and then reach a "stable plateau." They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided at most 0.15% of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99% correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. Therefore, at realistic numbers of trials, plateau-related summary statistics are highly unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (2011), we found that the evidence in the summary statistics was at most 0.12% of the evidence in the raw data and far too weak to warrant any conclusions. The evidence in the raw data, in fact, strongly favored the slotless model. These findings call into question claims about working memory that are based on summary statistics.

  6. Artificial neural network study on organ-targeting peptides

    NASA Astrophysics Data System (ADS)

    Jung, Eunkyoung; Kim, Junhyoung; Choi, Seung-Hoon; Kim, Minkyoung; Rhee, Hokyoung; Shin, Jae-Min; Choi, Kihang; Kang, Sang-Kee; Lee, Nam Kyung; Choi, Yun-Jaie; Jung, Dong Hyun

    2010-01-01

    We report a new approach to studying organ targeting of peptides on the basis of peptide sequence information. The positive control data sets consist of organ-targeting peptide sequences identified by the peroral phage-display technique for four organs, and the negative control data are prepared from random sequences. The capacity of our models to make appropriate predictions is validated by statistical indicators including sensitivity, specificity, the enrichment curve, and the area under the receiver operating characteristic (ROC) curve (the ROC score). The VHSE descriptor produces statistically significant training models, and the models with simple neural network architectures show slightly greater predictive power than those with complex ones. The training and test set statistics indicate that our models could discriminate between organ-targeting and random sequences. We anticipate that our models will be applicable to the selection of organ-targeting peptides for generating peptide drugs or peptidomimetics.

  7. Nonclassical point of view of the Brownian motion generation via fractional deterministic model

    NASA Astrophysics Data System (ADS)

    Gilardi-Velázquez, H. E.; Campos-Cantón, E.

    In this paper, we present a dynamical system based on the Langevin equation without a stochastic term and using fractional derivatives that exhibits properties of Brownian motion; i.e., a deterministic model to generate Brownian motion is proposed. The stochastic process is replaced by considering an additional degree of freedom in the second-order Langevin equation. Thus, it is transformed into a system of three first-order linear differential equations; additionally, α-fractional derivatives are considered, which allow us to obtain better statistical properties. Switching surfaces are established as a part of the fluctuating acceleration. The final system of three α-order linear differential equations does not contain a stochastic term, so the system generates motion in a deterministic way. Nevertheless, from the time series analysis, we found that the behavior of the system exhibits statistical properties of Brownian motion, such as linear growth in time of the mean square displacement and a Gaussian distribution. Furthermore, we use detrended fluctuation analysis to prove the Brownian character of this motion.

  8. Multi-site precipitation downscaling using a stochastic weather generator

    NASA Astrophysics Data System (ADS)

    Chen, Jie; Chen, Hua; Guo, Shenglian

    2018-03-01

    Statistical downscaling is an efficient way to solve the spatiotemporal mismatch between climate model outputs and the data requirements of hydrological models. However, the most commonly-used downscaling methods only produce climate change scenarios for a specific site or a watershed average, which is unable to drive distributed hydrological models to study the spatial variability of climate change impacts. By coupling a single-site downscaling method and a multi-site weather generator, this study proposes a multi-site downscaling approach for hydrological climate change impact studies. Multi-site downscaling is done in two stages. The first stage involves spatially downscaling climate model-simulated monthly precipitation from the grid scale to a specific site using a quantile mapping method, and the second stage involves the temporal disaggregation of monthly precipitation to daily values by adjusting the parameters of a multi-site weather generator. The inter-station correlation is specifically considered using a distribution-free approach along with an iterative algorithm. The performance of the downscaling approach is illustrated using a 10-station watershed as an example. The precipitation time series derived from the National Centers for Environmental Prediction (NCEP) reanalysis dataset is used as the climate model simulation. The precipitation time series of each station is divided into 30 odd-numbered years for calibration and 29 even-numbered years for validation. Several metrics, including the frequencies of wet and dry spells and statistics of the daily, monthly and annual precipitation, are used as criteria to evaluate the multi-site downscaling approach. The results show that the frequencies of wet and dry spells are well reproduced for all stations. In addition, the multi-site downscaling approach performs well with respect to reproducing precipitation statistics, especially at monthly and annual timescales. The remaining biases mainly result from the non-stationarity of the NCEP precipitation. Overall, the proposed approach is efficient for generating multi-site climate change scenarios that can be used to investigate the spatial variability of climate change impacts on hydrology.
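
    The first-stage step, empirical quantile mapping, can be sketched generically: map each model value through the model-climate CDF, then back through the observed inverse CDF. This is the standard construction, not necessarily the exact variant used in the paper; the gamma-distributed inputs are synthetic.

        import numpy as np

        def quantile_map(model_hist, obs_hist, model_values):
            """Empirical quantile mapping of model output onto observed climate."""
            q = np.linspace(0, 1, 101)
            model_q = np.quantile(model_hist, q)
            obs_q = np.quantile(obs_hist, q)
            # CDF position under the model climate, then the observed quantile there
            pos = np.interp(model_values, model_q, q)
            return np.interp(pos, q, obs_q)

        rng = np.random.default_rng(7)
        model_hist = rng.gamma(2.0, 30.0, 360)   # model monthly precipitation, mm
        obs_hist = rng.gamma(2.5, 40.0, 360)     # observed monthly precipitation
        print(quantile_map(model_hist, obs_hist, model_hist[:5]))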

  9. Fundamental statistical relationships between monthly and daily meteorological variables: Temporal downscaling of weather based on a global observational dataset

    NASA Astrophysics Data System (ADS)

    Sommer, Philipp; Kaplan, Jed

    2016-04-01

    Accurate modelling of large-scale vegetation dynamics, hydrology, and other environmental processes requires meteorological forcing on daily timescales. While meteorological data with high temporal resolution is becoming increasingly available, simulations for the future or distant past are limited by lack of data and poor performance of climate models, e.g., in simulating daily precipitation. To overcome these limitations, we may temporally downscale monthly summary data to a daily time step using a weather generator. Parameterization of such statistical models has traditionally been based on a limited number of observations. Recent developments in the archiving, distribution, and analysis of "big data" datasets provide new opportunities for the parameterization of a temporal downscaling model that is applicable over a wide range of climates. Here we parameterize a WGEN-type weather generator using more than 50 million individual daily meteorological observations, from over 10'000 stations covering all continents, based on the Global Historical Climatology Network (GHCN) and Synoptic Cloud Reports (EECRA) databases. Using the resulting "universal" parameterization and driven by monthly summaries, we downscale mean temperature (minimum and maximum), cloud cover, and total precipitation, to daily estimates. We apply a hybrid gamma-generalized Pareto distribution to calculate daily precipitation amounts, which overcomes much of the inability of earlier weather generators to simulate high amounts of daily precipitation. Our globally parameterized weather generator has numerous applications, including vegetation and crop modelling for paleoenvironmental studies.
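
    The hybrid wet-day distribution can be sketched as a gamma body with a generalized Pareto (GPD) tail spliced above a threshold. The parameters and splice probability below are placeholders, and the gamma body is simply clipped below the threshold for brevity (a real fit would truncate it properly).

        import numpy as np

        def hybrid_gamma_gpd(n, shape=0.8, scale=6.0, thresh=25.0,
                             xi=0.2, sigma=8.0, p_tail=0.05, rng=None):
            """Wet-day precipitation: gamma below `thresh`, GPD excesses above."""
            rng = rng or np.random.default_rng()
            amounts = np.empty(n)
            tail = rng.random(n) < p_tail
            # gamma body, clipped below the threshold for simplicity
            body = np.minimum(rng.gamma(shape, scale, n), thresh * 0.999)
            amounts[~tail] = body[~tail]
            # generalized Pareto excesses above the threshold (inverse CDF)
            u = rng.random(tail.sum())
            amounts[tail] = thresh + sigma * ((1 - u) ** (-xi) - 1) / xi
            return amounts

        print(np.percentile(hybrid_gamma_gpd(100_000), [50, 95, 99.9]))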

  10. Upon Generating Discrete Expanding Integrable Models of the Toda Lattice Systems and Infinite Conservation Laws

    NASA Astrophysics Data System (ADS)

    Zhang, Yufeng; Zhang, Xiangzhi; Wang, Yan; Liu, Jiangen

    2017-01-01

    With the help of R-matrix approach, we present the Toda lattice systems that have extensive applications in statistical physics and quantum physics. By constructing a new discrete integrable formula by R-matrix, the discrete expanding integrable models of the Toda lattice systems and their Lax pairs are generated, respectively. By following the constructing formula again, we obtain the corresponding (2+1)-dimensional Toda lattice systems and their Lax pairs, as well as their (2+1)-dimensional discrete expanding integrable models. Finally, some conservation laws of a (1+1)-dimensional generalised Toda lattice system and a new (2+1)-dimensional lattice system are generated, respectively.

  11. A comparative study of mixed exponential and Weibull distributions in a stochastic model replicating a tropical rainfall process

    NASA Astrophysics Data System (ADS)

    Abas, Norzaida; Daud, Zalina M.; Yusof, Fadhilah

    2014-11-01

    A stochastic rainfall model is presented for the generation of hourly rainfall data in an urban area in Malaysia. In view of the high temporal and spatial variability of rainfall within the tropical rain belt, the Spatial-Temporal Neyman-Scott Rectangular Pulse model was used. The model, which is governed by the Neyman-Scott process, employs a reasonable number of parameters to represent the physical attributes of rainfall. A common approach is to attach each attribute to a mathematical distribution. With respect to rain cell intensity, this study proposes the use of a mixed exponential distribution. The performance of the proposed model was compared to a model that employs the Weibull distribution. Hourly and daily rainfall data from four stations in the Damansara River basin in Malaysia were used as input to the models, and simulations of hourly series were performed for an independent site within the basin. The performance of the models was assessed based on how closely the statistical characteristics of the simulated series resembled the statistics of the observed series. The findings obtained based on graphical representation revealed that the statistical characteristics of the simulated series for both models compared reasonably well with the observed series. However, a further assessment using the AIC, BIC and RMSE showed that the proposed model yields better results. The results of this study indicate that for tropical climates, the proposed model, using a mixed exponential distribution, is the best choice for generation of synthetic data for ungauged sites or for sites with insufficient data within the limit of the fitted region.

  12. On the Emergent Constraints of Climate Sensitivity [On proposed emergent constraints of climate sensitivity]

    DOE PAGES

    Qu, Xin; Hall, Alex; DeAngelis, Anthony M.; ...

    2018-01-11

    Differences among climate models in equilibrium climate sensitivity (ECS; the equilibrium surface temperature response to a doubling of atmospheric CO2) remain a significant barrier to the accurate assessment of societally important impacts of climate change. Relationships between ECS and observable metrics of the current climate in model ensembles, so-called emergent constraints, have been used to constrain ECS. Here a statistical method (including a backward selection process) is employed to achieve a better statistical understanding of the connections between four recently proposed emergent constraint metrics and individual feedbacks influencing ECS. The relationship between each metric and ECS is largely attributable to a statistical connection with shortwave low cloud feedback, the leading cause of intermodel ECS spread. This result bolsters confidence in some of the metrics, which had assumed such a connection in the first place. Additional analysis is conducted with a few thousand artificial metrics that are randomly generated but are well correlated with ECS. The relationships between the contrived metrics and ECS can also be linked statistically to shortwave cloud feedback. Thus, any proposed or forthcoming ECS constraint based on the current generation of climate models should be viewed as a potential constraint on shortwave cloud feedback, and physical links with that feedback should be investigated to verify that the constraint is real. Additionally, any proposed ECS constraint should not be taken at face value since other factors influencing ECS besides shortwave cloud feedback could be systematically biased in the models.

  13. On the Emergent Constraints of Climate Sensitivity [On proposed emergent constraints of climate sensitivity]

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Qu, Xin; Hall, Alex; DeAngelis, Anthony M.

    Differences among climate models in equilibrium climate sensitivity (ECS; the equilibrium surface temperature response to a doubling of atmospheric CO2) remain a significant barrier to the accurate assessment of societally important impacts of climate change. Relationships between ECS and observable metrics of the current climate in model ensembles, so-called emergent constraints, have been used to constrain ECS. Here a statistical method (including a backward selection process) is employed to achieve a better statistical understanding of the connections between four recently proposed emergent constraint metrics and individual feedbacks influencing ECS. The relationship between each metric and ECS is largely attributable to a statistical connection with shortwave low cloud feedback, the leading cause of intermodel ECS spread. This result bolsters confidence in some of the metrics, which had assumed such a connection in the first place. Additional analysis is conducted with a few thousand artificial metrics that are randomly generated but are well correlated with ECS. The relationships between the contrived metrics and ECS can also be linked statistically to shortwave cloud feedback. Thus, any proposed or forthcoming ECS constraint based on the current generation of climate models should be viewed as a potential constraint on shortwave cloud feedback, and physical links with that feedback should be investigated to verify that the constraint is real. Additionally, any proposed ECS constraint should not be taken at face value since other factors influencing ECS besides shortwave cloud feedback could be systematically biased in the models.

  14. Synthetic Training Data Generation for Activity Monitoring and Behavior Analysis

    NASA Astrophysics Data System (ADS)

    Monekosso, Dorothy; Remagnino, Paolo

    This paper describes a data generator that produces synthetic data to simulate observations from an array of environment monitoring sensors. The overall goal of our work is to monitor the well-being of one occupant in a home. Sensors are embedded in a smart home to unobtrusively record environmental parameters. Based on the sensor observations, behavior analysis and modeling are performed. However, behavior analysis and modeling require large data sets, collected over long periods of time, to achieve the expected level of accuracy. A data generator was developed based on initial data (i.e., data collected over periods lasting weeks) to facilitate concurrent data collection and algorithm development. The data generator is based on statistical inference techniques. Variation is introduced into the data using perturbation models.

  15. A simple stochastic rainstorm generator for simulating spatially and temporally varying rainfall

    NASA Astrophysics Data System (ADS)

    Singer, M. B.; Michaelides, K.; Nichols, M.; Nearing, M. A.

    2016-12-01

    In semi-arid to arid drainage basins, rainstorms often control both water supply and flood risk to marginal communities of people. They also govern the availability of water to vegetation and other ecological communities, as well as spatial patterns of sediment, nutrient, and contaminant transport and deposition on local to basin scales. All of these landscape responses are sensitive to changes in climate that are projected to occur throughout western North America. Thus, it is important to improve the characterization of rainstorms in a manner that enables statistical assessment of rainfall at spatial scales below that of existing gauging networks and the prediction of plausible manifestations of climate change. Here we present a simple, stochastic rainstorm generator that was created using data from a rich and dense network of rain gauges at the Walnut Gulch Experimental Watershed (WGEW) in SE Arizona, but which is applicable anywhere. We describe our methods for assembling pdfs of relevant rainstorm characteristics including total annual rainfall, storm area, storm center location, and storm duration. We also generate five fitted intensity-duration curves and apply a spatial rainfall gradient to generate precipitation at spatial scales below gauge spacing. The model then runs by Monte Carlo simulation in which a total annual rainfall is selected before we generate rainstorms until the annual precipitation total is reached. The procedure continues for decadal simulations. Thus, we keep track of the hydrologic impact of individual storms and the integral of precipitation over multiple decades. We first test the model using ensemble predictions until we reach statistical similarity to the input data from WGEW. We then employ the model to assess decadal precipitation under simulations of climate change in which we separately vary the distribution of total annual rainfall (trend in moisture) and the intensity-duration curves used for simulation (trends in storminess). We demonstrate the model output through spatial maps of rainfall and through statistical comparisons of relevant parameters and distributions. Finally, we discuss how the model can be used to understand basin-scale hydrology in terms of soil moisture, runoff, and erosion.
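
    The simulation loop itself is straightforward: draw an annual total, then draw storms from the fitted distributions until that total is spent. The distributions and parameters below are stand-ins for the WGEW-fitted pdfs, and the intensity-duration step is omitted.

        import numpy as np

        rng = np.random.default_rng(11)

        def simulate_year(annual_total_mm):
            """Draw storms until the prescribed annual rainfall is reached."""
            storms, total = [], 0.0
            while total < annual_total_mm:
                depth = rng.exponential(8.0)               # storm depth, mm
                storms.append({
                    "depth_mm": depth,
                    "duration_min": rng.gamma(2.0, 20.0),
                    "area_km2": rng.lognormal(1.5, 0.8),
                    "center": rng.uniform(0, 20, size=2),  # km, toy domain
                })
                total += depth
            return storms

        annual_total = max(rng.normal(300.0, 60.0), 0.0)   # mm, toy distribution
        year = simulate_year(annual_total)
        print(len(year), "storms;", sum(s["depth_mm"] for s in year))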

  16. Population activity statistics dissect subthreshold and spiking variability in V1.

    PubMed

    Bányai, Mihály; Koman, Zsombor; Orbán, Gergő

    2017-07-01

    Response variability, as measured by fluctuating responses upon repeated performance of trials, is a major component of neural responses, and its characterization is key to interpret high dimensional population recordings. Response variability and covariability display predictable changes upon changes in stimulus and cognitive or behavioral state, providing an opportunity to test the predictive power of models of neural variability. Still, there is little agreement on which model to use as a building block for population-level analyses, and models of variability are often treated as a subject of choice. We investigate two competing models, the doubly stochastic Poisson (DSP) model assuming stochasticity at spike generation, and the rectified Gaussian (RG) model tracing variability back to membrane potential variance, to analyze stimulus-dependent modulation of both single-neuron and pairwise response statistics. Using a pair of model neurons, we demonstrate that the two models predict similar single-cell statistics. However, DSP and RG models have contradicting predictions on the joint statistics of spiking responses. To test the models against data, we build a population model to simulate stimulus change-related modulations in pairwise response statistics. We use single-unit data from the primary visual cortex (V1) of monkeys to show that while model predictions for variance are qualitatively similar to experimental data, only the RG model's predictions are compatible with joint statistics. These results suggest that models using Poisson-like variability might fail to capture important properties of response statistics. We argue that membrane potential-level modeling of stochasticity provides an efficient strategy to model correlations. NEW & NOTEWORTHY Neural variability and covariability are puzzling aspects of cortical computations. For efficient decoding and prediction, models of information encoding in neural populations hinge on an appropriate model of variability. Our work shows that stimulus-dependent changes in pairwise but not in single-cell statistics can differentiate between two widely used models of neuronal variability. Contrasting model predictions with neuronal data provides hints on the noise sources in spiking and provides constraints on statistical models of population activity. Copyright © 2017 the American Physiological Society.
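    A toy simulation makes the contrast between the two models concrete. The sketch below is illustrative rather than the authors' implementation: a pair of model neurons shares correlated Gaussian input, spike counts are generated under a doubly stochastic Poisson (DSP) scheme and under a rectified Gaussian (RG) scheme, and the pairwise count correlations are compared. The exponential rate transform and the threshold-linear nonlinearity are assumed forms.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, rho = 10_000, 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=n_trials)  # shared input

# Doubly stochastic Poisson: correlated log-rates, then Poisson spike generation.
rates_dsp = np.exp(1.0 + 0.5 * z)            # trial-by-trial firing rates
counts_dsp = rng.poisson(rates_dsp)

# Rectified Gaussian: correlated membrane potentials, threshold-linear transfer
# treated as the (expected) spike count.
counts_rg = np.maximum(0.0, 1.0 + 1.5 * z)

for name, c in [("DSP", counts_dsp), ("RG", counts_rg)]:
    print(name, "pairwise count correlation:",
          round(np.corrcoef(c.T)[0, 1], 3))
```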

  17. Integrating satellite imagery with simulation modeling to improve burn severity mapping

    Treesearch

    Eva C. Karau; Pamela G. Sikkink; Robert E. Keane; Gregory K. Dillon

    2014-01-01

    Both satellite imagery and spatial fire effects models are valuable tools for generating burn severity maps that are useful to fire scientists and resource managers. The purpose of this study was to test a new mapping approach that integrates imagery and modeling to create more accurate burn severity maps. We developed and assessed a statistical model that combines the...

  18. Analysis of intra-arch and interarch measurements from digital models with 2 impression materials and a modeling process based on cone-beam computed tomography.

    PubMed

    White, Aaron J; Fallis, Drew W; Vandewalle, Kraig S

    2010-04-01

    Study models are an essential part of an orthodontic record. Digital models are now available. One option for generating a digital model is cone-beam computed tomography (CBCT) scanning of orthodontic impressions and bite registrations. However, the accuracy of digital measurements from models generated by this method has yet to be thoroughly evaluated. A plastic typodont was modified with reference points for standardized intra-arch and interarch measurements, and 16 sets of maxillary and mandibular vinylpolysiloxane and alginate impressions were made. A copper wax-bite registration was made with the typodont in maximum intercuspal position to accompany each set of impressions. The impressions were shipped to OrthoProofUSA (Albuquerque, NM), where digital orthodontic models were generated via CBCT. Intra-arch and interarch measurements were made directly on the typodont with electronic digital calipers and on the digital models by using OrthoProofUSA's proprietary DigiModel software. Percentage differences from the typodont of all intra-arch measurements in the alginate and vinylpolysiloxane groups were low, from 0.1% to 0.7%. Statistical analysis of the intra-arch percentage differences from the typodont of the alginate and vinylpolysiloxane groups showed a statistically significant difference between the groups only for maxillary intermolar width. However, because of the small percentage differences, this was not considered clinically significant for orthodontic measurements. Percentage differences from the typodont of all interarch measurements in the alginate and vinylpolysiloxane groups were much higher, from 3.3% to 10.7%. Statistical analysis of the interarch percentage differences from the typodont of the alginate and vinylpolysiloxane groups showed statistically significant differences between the groups in both the maxillary right canine to mandibular right canine (alginate with a lower percentage difference than vinylpolysiloxane) and the maxillary left second molar to mandibular left second molar (alginate with a greater percentage difference than vinylpolysiloxane) segments. This difference, ranging from 0.24 to 0.72 mm, is clinically significant. In this study, digital orthodontic models from CBCT scans of alginate and vinylpolysiloxane impressions provided a dimensionally accurate representation of intra-arch relationships for orthodontic evaluation. However, the use of copper wax-bite registrations in this CBCT-based process did not result in an accurate digital representation of interarch relationships. Copyright (c) 2010 American Association of Orthodontists. Published by Mosby, Inc. All rights reserved.

  19. The Statistical Segment Length of DNA: Opportunities for Biomechanical Modeling in Polymer Physics and Next-Generation Genomics.

    PubMed

    Dorfman, Kevin D

    2018-02-01

    The development of bright bisintercalating dyes for deoxyribonucleic acid (DNA) in the 1990s, most notably YOYO-1, revolutionized the field of polymer physics in the ensuing years. These dyes, in conjunction with modern molecular biology techniques, permit the facile observation of polymer dynamics via fluorescence microscopy and thus direct tests of different theories of polymer dynamics. At the same time, they have played a key role in advancing an emerging next-generation method known as genome mapping in nanochannels. The effect of intercalation on the bending energy of DNA as embodied by a change in its statistical segment length (or, alternatively, its persistence length) has been the subject of significant controversy. The precise value of the statistical segment length is critical for the proper interpretation of polymer physics experiments and controls the phenomena underlying the aforementioned genomics technology. In this perspective, we briefly review the model of DNA as a wormlike chain and a trio of methods (light scattering, optical or magnetic tweezers, and atomic force microscopy (AFM)) that have been used to determine the statistical segment length of DNA. We then outline the disagreement in the literature over the role of bisintercalation on the bending energy of DNA, and how a multiscale biomechanical approach could provide an important model for this scientifically and technologically relevant problem.
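    For reference, the wormlike-chain (Kratky-Porod) relations that tie together the quantities discussed above are standard: for a chain of contour length L with persistence length P, the mean-square end-to-end distance and the long-chain identification of the statistical segment (Kuhn) length b are

```latex
% Kratky-Porod mean-square end-to-end distance and the long-chain limit
% that defines the statistical segment (Kuhn) length b.
\begin{align}
  \langle R^2 \rangle &= 2PL - 2P^2\left(1 - e^{-L/P}\right), \\
  \langle R^2 \rangle &\approx 2PL = bL \quad (L \gg P), \qquad b = 2P .
\end{align}
```

    Hence a reported change in persistence length under bisintercalation translates directly into a proportional change in the statistical segment length.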

  20. Trends and fluctuations in the severity of interstate wars

    PubMed Central

    Clauset, Aaron

    2018-01-01

    Since 1945, there have been relatively few large interstate wars, especially compared to the preceding 30 years, which included both World Wars. This pattern, sometimes called the long peace, is highly controversial. Does it represent an enduring trend caused by a genuine change in the underlying conflict-generating processes? Or is it consistent with a highly variable but otherwise stable system of conflict? Using the empirical distributions of interstate war sizes and onset times from 1823 to 2003, we parameterize stationary models of conflict generation that can distinguish trends from statistical fluctuations in the statistics of war. These models indicate that both the long peace and the period of great violence that preceded it are not statistically uncommon patterns in realistic but stationary conflict time series. This fact does not detract from the importance of the long peace or the proposed mechanisms that explain it. However, the models indicate that the postwar pattern of peace would need to endure at least another 100 to 140 years to become a statistically significant trend. This fact places an implicit upper bound on the magnitude of any change in the true likelihood of a large war after the end of the Second World War. The historical patterns of war thus seem to imply that the long peace may be substantially more fragile than proponents believe, despite recent efforts to identify mechanisms that reduce the likelihood of interstate wars. PMID:29507877
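    The logic of the stationarity argument can be illustrated in a few lines. The sketch below is not the authors' model: it draws war onsets from a stationary Poisson process and sizes from a heavy-tailed Pareto distribution (all parameter values invented), then asks how often the longest spell without a large war reaches 70 years, mirroring the question of whether the long peace is statistically surprising under a stationary process.

```python
import numpy as np

rng = np.random.default_rng(1)
years, rate = 180, 0.5                 # study window (yr); wars per year
threshold_quantile = 0.95              # "large" war = top 5% by size

def longest_peace_spell():
    n = rng.poisson(rate * years)
    onsets = np.sort(rng.uniform(0, years, n))
    sizes = (1 - rng.uniform(size=n)) ** (-1 / 1.5)   # Pareto(alpha=1.5) sizes
    big = onsets[sizes > np.quantile(sizes, threshold_quantile)]
    gaps = np.diff(np.concatenate(([0.0], big, [years])))
    return gaps.max()                  # longest spell without a large war

spells = np.array([longest_peace_spell() for _ in range(5_000)])
print("P(longest large-war-free spell >= 70 yr) =", np.mean(spells >= 70))
```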

  1. Complex Sequencing Rules of Birdsong Can be Explained by Simple Hidden Markov Processes

    PubMed Central

    Katahira, Kentaro; Suzuki, Kenta; Okanoya, Kazuo; Okada, Masato

    2011-01-01

    Complex sequencing rules observed in birdsongs provide an opportunity to investigate the neural mechanism for generating complex sequential behaviors. To relate the findings from studying birdsongs to other sequential behaviors such as human speech and musical performance, it is crucial to characterize the statistical properties of the sequencing rules in birdsongs. However, the properties of the sequencing rules in birdsongs have not yet been fully addressed. In this study, we investigate the statistical properties of the complex birdsong of the Bengalese finch (Lonchura striata var. domestica). Based on manually annotated syllable labels, we first show that there are significant higher-order context dependencies in Bengalese finch songs, that is, which syllable appears next depends on more than one previous syllable. We then analyze acoustic features of the song and show that higher-order context dependencies can be explained using first-order hidden state transition dynamics with redundant hidden states. This model corresponds to hidden Markov models (HMMs), well-known statistical models with a wide range of applications in time series modeling. Song annotation with these first-order hidden state models agreed well with manual annotation; the score was comparable to that of a second-order HMM and surpassed that of the zeroth-order model (the Gaussian mixture model; GMM), which does not use context information. Our results imply that the hierarchical representation with hidden state dynamics may underlie the neural implementation for generating complex behavioral sequences with higher-order dependencies. PMID:21915345
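    The core idea, that first-order hidden dynamics with redundant hidden states yields higher-order dependencies among observed syllables, can be demonstrated in a few lines. In the hypothetical grammar below, two hidden states emit the same syllable 'b' but have different successors, so the syllable after 'b' depends on the syllable before it; the transition matrix is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hidden states: 0 -> 'a', 1 -> 'b' (variant after a), 2 -> 'b' (variant
# after c), 3 -> 'c'. Two states share the emission 'b'.
emit = np.array(["a", "b", "b", "c"])
T = np.array([[0.0, 1.0, 0.0, 0.0],    # a  -> b (variant 1)
              [0.0, 0.0, 0.0, 1.0],    # b1 -> c
              [1.0, 0.0, 0.0, 0.0],    # b2 -> a
              [0.0, 0.0, 1.0, 0.0]])   # c  -> b (variant 2)

state, seq = 0, []
for _ in range(20_000):
    seq.append(emit[state])
    state = rng.choice(4, p=T[state])

s = "".join(seq)
# P(next = 'c' | 'b') depends on the syllable *before* 'b': second-order
# structure in the observations from purely first-order hidden dynamics.
print("P(c | ab) ~", s.count("abc") / max(1, s.count("ab")))
print("P(c | cb) ~", s.count("cbc") / max(1, s.count("cb")))
```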

  2. Ratio index variables or ANCOVA? Fisher's cats revisited.

    PubMed

    Tu, Yu-Kang; Law, Graham R; Ellison, George T H; Gilthorpe, Mark S

    2010-01-01

    Over 60 years ago Ronald Fisher demonstrated a number of potential pitfalls with statistical analyses using ratio variables. Nonetheless, these pitfalls are largely overlooked in contemporary clinical and epidemiological research, which routinely uses ratio variables in statistical analyses. This article aims to demonstrate how very different findings can be generated as a result of less than perfect correlations among the data used to generate ratio variables. These imperfect correlations result from measurement error and random biological variation. While the former can often be reduced by improvements in measurement, random biological variation is difficult to estimate and eliminate in observational studies. Moreover, wherever the underlying biological relationships among epidemiological variables are unclear, and hence the choice of statistical model is also unclear, the different findings generated by different analytical strategies can lead to contradictory conclusions. Caution is therefore required when interpreting analyses of ratio variables whenever the underlying biological relationships among the variables involved are unspecified or unclear. (c) 2009 John Wiley & Sons, Ltd.
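    A small simulation shows the pitfall concretely. In the sketch below (invented numbers, numpy only), two groups obey the same linear relation between an outcome Y and a scaling variable X, yet their mean ratios Y/X differ simply because their X distributions differ, while an ANCOVA-style regression that adjusts for X correctly finds no group effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(2.0, 0.3, n)
x2 = rng.normal(3.0, 0.3, n)                 # group 2 is simply "bigger"
make_y = lambda x: 0.5 + 0.8 * x + rng.normal(0, 0.1, x.size)  # same law
y1, y2 = make_y(x1), make_y(x2)

# Ratio analysis: the nonzero intercept makes Y/X depend on X, so the
# group means of the ratio differ despite identical underlying relations.
print("mean ratio, group 1 vs 2:",
      (y1 / x1).mean().round(3), (y2 / x2).mean().round(3))

# ANCOVA via least squares: y ~ intercept + x + group indicator.
X = np.column_stack([np.ones(2 * n), np.r_[x1, x2],
                     np.r_[np.zeros(n), np.ones(n)]])
beta, *_ = np.linalg.lstsq(X, np.r_[y1, y2], rcond=None)
print("adjusted group effect (ANCOVA):", beta[2].round(3))   # ~ 0, correctly
```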

  3. A probabilistic approach to photovoltaic generator performance prediction

    NASA Astrophysics Data System (ADS)

    Khallat, M. A.; Rahman, S.

    1986-09-01

    A method for predicting the performance of a photovoltaic (PV) generator based on long term climatological data and expected cell performance is described. The equations for cell model formulation are provided. Use of the statistical model for characterizing the insolation level is discussed. The insolation data is fitted to appropriate probability distribution functions (Weibull, beta, normal). The probability distribution functions are utilized to evaluate the capacity factors of PV panels or arrays. An example is presented revealing the applicability of the procedure.
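    A hedged sketch of that workflow: fit a Weibull distribution to (here synthetic) insolation data with scipy, then propagate the fitted distribution through a simple linear, rating-capped cell model to estimate a capacity factor. The cell model and all numbers are illustrative, not the paper's formulation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
insolation = rng.weibull(2.2, 2_000) * 600.0     # synthetic insolation (W/m^2)

# Fit a Weibull distribution to the insolation record (location fixed at 0).
shape, loc, scale = stats.weibull_min.fit(insolation, floc=0.0)
dist = stats.weibull_min(shape, loc, scale)

# Linear cell model: output proportional to insolation, capped at the rating.
g_rated = 1000.0                                  # rating insolation (W/m^2)
samples = dist.rvs(size=100_000, random_state=rng)
capacity_factor = np.clip(samples / g_rated, 0.0, 1.0).mean()
print(f"expected capacity factor ~ {capacity_factor:.2f}")
```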

  4. Effective field theory of statistical anisotropies for primordial bispectrum and gravitational waves

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rostami, Tahereh; Karami, Asieh; Firouzjahi, Hassan, E-mail: t.rostami@ipm.ir, E-mail: karami@ipm.ir, E-mail: firouz@ipm.ir

    2017-06-01

    We present effective field theory studies of primordial statistical anisotropies in models of anisotropic inflation. The general action in unitary gauge is presented to calculate the leading interactions between the gauge field fluctuations, the curvature perturbations and the tensor perturbations. The anisotropies in the scalar power spectrum and bispectrum are calculated and the dependence of these anisotropies on the EFT couplings is presented. In addition, we calculate the statistical anisotropy in the tensor power spectrum and the scalar-tensor cross correlation. Our EFT approach incorporates anisotropies generated in models with non-trivial speed for the gauge field fluctuations and sound speed for scalar perturbations such as in DBI inflation.

  5. Regression and statistical shape model based substitute CT generation for MRI alone external beam radiation therapy from standard clinical MRI sequences.

    PubMed

    Ghose, Soumya; Greer, Peter B; Sun, Jidi; Pichler, Peter; Rivest-Henault, David; Mitra, Jhimli; Richardson, Haylea; Wratten, Chris; Martin, Jarad; Arm, Jameen; Best, Leah; Dowling, Jason A

    2017-10-27

    In MR only radiation therapy planning, generation of the tissue specific HU map directly from the MRI would eliminate the need for CT image acquisition and may improve radiation therapy planning. The aim of this work is to generate and validate substitute CT (sCT) scans generated from standard T2 weighted MR pelvic scans in prostate radiation therapy dose planning. A Siemens Skyra 3T MRI scanner with laser bridge, flat couch and pelvic coil mounts was used to scan 39 patients scheduled for external beam radiation therapy for localized prostate cancer. For sCT generation a whole pelvis MRI (1.6 mm 3D isotropic T2w SPACE sequence) was acquired. Patients received a routine planning CT scan. Co-registered whole pelvis CT and T2w MRI pairs were used as training images. Advanced tissue specific non-linear regression models to predict HU for the fat, muscle, bladder and air were created from co-registered CT-MRI image pairs. On a test case T2w MRI, the bones and bladder were automatically segmented using a novel statistical shape and appearance model, while other soft tissues were separated using an Expectation-Maximization based clustering model. The CT bone in the training database that was most 'similar' to the segmented bone was then transformed with deformable registration to create the sCT component of the test case T2w MRI bone tissue. Predictions for the bone, air and soft tissue from the separate regression models were successively combined to generate a whole pelvis sCT. The change in monitor units between the sCT-based plans relative to the gold standard CT plan for the same IMRT dose plan was found to be 0.3% ± 0.9% (mean  ±  standard deviation) for 39 patients. The 3D Gamma pass rate was 99.8 ± 0.0% (2 mm/2%). The novel hybrid model is computationally efficient, generating an sCT in 20 min from standard T2w images for prostate cancer radiation therapy dose planning and DRR generation.

  6. Regression and statistical shape model based substitute CT generation for MRI alone external beam radiation therapy from standard clinical MRI sequences

    NASA Astrophysics Data System (ADS)

    Ghose, Soumya; Greer, Peter B.; Sun, Jidi; Pichler, Peter; Rivest-Henault, David; Mitra, Jhimli; Richardson, Haylea; Wratten, Chris; Martin, Jarad; Arm, Jameen; Best, Leah; Dowling, Jason A.

    2017-11-01

    In MR only radiation therapy planning, generation of the tissue specific HU map directly from the MRI would eliminate the need for CT image acquisition and may improve radiation therapy planning. The aim of this work is to generate and validate substitute CT (sCT) scans generated from standard T2 weighted MR pelvic scans in prostate radiation therapy dose planning. A Siemens Skyra 3T MRI scanner with laser bridge, flat couch and pelvic coil mounts was used to scan 39 patients scheduled for external beam radiation therapy for localized prostate cancer. For sCT generation a whole pelvis MRI (1.6 mm 3D isotropic T2w SPACE sequence) was acquired. Patients received a routine planning CT scan. Co-registered whole pelvis CT and T2w MRI pairs were used as training images. Advanced tissue specific non-linear regression models to predict HU for the fat, muscle, bladder and air were created from co-registered CT-MRI image pairs. On a test case T2w MRI, the bones and bladder were automatically segmented using a novel statistical shape and appearance model, while other soft tissues were separated using an Expectation-Maximization based clustering model. The CT bone in the training database that was most ‘similar’ to the segmented bone was then transformed with deformable registration to create the sCT component of the test case T2w MRI bone tissue. Predictions for the bone, air and soft tissue from the separate regression models were successively combined to generate a whole pelvis sCT. The change in monitor units between the sCT-based plans relative to the gold standard CT plan for the same IMRT dose plan was found to be 0.3% ± 0.9% (mean  ±  standard deviation) for 39 patients. The 3D Gamma pass rate was 99.8 ± 0.0% (2 mm/2%). The novel hybrid model is computationally efficient, generating an sCT in 20 min from standard T2w images for prostate cancer radiation therapy dose planning and DRR generation.

  7. Improvement of downscaled rainfall and temperature across generations over the Western Himalayan region of India

    NASA Astrophysics Data System (ADS)

    Das, L.; Dutta, M.; Akhter, J.; Meher, J. K.

    2016-12-01

    It is a challenging task to create station-level (local scale) climate change information over the mountainous locations of the Western Himalayan Region (WHR) in India because of limited data availability and poor data quality. In the present study, missing values in the station data were handled through the Multiple Imputation Chained Equation (MICE) technique. Finally, 22 rain gauge stations and 16 temperature stations with continuous records during 1901-2005 and 1969-2009, respectively, were considered as reference stations for developing downscaled rainfall and temperature time series from five GCMs commonly available in the IPCC's different generation assessment reports, namely the 2nd, 3rd, 4th and 5th, hereafter known as SAR, TAR, AR4 and AR5 respectively. Downscaled models were developed using the combined data from the ERA-Interim reanalysis and the GCMs' historical runs (even though the forcings were not identical across generations) as predictors, and station-level rainfall and temperature as predictands. Station-level downscaled rainfall and temperature time series were constructed for the five GCMs available in each generation. A regionally averaged downscaled time series comprising all stations was prepared for each model and generation, and the downscaled results were compared with the observed time series. Finally, an Overall Model Improvement Index (OMII) was developed using the downscaling results, which was used to investigate model improvement across generations as well as the improvement of downscaling results obtained from the Empirical Statistical Downscaling (ESD) methods. In the case of temperature, models have improved from SAR to AR5 over the study area. In almost all the GCMs, TAR shows the worst performance over the WHR according to the different statistical indices used in this study. In the case of precipitation, no model has shown gradual improvement from SAR to AR5 for either interpolated or downscaled values.

  8. Alternating Renewal Process Models for Behavioral Observation: Simulation Methods, Software, and Validity Illustrations

    ERIC Educational Resources Information Center

    Pustejovsky, James E.; Runyon, Christopher

    2014-01-01

    Direct observation recording procedures produce reductive summary measurements of an underlying stream of behavior. Previous methodological studies of these recording procedures have employed simulation methods for generating random behavior streams, many of which amount to special cases of a statistical model known as the alternating renewal…
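    Although the record is truncated, the named model class is standard, and a minimal version is easy to sketch. Below, a behavior stream is generated as an alternating renewal process with (assumed) exponential event and gap durations, then scored with partial-interval recording; comparing the estimate with true prevalence illustrates the reductive nature of the recording procedure.

```python
import numpy as np

rng = np.random.default_rng(11)

def behavior_stream(total_s, mean_event=8.0, mean_gap=20.0):
    """Return (start, end) times of behavior episodes within [0, total_s]."""
    t, episodes, in_event = 0.0, [], False
    while t < total_s:
        dur = rng.exponential(mean_event if in_event else mean_gap)
        if in_event:
            episodes.append((t, min(t + dur, total_s)))
        t += dur
        in_event = not in_event
    return episodes

def partial_interval(episodes, total_s, interval_s=10.0):
    """Fraction of intervals containing any behavior."""
    edges = np.arange(0.0, total_s, interval_s)
    hits = [any(s < e + interval_s and t > e for s, t in episodes)
            for e in edges]
    return np.mean(hits)

eps = behavior_stream(600.0)
true_prevalence = sum(t - s for s, t in eps) / 600.0
print(f"true prevalence {true_prevalence:.2f}, "
      f"PIR estimate {partial_interval(eps, 600.0):.2f}")  # PIR overestimates
```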

  9. A statistical analysis of three ensembles of crop model responses to temperature and CO2 concentration

    USDA-ARS?s Scientific Manuscript database

    Ensembles of process-based crop models are now commonly used to simulate crop growth and development for climate scenarios of temperature and/or precipitation changes corresponding to different projections of atmospheric CO2 concentrations. This approach generates large datasets with thousands of de...

  10. Statistical Inference for Data Adaptive Target Parameters.

    PubMed

    Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

    2016-05-01

    Suppose one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample into V equal-size subsamples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
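    A schematic of the V-fold construction may help. In the sketch below (a toy example, not the authors' estimator), each parameter-generating sample picks a data-adaptive target, hypothetically the covariate most correlated with the outcome, and the complementary estimation sample estimates the corresponding regression slope; the V estimates are then averaged.

```python
import numpy as np

rng = np.random.default_rng(13)
n, p, V = 500, 5, 5
X = rng.normal(size=(n, p))
y = 0.7 * X[:, 2] + rng.normal(size=n)

folds = np.array_split(rng.permutation(n), V)
estimates = []
for v in range(V):
    est_idx = folds[v]                                   # estimation sample
    gen_idx = np.concatenate([folds[u] for u in range(V) if u != v])
    # Data-adaptive step on the parameter-generating sample: pick the
    # covariate most correlated with the outcome.
    corrs = [abs(np.corrcoef(X[gen_idx, j], y[gen_idx])[0, 1])
             for j in range(p)]
    j_star = int(np.argmax(corrs))
    # Estimate the chosen slope on the independent estimation sample.
    slope = np.polyfit(X[est_idx, j_star], y[est_idx], 1)[0]
    estimates.append(slope)

print("data adaptive target estimate:", np.mean(estimates).round(3))
```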

  11. Generation of dense statistical connectomes from sparse morphological data

    PubMed Central

    Egger, Robert; Dercksen, Vincent J.; Udvary, Daniel; Hege, Hans-Christian; Oberlaender, Marcel

    2014-01-01

    Sensory-evoked signal flow, at cellular and network levels, is primarily determined by the synaptic wiring of the underlying neuronal circuitry. Measurements of synaptic innervation, connection probabilities and subcellular organization of synaptic inputs are thus among the most active fields of research in contemporary neuroscience. Methods to measure these quantities range from electrophysiological recordings over reconstructions of dendrite-axon overlap at light-microscopic levels to dense circuit reconstructions of small volumes at electron-microscopic resolution. However, quantitative and complete measurements at subcellular resolution and mesoscopic scales to obtain all local and long-range synaptic in/outputs for any neuron within an entire brain region are beyond present methodological limits. Here, we present a novel concept, implemented within an interactive software environment called NeuroNet, which allows (i) integration of sparsely sampled (sub)cellular morphological data into an accurate anatomical reference frame of the brain region(s) of interest, (ii) up-scaling to generate an average dense model of the neuronal circuitry within the respective brain region(s) and (iii) statistical measurements of synaptic innervation between all neurons within the model. We illustrate our approach by generating a dense average model of the entire rat vibrissal cortex, providing the required anatomical data, and illustrate how to measure synaptic innervation statistically. Comparing our results with data from paired recordings in vitro and in vivo, as well as with reconstructions of synaptic contact sites at light- and electron-microscopic levels, we find that our in silico measurements are in line with previous results. PMID:25426033

  12. Acceleration techniques for dependability simulation. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Barnette, James David

    1995-01-01

    As computer systems increase in complexity, the need to project system performance from the earliest design and development stages increases. We have to employ simulation for detailed dependability studies of large systems. However, as the complexity of the simulation model increases, the time required to obtain statistically significant results also increases. This paper discusses an approach that is application independent and can be readily applied to any process-based simulation model. Topics include background on classical discrete event simulation and techniques for random variate generation and statistics gathering to support simulation.

  13. Data free inference with processed data products

    DOE PAGES

    Chowdhary, K.; Najm, H. N.

    2014-07-12

    Here, we consider the context of probabilistic inference of model parameters given error bars or confidence intervals on model output values, when the data is unavailable. We introduce a class of algorithms in a Bayesian framework, relying on maximum entropy arguments and approximate Bayesian computation methods, to generate consistent data with the given summary statistics. Once we obtain consistent data sets, we pool the respective posteriors, to arrive at a single, averaged density on the parameters. This approach allows us to perform accurate forward uncertainty propagation consistent with the reported statistics.
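    The flavor of the approach can be sketched with a one-parameter example. Below, only a reported mean and standard deviation are available; candidate datasets are simulated and accepted when their summary statistics match the report within a tolerance (the ABC step), a posterior for the mean is formed from each accepted dataset, and the posteriors are averaged into a single density. Priors, tolerances, and all numbers are assumptions of the sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(17)
m_rep, s_rep, n, tol = 4.2, 1.1, 25, 0.1   # reported mean/sd, size, tolerance
grid = np.linspace(2.5, 6.0, 400)
posteriors = []

while len(posteriors) < 100:
    d = rng.normal(rng.uniform(2.0, 6.5), s_rep, n)   # candidate dataset
    if abs(d.mean() - m_rep) < tol and abs(d.std(ddof=1) - s_rep) < tol:
        # Accepted as consistent: posterior of the mean under a flat prior
        # is a Student-t centered on the sample mean.
        sem = d.std(ddof=1) / np.sqrt(n)
        posteriors.append(stats.t.pdf((grid - d.mean()) / sem, n - 1) / sem)

pooled = np.mean(posteriors, axis=0)       # single averaged density
print("pooled posterior mode ~", grid[np.argmax(pooled)].round(2))
```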

  14. Does rational selection of training and test sets improve the outcome of QSAR modeling?

    PubMed

    Martin, Todd M; Harten, Paul; Young, Douglas M; Muratov, Eugene N; Golbraikh, Alexander; Zhu, Hao; Tropsha, Alexander

    2012-10-22

    Prior to using a quantitative structure activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and by using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models is comparable.
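    Of the rational division methods named above, Kennard-Stone is the easiest to sketch: seed the training set with the two most distant samples, then repeatedly add the candidate whose nearest selected neighbor is farthest away (a maximin rule), so the training set spans descriptor space. A minimal numpy version (illustrative data, not the study's end points):

```python
import numpy as np

def kennard_stone(X, n_train):
    """Return indices of a Kennard-Stone training set of size n_train."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise dists
    selected = list(np.unravel_index(np.argmax(d), d.shape))     # farthest pair
    while len(selected) < n_train:
        remaining = [i for i in range(len(X)) if i not in selected]
        # Distance from each remaining sample to its closest selected sample.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(np.argmax(nearest))])      # maximin pick
    return np.array(selected)

rng = np.random.default_rng(23)
X = rng.normal(size=(50, 4))                    # 50 compounds, 4 descriptors
train_idx = kennard_stone(X, n_train=40)
test_idx = np.setdiff1d(np.arange(50), train_idx)
print(len(train_idx), "training /", len(test_idx), "test compounds")
```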

  15. Measuring Efficiency of Tunisian Schools in the Presence of Quasi-Fixed Inputs: A Bootstrap Data Envelopment Analysis Approach

    ERIC Educational Resources Information Center

    Essid, Hedi; Ouellette, Pierre; Vigeant, Stephane

    2010-01-01

    The objective of this paper is to measure the efficiency of high schools in Tunisia. We use a statistical data envelopment analysis (DEA)-bootstrap approach with quasi-fixed inputs to estimate the precision of our measure. To do so, we developed a statistical model serving as the foundation of the data generation process (DGP). The DGP is…

  16. Large-scale runoff generation - parsimonious parameterisation using high-resolution topography

    NASA Astrophysics Data System (ADS)

    Gong, L.; Halldin, S.; Xu, C.-Y.

    2011-08-01

    World water resources have primarily been analysed by global-scale hydrological models in the last decades. Runoff generation in many of these models is based on process formulations developed at catchment scales. The division between slow runoff (baseflow) and fast runoff is primarily governed by slope and spatial distribution of effective water storage capacity, both acting at very small scales. Many hydrological models, e.g. VIC, account for the spatial storage variability in terms of statistical distributions; such models are generally proven to perform well. The statistical approaches, however, use the same runoff-generation parameters everywhere in a basin. The TOPMODEL concept, on the other hand, links the effective maximum storage capacity with real-world topography. Recent availability of global high-quality, high-resolution topographic data makes TOPMODEL attractive as a basis for a physically-based runoff-generation algorithm at large scales, even if its assumptions are not valid in flat terrain or for deep groundwater systems. We present a new runoff-generation algorithm for large-scale hydrology based on TOPMODEL concepts intended to overcome these problems. The TRG (topography-derived runoff generation) algorithm relaxes the TOPMODEL equilibrium assumption so baseflow generation is not tied to topography. TRG only uses the topographic index to distribute average storage to each topographic index class. The maximum storage capacity is proportional to the range of topographic index and is scaled by one parameter. The distribution of storage capacity within large-scale grid cells is obtained numerically through topographic analysis. The new topography-derived distribution function is then inserted into a runoff-generation framework similar to VIC's. Different basin parts are parameterised by different storage capacities, and different shapes of the storage-distribution curves depend on their topographic characteristics. The TRG algorithm is driven by the HydroSHEDS dataset with a resolution of 3" (around 90 m at the equator). The TRG algorithm was validated against the VIC algorithm in a common model framework in 3 river basins in different climates. The TRG algorithm performed equally well or marginally better than the VIC algorithm with one less parameter to be calibrated. The TRG algorithm also lacked equifinality problems and offered a realistic spatial pattern for runoff generation and evaporation.
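    A heavily simplified sketch of the storage-distribution step, with an assumed scaling form rather than the published TRG equations: topographic index values within a cell are binned into equal-probability classes, each class receives a storage capacity that shrinks as its index grows (wetter classes saturate first), and a single parameter scales the capacities.

```python
import numpy as np

rng = np.random.default_rng(37)
topo_index = rng.gamma(3.0, 2.0, 10_000)      # stand-in 90 m topographic index

n_classes, m = 20, 0.04                        # class count; scaling parameter
edges = np.quantile(topo_index, np.linspace(0, 1, n_classes + 1))
class_mean = 0.5 * (edges[:-1] + edges[1:])

# Assumed form: capacity proportional to the index range, scaled by m;
# higher-index (wetter) classes get less storage and so saturate first.
capacity = m * (edges[-1] - class_mean)        # storage capacity per class (m)
storage = 0.5                                  # hypothetical current storage (m)
saturated = capacity < storage                 # classes contributing fast runoff
print("capacity range (m):", capacity.min().round(2), "-",
      capacity.max().round(2))
print(f"saturated fraction of cell: {saturated.mean():.2f}")
```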

  17. Large-scale runoff generation - parsimonious parameterisation using high-resolution topography

    NASA Astrophysics Data System (ADS)

    Gong, L.; Halldin, S.; Xu, C.-Y.

    2010-09-01

    World water resources have primarily been analysed by global-scale hydrological models in the last decades. Runoff generation in many of these models is based on process formulations developed at catchment scales. The division between slow runoff (baseflow) and fast runoff is primarily governed by slope and spatial distribution of effective water storage capacity, both acting at very small scales. Many hydrological models, e.g. VIC, account for the spatial storage variability in terms of statistical distributions; such models are generally proven to perform well. The statistical approaches, however, use the same runoff-generation parameters everywhere in a basin. The TOPMODEL concept, on the other hand, links the effective maximum storage capacity with real-world topography. Recent availability of global high-quality, high-resolution topographic data makes TOPMODEL attractive as a basis for a physically-based runoff-generation algorithm at large scales, even if its assumptions are not valid in flat terrain or for deep groundwater systems. We present a new runoff-generation algorithm for large-scale hydrology based on TOPMODEL concepts intended to overcome these problems. The TRG (topography-derived runoff generation) algorithm relaxes the TOPMODEL equilibrium assumption so baseflow generation is not tied to topography. TRG only uses the topographic index to distribute average storage to each topographic index class. The maximum storage capacity is proportional to the range of topographic index and is scaled by one parameter. The distribution of storage capacity within large-scale grid cells is obtained numerically through topographic analysis. The new topography-derived distribution function is then inserted into a runoff-generation framework similar to VIC's. Different basin parts are parameterised by different storage capacities, and different shapes of the storage-distribution curves depend on their topographic characteristics. The TRG algorithm is driven by the HydroSHEDS dataset with a resolution of 3'' (around 90 m at the equator). The TRG algorithm was validated against the VIC algorithm in a common model framework in 3 river basins in different climates. The TRG algorithm performed equally well or marginally better than the VIC algorithm with one less parameter to be calibrated. The TRG algorithm also lacked equifinality problems and offered a realistic spatial pattern for runoff generation and evaporation.

  18. Dynamic modelling of n-of-1 data: powerful and flexible data analytics applied to individualised studies.

    PubMed

    Vieira, Rute; McDonald, Suzanne; Araújo-Soares, Vera; Sniehotta, Falko F; Henderson, Robin

    2017-09-01

    N-of-1 studies are based on repeated observations within an individual or unit over time and are acknowledged as an important research method for generating scientific evidence about the health or behaviour of an individual. Statistical analyses of n-of-1 data require accurate modelling of the outcome while accounting for its distribution, time-related trend and error structures (e.g., autocorrelation) as well as reporting readily usable contextualised effect sizes for decision-making. A number of statistical approaches have been documented but no consensus exists on which method is most appropriate for which type of n-of-1 design. We discuss the statistical considerations for analysing n-of-1 studies and briefly review some currently used methodologies. We describe dynamic regression modelling as a flexible and powerful approach, adaptable to different types of outcomes and capable of dealing with the different challenges inherent to n-of-1 statistical modelling. Dynamic modelling borrows ideas from longitudinal and event history methodologies which explicitly incorporate the role of time and the influence of past on future. We also present an illustrative example of the use of dynamic regression on monitoring physical activity during the retirement transition. Dynamic modelling has the potential to expand researchers' access to robust and user-friendly statistical methods for individualised studies.
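    A minimal dynamic regression for n-of-1 data can be written as today's outcome regressed on yesterday's outcome plus a phase indicator, capturing autocorrelation and a level shift at once. The simulated "physical activity" series and all coefficients below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(29)
T = 120
phase = (np.arange(T) >= 60).astype(float)     # 0 = baseline, 1 = retirement
y = np.zeros(T)
for t in range(1, T):                          # AR(1) with a level shift
    y[t] = 2.0 + 0.6 * y[t - 1] - 1.5 * phase[t] + rng.normal(0, 0.8)

# Fit y_t ~ 1 + y_{t-1} + phase_t by ordinary least squares.
X = np.column_stack([np.ones(T - 1), y[:-1], phase[1:]])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print("carry-over (lag) effect:", beta[1].round(2),
      "| phase effect:", beta[2].round(2))
```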

  19. Statistical properties of Charney-Hasegawa-Mima zonal flows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Johan, E-mail: anderson.johan@gmail.com; Botha, G. J. J.

    2015-05-15

    A theoretical interpretation of numerically generated probability density functions (PDFs) of intermittent plasma transport events in unforced zonal flows is provided within the Charney-Hasegawa-Mima (CHM) model. The governing equation is solved numerically with various prescribed density gradients that are designed to produce different configurations of parallel and anti-parallel streams. Long-lasting vortices form whose flow is governed by the zonal streams. It is found that the numerically generated PDFs can be matched with analytical predictions of PDFs based on the instanton method by removing the autocorrelations from the time series. In many instances, the statistics generated by the CHM dynamics relax to Gaussian distributions for both the electrostatic and vorticity perturbations, whereas in areas with strong nonlinear interactions it is found that the PDFs are exponentially distributed.

  20. A Simple Simulation Technique for Nonnormal Data with Prespecified Skewness, Kurtosis, and Covariance Matrix.

    PubMed

    Foldnes, Njål; Olsson, Ulf Henning

    2016-01-01

    We present and investigate a simple way to generate nonnormal data using linear combinations of independent generator (IG) variables. The simulated data have prespecified univariate skewness and kurtosis and a given covariance matrix. In contrast to the widely used Vale-Maurelli (VM) transform, the obtained data are shown to have a non-Gaussian copula. We analytically obtain asymptotic robustness conditions for the IG distribution. We show empirically that popular test statistics in covariance analysis tend to reject true models more often under the IG transform than under the VM transform. This implies that overly optimistic evaluations of estimators and fit statistics in covariance structure analysis may be tempered by including the IG transform for nonnormal data generation. We provide an implementation of the IG transform in the R environment.
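    The heart of the IG construction is easy to sketch: take independent, standardized non-normal generator variables g and form X = Ag with AA' equal to the target covariance matrix. The moment-matching step that solves for generator skewness and kurtosis to hit prespecified targets is omitted here; the fragment below only verifies the covariance and the resulting non-normality.

```python
import numpy as np

rng = np.random.default_rng(31)
target_cov = np.array([[1.0, 0.5],
                       [0.5, 2.0]])
A = np.linalg.cholesky(target_cov)          # A @ A.T == target_cov

n = 100_000
g = rng.exponential(size=(n, 2)) - 1.0      # independent, mean 0, variance 1
X = g @ A.T                                  # linear combinations of generators

print("sample covariance:\n", np.cov(X.T).round(2))
skew = ((X - X.mean(0)) ** 3).mean(0) / X.std(0) ** 3
print("sample skewness:", skew.round(2))     # clearly nonzero => non-Gaussian
```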

  1. Statistical Methods for Rapid Aerothermal Analysis and Design Technology: Validation

    NASA Technical Reports Server (NTRS)

    DePriest, Douglas; Morgan, Carolyn

    2003-01-01

    The cost and safety goals for NASA's next generation of reusable launch vehicle (RLV) will require that rapid high-fidelity aerothermodynamic design tools be used early in the design cycle. To meet these requirements, it is desirable to identify adequate statistical models that quantify and improve the accuracy, extend the applicability, and enable combined analyses using existing prediction tools. The initial research work focused on establishing suitable candidate models for these purposes. The second phase is focused on assessing the performance of these models to accurately predict the heat rate for a given candidate data set. This validation work compared models and methods that may be useful in predicting the heat rate.

  2. Mathematical and Statistical Techniques for Systems Medicine: The Wnt Signaling Pathway as a Case Study.

    PubMed

    MacLean, Adam L; Harrington, Heather A; Stumpf, Michael P H; Byrne, Helen M

    2016-01-01

    The last decade has seen an explosion in models that describe phenomena in systems medicine. Such models are especially useful for studying signaling pathways, such as the Wnt pathway. In this chapter we use the Wnt pathway to showcase current mathematical and statistical techniques that enable modelers to gain insight into (models of) gene regulation and generate testable predictions. We introduce a range of modeling frameworks, but focus on ordinary differential equation (ODE) models since they remain the most widely used approach in systems biology and medicine and continue to offer great potential. We present methods for the analysis of a single model, comprising applications of standard dynamical systems approaches such as nondimensionalization, steady state, asymptotic and sensitivity analysis, and more recent statistical and algebraic approaches to compare models with data. We present parameter estimation and model comparison techniques, focusing on Bayesian analysis and coplanarity via algebraic geometry. Our intention is that this (non-exhaustive) review may serve as a useful starting point for the analysis of models in systems medicine.

  3. A method to generate small-scale, high-resolution sedimentary bedform architecture models representing realistic geologic facies

    DOE PAGES

    Meckel, T. A.; Trevisan, L.; Krishnamurthy, P. G.

    2017-08-23

    Small-scale (mm to m) sedimentary structures (e.g. ripple lamination, cross-bedding) have received a great deal of attention in sedimentary geology. The influence of depositional heterogeneity on subsurface fluid flow is now widely recognized, but incorporating these features in physically-rational bedform models at various scales remains problematic. The current investigation expands the capability of an existing set of open-source codes, allowing generation of high-resolution 3D bedform architecture models. The implemented modifications enable the generation of 3D digital models consisting of laminae and matrix (binary field) with characteristic depositional architecture. The binary model is then populated with petrophysical properties using a textural approach for additional analysis such as statistical characterization, property upscaling, and single and multiphase fluid flow simulation. One example binary model with corresponding threshold capillary pressure field and the scripts used to generate them are provided, but the approach can be used to generate dozens of previously documented common facies models and a variety of property assignments. An application using the example model is presented simulating buoyant fluid (CO2) migration and resulting saturation distribution.
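    A toy version of the binary-field idea, far simpler than the open-source bedform codes referenced above: build a small 2-D laminae/matrix field from a dipping periodic pattern, then populate it with a petrophysical property (a hypothetical per-facies threshold capillary pressure) plus within-facies textural variation. All geometry and property values are invented.

```python
import numpy as np

rng = np.random.default_rng(43)
nz, nx = 64, 256
z, x = np.mgrid[0:nz, 0:nx]

dip = 0.25                                   # lamina dip (cells per cell)
phase = rng.uniform(0, 2 * np.pi)
laminae = np.sin(0.8 * (z + dip * x) + phase) > 0.3   # binary: lamina vs matrix

pc_entry = np.where(laminae, 3.0, 1.0)       # kPa per facies (assumed values)
pc_entry = pc_entry * rng.lognormal(0.0, 0.1, size=pc_entry.shape)  # texture
print("lamina fraction:", laminae.mean().round(2))
```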

  4. A method to generate small-scale, high-resolution sedimentary bedform architecture models representing realistic geologic facies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meckel, T. A.; Trevisan, L.; Krishnamurthy, P. G.

    Small-scale (mm to m) sedimentary structures (e.g. ripple lamination, cross-bedding) have received a great deal of attention in sedimentary geology. The influence of depositional heterogeneity on subsurface fluid flow is now widely recognized, but incorporating these features in physically-rational bedform models at various scales remains problematic. The current investigation expands the capability of an existing set of open-source codes, allowing generation of high-resolution 3D bedform architecture models. The implemented modifications enable the generation of 3D digital models consisting of laminae and matrix (binary field) with characteristic depositional architecture. The binary model is then populated with petrophysical properties using a textural approach for additional analysis such as statistical characterization, property upscaling, and single and multiphase fluid flow simulation. One example binary model with corresponding threshold capillary pressure field and the scripts used to generate them are provided, but the approach can be used to generate dozens of previously documented common facies models and a variety of property assignments. An application using the example model is presented simulating buoyant fluid (CO2) migration and resulting saturation distribution.

  5. Reverse engineering systems models of regulation: discovery, prediction and mechanisms.

    PubMed

    Ashworth, Justin; Wurtmann, Elisabeth J; Baliga, Nitin S

    2012-08-01

    Biological systems can now be understood in comprehensive and quantitative detail using systems biology approaches. Putative genome-scale models can be built rapidly based upon biological inventories and strategic system-wide molecular measurements. Current models combine statistical associations, causative abstractions, and known molecular mechanisms to explain and predict quantitative and complex phenotypes. This top-down 'reverse engineering' approach generates useful organism-scale models despite noise and incompleteness in data and knowledge. Here we review and discuss the reverse engineering of biological systems using top-down data-driven approaches, in order to improve discovery, hypothesis generation, and the inference of biological properties. Copyright © 2011 Elsevier Ltd. All rights reserved.

  6. Statistical wave climate projections for coastal impact assessments

    NASA Astrophysics Data System (ADS)

    Camus, P.; Losada, I. J.; Izaguirre, C.; Espejo, A.; Menéndez, M.; Pérez, J.

    2017-09-01

    Global multimodel wave climate projections are obtained at 1.0° × 1.0° scale from 30 Coupled Model Intercomparison Project Phase 5 (CMIP5) global circulation model (GCM) realizations. A semi-supervised weather-typing approach based on a characterization of the ocean wave generation areas and the historical wave information from the recent GOW2 database are used to train the statistical model. This framework is also applied to obtain high resolution projections of coastal wave climate and coastal impacts such as port operability and coastal flooding. Regional projections are estimated using the collection of weather types at spacing of 1.0°. This assumption is feasible because the predictor is defined based on the wave generation area and the classification is guided by the local wave climate. The assessment of future changes in coastal impacts is based on direct downscaling of indicators defined by empirical formulations (total water level for coastal flooding and number of hours per year with overtopping for port operability). Global multimodel projections of the significant wave height and peak period are consistent with changes obtained in previous studies. Statistical confidence in the expected changes is obtained due to the large number of GCMs used to construct the ensemble. The proposed methodology is proved to be flexible to project wave climate at different spatial scales. Regional changes in additional variables such as wave direction or other statistics can be estimated from the future empirical distribution with extreme values restricted to high percentiles (i.e., 95th, 99th percentiles). The statistical framework can also be applied to evaluate regional coastal impacts integrating changes in storminess and sea level rise.

  7. Initial Systematic Investigations of the Weakly Coupled Free Fermionic Heterotic String Landscape Statistics

    NASA Astrophysics Data System (ADS)

    Renner, Timothy

    2011-12-01

    A C++ framework was constructed with the explicit purpose of systematically generating string models using the Weakly Coupled Free Fermionic Heterotic String (WCFFHS) method. The software, optimized for speed, generality, and ease of use, has been used to conduct preliminary systematic investigations of WCFFHS vacua. Documentation for this framework is provided in the Appendix. After an introduction to theoretical and computational aspects of WCFFHS model building, a study of ten-dimensional WCFFHS models is presented. Degeneracies among equivalent expressions of each of the known models are investigated and classified. A study of more phenomenologically realistic four-dimensional models based on the well-known "NAHE" set is then presented, with statistics being reported on gauge content, matter representations, and space-time supersymmetries. The final study is a parallel to the NAHE study in which a variation of the NAHE set is systematically extended and examined statistically. Special attention is paid to models with "mirroring": identical observable and hidden sector gauge groups and matter representations.

  8. Model Fit and Item Factor Analysis: Overfactoring, Underfactoring, and a Program to Guide Interpretation.

    PubMed

    Clark, D Angus; Bowles, Ryan P

    2018-04-23

    In exploratory item factor analysis (IFA), researchers may use model fit statistics and commonly invoked fit thresholds to help determine the dimensionality of an assessment. However, these indices and thresholds may mislead as they were developed in a confirmatory framework for models with continuous, not categorical, indicators. The present study used Monte Carlo simulation methods to investigate the ability of popular model fit statistics (chi-square, root mean square error of approximation, the comparative fit index, and the Tucker-Lewis index) and their standard cutoff values to detect the optimal number of latent dimensions underlying sets of dichotomous items. Models were fit to data generated from three-factor population structures that varied in factor loading magnitude, factor intercorrelation magnitude, number of indicators, and whether cross loadings or minor factors were included. The effectiveness of the thresholds varied across fit statistics, and was conditional on many features of the underlying model. Together, results suggest that conventional fit thresholds offer questionable utility in the context of IFA.

  9. Performance Analysis of Garbage Collection and Dynamic Reordering in a Lisp System. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Llames, Rene Lim

    1991-01-01

    Generation-based garbage collection and dynamic reordering of objects are two techniques for improving the efficiency of memory management in Lisp and similar dynamic language systems. An analysis of the effect of generation configuration is presented, focusing on the effect of the number of generations and generation capacities. Analytic timing and survival models are used to represent garbage collection runtime and to derive structural results on its behavior. The survival model provides bounds on the age of objects surviving a garbage collection at a particular level. Empirical results show that execution time is most sensitive to the capacity of the youngest generation. A technique called scanning for transport statistics, for evaluating the effectiveness of reordering independent of main memory size, is presented.

  10. A model for simulating random atmospheres as a function of latitude, season, and time

    NASA Technical Reports Server (NTRS)

    Campbell, J. W.

    1977-01-01

    An empirical stochastic computer model was developed with the capability of generating random thermodynamic profiles of the atmosphere below an altitude of 99 km which are characteristic of any given season, latitude, and time of day. Samples of temperature, density, and pressure profiles generated by the model are statistically similar to measured profiles in a data base of over 6000 rocket and high-altitude atmospheric soundings; that is, means and standard deviations of modeled profiles and their vertical gradients are in close agreement with data. Model-generated samples can be used for Monte Carlo simulations of aircraft or spacecraft trajectories to predict or account for the effects on a vehicle's performance of atmospheric variability. Other potential uses for the model are in simulating pollutant dispersion patterns, variations in sound propagation, and other phenomena which are dependent on atmospheric properties, and in developing data-reduction software for satellite monitoring systems.
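    The generator concept can be sketched as sampling vertically correlated Gaussian perturbations around a reference climatology, so that the altitude-wise means and standard deviations of the simulated profiles match the targets. The climatology, variability profile, and correlation length below are invented placeholders, not the model's fitted values.

```python
import numpy as np

rng = np.random.default_rng(41)
alt_km = np.arange(0, 100, 1.0)
# Placeholder climatology: lapse-rate cooling, then a crude warming aloft.
mean_T = 288.0 - 6.5 * np.clip(alt_km, 0, 11) + 1.0 * np.clip(alt_km - 20, 0, 27)
std_T = 2.0 + 0.05 * alt_km                    # variability grows with altitude

# Smooth vertical correlation so each profile is a plausible curve.
dist = np.abs(alt_km[:, None] - alt_km[None, :])
corr = np.exp(-dist / 8.0)                     # exponential correlation kernel
L = np.linalg.cholesky(corr + 1e-10 * np.eye(alt_km.size))

profiles = mean_T + (L @ rng.normal(size=(alt_km.size, 500))).T * std_T
print("std at 50 km: simulated", profiles.std(0)[50].round(2),
      "| target", std_T[50])
```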

  11. Simulating secondary organic aerosol in a regional air quality model using the statistical oxidation model - Part 1: Assessing the influence of constrained multi-generational ageing

    NASA Astrophysics Data System (ADS)

    Jathar, S. H.; Cappa, C. D.; Wexler, A. S.; Seinfeld, J. H.; Kleeman, M. J.

    2015-09-01

    Multi-generational oxidation of volatile organic compound (VOC) oxidation products can significantly alter the mass, chemical composition and properties of secondary organic aerosol (SOA) compared to calculations that consider only the first few generations of oxidation reactions. However, the most commonly used state-of-the-science schemes in 3-D regional or global models that account for multi-generational oxidation (1) consider only functionalization reactions but do not consider fragmentation reactions, (2) have not been constrained to experimental data, and (3) are added on top of existing parameterizations. The incomplete description of multi-generational oxidation in these models has the potential to bias source apportionment and control calculations for SOA. In this work, we used the Statistical Oxidation Model (SOM) of Cappa and Wilson (2012), constrained by experimental laboratory chamber data, to evaluate the regional implications of multi-generational oxidation considering both functionalization and fragmentation reactions. SOM was implemented into the regional UCD/CIT air quality model and applied to air quality episodes in California and the eastern US. The mass, composition and properties of SOA predicted using SOM are compared to SOA predictions generated by a traditional "two-product" model to fully investigate the impact of explicit and self-consistent accounting of multi-generational oxidation. Results show that SOA mass concentrations predicted by the UCD/CIT-SOM model are very similar to those predicted by a two-product model when both models use parameters that are derived from the same chamber data. Since the two-product model does not explicitly resolve multi-generational oxidation reactions, this finding suggests that the chamber data used to parameterize the models captures the majority of the SOA mass formation from multi-generational oxidation under the conditions tested. Consequently, the use of low and high NOx yields perturbs SOA concentrations by a factor of two and is probably a much stronger determinant in 3-D models than constrained multi-generational oxidation. While total predicted SOA mass is similar for the SOM and two-product models, the SOM model predicts increased SOA contributions from anthropogenic (alkane, aromatic) and sesquiterpenes and decreased SOA contributions from isoprene and monoterpene relative to the two-product model calculations. The SOA predicted by SOM has a much lower volatility than that predicted by the traditional model, resulting in better qualitative agreement with volatility measurements of ambient OA. On account of its lower volatility, the SOA mass produced by SOM does not appear to be as strongly influenced by the inclusion of oligomerization reactions, whereas the two-product model relies heavily on oligomerization to form low volatility SOA products. Finally, an unconstrained contemporary hybrid scheme to model multi-generational oxidation within the framework of a two-product model in which "ageing" reactions are added on top of the existing two-product parameterization is considered. This hybrid scheme formed at least three times more SOA than the SOM during regional simulations as a result of excessive transformation of semi-volatile vapors into lower volatility material that strongly partitions to the particle phase. This finding suggests that these "hybrid" multi-generational schemes should be used with great caution in regional models.

  12. Simulating secondary organic aerosol in a regional air quality model using the statistical oxidation model - Part 1: Assessing the influence of constrained multi-generational ageing

    NASA Astrophysics Data System (ADS)

    Jathar, S. H.; Cappa, C. D.; Wexler, A. S.; Seinfeld, J. H.; Kleeman, M. J.

    2016-02-01

    Multi-generational oxidation of volatile organic compound (VOC) oxidation products can significantly alter the mass, chemical composition and properties of secondary organic aerosol (SOA) compared to calculations that consider only the first few generations of oxidation reactions. However, the most commonly used state-of-the-science schemes in 3-D regional or global models that account for multi-generational oxidation (1) consider only functionalization reactions but do not consider fragmentation reactions, (2) have not been constrained to experimental data, and (3) are added on top of existing parameterizations. The incomplete description of multi-generational oxidation in these models has the potential to bias source apportionment and control calculations for SOA. In this work, we used the statistical oxidation model (SOM) of Cappa and Wilson (2012), constrained by experimental laboratory chamber data, to evaluate the regional implications of multi-generational oxidation considering both functionalization and fragmentation reactions. SOM was implemented into the regional University of California at Davis / California Institute of Technology (UCD/CIT) air quality model and applied to air quality episodes in California and the eastern USA. The mass, composition and properties of SOA predicted using SOM were compared to SOA predictions generated by a traditional two-product model to fully investigate the impact of explicit and self-consistent accounting of multi-generational oxidation. Results show that SOA mass concentrations predicted by the UCD/CIT-SOM model are very similar to those predicted by a two-product model when both models use parameters that are derived from the same chamber data. Since the two-product model does not explicitly resolve multi-generational oxidation reactions, this finding suggests that the chamber data used to parameterize the models capture the majority of the SOA mass formation from multi-generational oxidation under the conditions tested. Consequently, the choice between low- and high-NOx yields perturbs SOA concentrations by a factor of two and is probably a much stronger determinant in 3-D models than multi-generational oxidation. While total predicted SOA mass is similar for the SOM and two-product models, the SOM model predicts increased SOA contributions from anthropogenic precursors (alkanes, aromatics) and sesquiterpenes and decreased SOA contributions from isoprene and monoterpenes relative to the two-product model calculations. The SOA predicted by SOM has a much lower volatility than that predicted by the traditional model, resulting in better qualitative agreement with volatility measurements of ambient OA. On account of its lower volatility, the SOA mass produced by SOM does not appear to be as strongly influenced by the inclusion of oligomerization reactions, whereas the two-product model relies heavily on oligomerization to form low-volatility SOA products. Finally, an unconstrained contemporary hybrid scheme to model multi-generational oxidation within the framework of a two-product model, in which ageing reactions are added on top of the existing two-product parameterization, is considered. This hybrid scheme formed at least 3 times more SOA than the SOM during regional simulations as a result of excessive transformation of semi-volatile vapors into lower-volatility material that strongly partitions to the particle phase. This finding suggests that these hybrid multi-generational schemes should be used with great caution in regional models.
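
    For context on the comparison above, the following is a minimal sketch of the equilibrium absorptive-partitioning calculation that underlies a generic two-product SOA model; the precursor mass, yields (alphas) and saturation concentrations (cstars) are illustrative placeholders, not parameters from this study.

    ```python
    # Equilibrium absorptive partitioning for a two-product model (sketch).
    # All numbers are invented for illustration, not taken from the paper.
    import numpy as np

    def two_product_soa(delta_roc, alphas, cstars, seed_oa=0.1, tol=1e-9):
        """Organic aerosol mass (ug/m3) formed from delta_roc (ug/m3) of
        reacted precursor, given mass yields and saturation concentrations."""
        totals = delta_roc * np.asarray(alphas)   # gas + particle totals
        cstars = np.asarray(cstars)
        c_oa = seed_oa + 0.5 * totals.sum()       # initial guess
        for _ in range(1000):                     # fixed-point iteration
            particle = totals / (1.0 + cstars / c_oa)
            new = seed_oa + particle.sum()
            if abs(new - c_oa) < tol:
                break
            c_oa = new
        return c_oa - seed_oa

    print(two_product_soa(10.0, alphas=[0.05, 0.2], cstars=[1.0, 50.0]))
    ```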

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kaurov, Alexander A., E-mail: kaurov@uchicago.edu

    The methods for studying the epoch of cosmic reionization vary from full radiative transfer simulations to purely analytical models. While numerical approaches are computationally expensive and are not suitable for generating many mock catalogs, analytical methods are based on assumptions and approximations. We explore the interconnection between both methods. First, we ask how the analytical framework of excursion set formalism can be used for statistical analysis of numerical simulations and visual representation of the morphology of ionization fronts. Second, we explore the methods of training the analytical model on a given numerical simulation. We present a new code which emerged from this study. Its main application is to match the analytical model with a numerical simulation. Then, it allows one to generate mock reionization catalogs with volumes exceeding the original simulation quickly and computationally inexpensively, meanwhile reproducing large-scale statistical properties. These mock catalogs are particularly useful for cosmic microwave background polarization and 21 cm experiments, where large volumes are required to simulate the observed signal.

  14. The concept and use of elasticity in population viability models [Exercise 13

    Treesearch

    Carolyn Hull Sieg; Rudy M. King; Fred Van Dyke

    2003-01-01

    As you have seen in exercise 12, plants, such as the western prairie fringed orchid, typically have distinct life stages and complex life cycles that require the matrix analyses associated with a stage-based population model. Some statistics that can be generated from such matrix analyses can be very informative in determining which variables in the model have the...

  15. Modeling uncertainty of evapotranspiration measurements from multiple eddy covariance towers over a crop canopy

    USDA-ARS?s Scientific Manuscript database

    All measurements have random error associated with them. With fluxes in an eddy covariance system, measurement error can been modelled in several ways, often involving a statistical description of turbulence at its core. Using a field experiment with four towers, we generated four replicates of meas...

  16. On statistical analysis of factors affecting anthocyanin extraction from Ixora siamensis

    NASA Astrophysics Data System (ADS)

    Mat Nor, N. A.; Arof, A. K.

    2016-10-01

    This study focused on designing an experimental model to evaluate the influence of the operative extraction parameters employed for anthocyanin extraction from Ixora siamensis on CIE color measurements (a*, b* and color saturation). Extractions were conducted at temperatures of 30, 55 and 80°C and soaking times of 60, 120 and 180 min, using acidified methanol solvent with different trifluoroacetic acid (TFA) contents of 0.5, 1.75 and 3% (v/v). The statistical evaluation was performed by running analysis of variance (ANOVA) and regression calculations to investigate the significance of the generated model. Results show that the generated regression models adequately explain the data variation and significantly represent the actual relationship between the independent variables and the responses. ANOVA showed high coefficient of determination (R2) values of 0.9687 for a*, 0.9621 for b* and 0.9758 for color saturation, ensuring a satisfactory fit of the developed models to the experimental data. The interaction between TFA content and extraction temperature exhibited the most significant influence on the CIE color parameters.
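
    As an illustration of the regression/ANOVA workflow described, the sketch below fits a response-surface model to a synthetic full-factorial design over the stated levels; the response values are invented stand-ins, not the paper's measurements.

    ```python
    # Hedged sketch: response-surface regression and ANOVA over the stated
    # design levels, with a synthetic response in place of measured CIE a*.
    import itertools
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    runs = list(itertools.product([30, 55, 80], [60, 120, 180], [0.5, 1.75, 3.0]))
    df = pd.DataFrame(runs, columns=["temp", "time", "tfa"])
    df["a_star"] = (3 + 0.03 * df.temp + 0.005 * df.time + 0.6 * df.tfa
                    + 0.004 * df.temp * df.tfa + rng.normal(0, 0.2, len(df)))

    model = smf.ols("a_star ~ temp * tfa + time", data=df).fit()
    print(model.rsquared)                    # cf. the reported R2 values
    print(sm.stats.anova_lm(model, typ=2))   # significance of each term
    ```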

  17. Statistical Analysis of Large Simulated Yield Datasets for Studying Climate Effects

    NASA Technical Reports Server (NTRS)

    Makowski, David; Asseng, Senthold; Ewert, Frank; Bassu, Simona; Durand, Jean-Louis; Martre, Pierre; Adam, Myriam; Aggarwal, Pramod K.; Angulo, Carlos; Baron, Chritian; et al.

    2015-01-01

    Many studies have been carried out during the last decade to study the effect of climate change on crop yields and other key crop characteristics. In these studies, one or several crop models were used to simulate crop growth and development for different climate scenarios that correspond to different projections of atmospheric CO2 concentration, temperature, and rainfall changes (Semenov et al., 1996; Tubiello and Ewert, 2002; White et al., 2011). The Agricultural Model Intercomparison and Improvement Project (AgMIP; Rosenzweig et al., 2013) builds on these studies with the goal of using an ensemble of multiple crop models in order to assess effects of climate change scenarios for several crops in contrasting environments. These studies generate large datasets, including thousands of simulated crop yield data. They include series of yield values obtained by combining several crop models with different climate scenarios that are defined by several climatic variables (temperature, CO2, rainfall, etc.). Such datasets potentially provide useful information on the possible effects of different climate change scenarios on crop yields. However, it is sometimes difficult to analyze these datasets and to summarize them in a useful way due to their structural complexity; simulated yield data can differ among contrasting climate scenarios, sites, and crop models. Another issue is that it is not straightforward to extrapolate the results obtained for the scenarios to alternative climate change scenarios not initially included in the simulation protocols. Additional dynamic crop model simulations for new climate change scenarios are an option, but this approach is costly, especially when a large number of crop models are used to generate the simulated data, as in AgMIP. Statistical models have been used to analyze responses of measured yield data to climate variables in past studies (Lobell et al., 2011), but the use of a statistical model to analyze yields simulated by complex process-based crop models is a rather new idea. We demonstrate here that statistical methods can play an important role in analyzing simulated yield datasets obtained from ensembles of process-based crop models. Formal statistical analysis is helpful to estimate the effects of different climatic variables on yield, and to describe the between-model variability of these effects.
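
    The emulation idea can be made concrete with a toy sketch: regress synthetic "simulated" yields on climate variables so that new scenarios can be interpolated without re-running a crop model. All coefficients and data below are invented for illustration.

    ```python
    # Sketch of a statistical emulator for simulated yields; the "crop model"
    # here is a made-up linear response plus noise.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n = 500
    temp_change = rng.uniform(0, 5, n)           # warming (degC)
    co2 = rng.uniform(360, 720, n)               # ppm
    rain_change = rng.uniform(-30, 30, n)        # percent
    yield_sim = (8.0 - 0.45 * temp_change + 0.004 * (co2 - 360)
                 + 0.02 * rain_change + rng.normal(0, 0.3, n))

    X = np.column_stack([temp_change, co2, rain_change])
    emulator = LinearRegression().fit(X, yield_sim)
    print(emulator.coef_)                           # estimated climate effects
    print(emulator.predict([[2.0, 550.0, -10.0]]))  # a new, unsimulated scenario
    ```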

  18. Testing for nonlinearity in time series: The method of surrogate data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Theiler, J.; Galdrikian, B.; Longtin, A.

    1991-01-01

    We describe a statistical approach for identifying nonlinearity in time series; in particular, we want to avoid claims of chaos when simpler models (such as linearly correlated noise) can explain the data. The method requires a careful statement of the null hypothesis which characterizes a candidate linear process, the generation of an ensemble of "surrogate" data sets which are similar to the original time series but consistent with the null hypothesis, and the computation of a discriminating statistic for the original and for each of the surrogate data sets. The idea is to test the original time series against the null hypothesis by checking whether the discriminating statistic computed for the original time series differs significantly from the statistics computed for each of the surrogate sets. We present algorithms for generating surrogate data under various null hypotheses, and we show the results of numerical experiments on artificial data using correlation dimension, Lyapunov exponent, and forecasting error as discriminating statistics. Finally, we consider a number of experimental time series, including sunspots, electroencephalogram (EEG) signals, and fluid convection, and evaluate the statistical significance of the evidence for nonlinear structure in each case.
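
    A minimal sketch of the surrogate-data test follows, using phase-randomized surrogates (which preserve the linear autocorrelation of the null hypothesis) and a placeholder discriminating statistic; the paper's statistics (correlation dimension, Lyapunov exponent, forecasting error) would slot in the same way.

    ```python
    # Phase-randomized surrogate test (sketch). The discriminating statistic
    # below is an arbitrary stand-in, not one of the paper's choices.
    import numpy as np

    def phase_randomized_surrogate(x, rng):
        spec = np.fft.rfft(x)
        phases = rng.uniform(0, 2 * np.pi, spec.size)
        phases[0] = 0.0                          # keep the mean untouched
        return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=x.size)

    def statistic(x):                            # placeholder statistic
        return np.mean(np.abs(np.diff(x, 2)))

    rng = np.random.default_rng(1)
    x = np.sin(np.linspace(0, 50, 1024)) + 0.1 * rng.normal(size=1024)
    s0 = statistic(x)
    surr = [statistic(phase_randomized_surrogate(x, rng)) for _ in range(99)]
    # crude rank test: how extreme is the original among the surrogates?
    print(sum(s < s0 for s in surr) / 100.0)
    ```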

  19. Probabilistic Models and Generative Neural Networks: Towards an Unified Framework for Modeling Normal and Impaired Neurocognitive Functions

    PubMed Central

    Testolin, Alberto; Zorzi, Marco

    2016-01-01

    Connectionist models can be characterized within the more general framework of probabilistic graphical models, which make it possible to efficiently describe complex statistical distributions involving a large number of interacting variables. This integration allows building more realistic computational models of cognitive functions, which more faithfully reflect the underlying neural mechanisms while at the same time providing a useful bridge to higher-level descriptions in terms of Bayesian computations. Here we discuss a powerful class of graphical models that can be implemented as stochastic, generative neural networks. These models overcome many limitations associated with classic connectionist models, for example by exploiting unsupervised learning in hierarchical architectures (deep networks) and by taking into account top-down, predictive processing supported by feedback loops. We review some recent cognitive models based on generative networks, and we point out promising research directions to investigate neuropsychological disorders within this approach. Though further efforts are required in order to fill the gap between structured Bayesian models and more realistic, biophysical models of neuronal dynamics, we argue that generative neural networks have the potential to bridge these levels of analysis, thereby improving our understanding of the neural bases of cognition and of pathologies caused by brain damage. PMID:27468262

  20. Probabilistic Models and Generative Neural Networks: Towards an Unified Framework for Modeling Normal and Impaired Neurocognitive Functions.

    PubMed

    Testolin, Alberto; Zorzi, Marco

    2016-01-01

    Connectionist models can be characterized within the more general framework of probabilistic graphical models, which make it possible to efficiently describe complex statistical distributions involving a large number of interacting variables. This integration allows building more realistic computational models of cognitive functions, which more faithfully reflect the underlying neural mechanisms while at the same time providing a useful bridge to higher-level descriptions in terms of Bayesian computations. Here we discuss a powerful class of graphical models that can be implemented as stochastic, generative neural networks. These models overcome many limitations associated with classic connectionist models, for example by exploiting unsupervised learning in hierarchical architectures (deep networks) and by taking into account top-down, predictive processing supported by feedback loops. We review some recent cognitive models based on generative networks, and we point out promising research directions to investigate neuropsychological disorders within this approach. Though further efforts are required in order to fill the gap between structured Bayesian models and more realistic, biophysical models of neuronal dynamics, we argue that generative neural networks have the potential to bridge these levels of analysis, thereby improving our understanding of the neural bases of cognition and of pathologies caused by brain damage.

  1. Exposure time independent summary statistics for assessment of drug dependent cell line growth inhibition.

    PubMed

    Falgreen, Steffen; Laursen, Maria Bach; Bødker, Julie Støve; Kjeldsen, Malene Krag; Schmitz, Alexander; Nyegaard, Mette; Johnsen, Hans Erik; Dybkær, Karen; Bøgsted, Martin

    2014-06-05

    In vitro generated dose-response curves of human cancer cell lines are widely used to develop new therapeutics. The curves are summarised by simplified statistics that ignore the conventionally used dose-response curves' dependency on drug exposure time and growth kinetics. This may lead to suboptimal exploitation of data and biased conclusions on the potential of the drug in question. Therefore, we set out to improve the dose-response assessments by eliminating the impact of time dependency. First, a mathematical model for drug induced cell growth inhibition was formulated and used to derive novel dose-response curves and improved summary statistics that are independent of time under the proposed model. Next, a statistical analysis workflow for estimating the improved statistics was suggested, consisting of 1) nonlinear regression models for estimation of cell counts and doubling times, 2) isotonic regression for modelling the suggested dose-response curves, and 3) a resampling-based method for assessing variation of the novel summary statistics. We document that conventionally used summary statistics for dose-response experiments depend on time, so that fast growing cell lines compared to slowly growing ones are considered overly sensitive. The adequacy of the mathematical model is tested for doxorubicin and found to fit real data to an acceptable degree. Dose-response data from the NCI60 drug screen were used to illustrate the time dependency and demonstrate an adjustment correcting for it. The applicability of the workflow was illustrated by simulation and application on a doxorubicin growth inhibition screen. The simulations show that under the proposed mathematical model the suggested statistical workflow results in unbiased estimates of the time independent summary statistics. Variance estimates of the novel summary statistics are used to conclude that the doxorubicin screen covers a significantly diverse range of responses, ensuring it is useful for biological interpretations. Time independent summary statistics may aid the understanding of drugs' mechanism of action on tumour cells and potentially renew previous drug sensitivity evaluation studies.
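
    Step 2 of the suggested workflow, isotonic regression of the dose-response curve, might look like the sketch below; the doses and viability values are invented for illustration.

    ```python
    # Monotone (isotonic) dose-response fit (sketch); data are hypothetical.
    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    dose = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0])   # uM, invented
    viability = np.array([0.98, 0.95, 0.90, 0.70, 0.45, 0.20, 0.12])
    iso = IsotonicRegression(increasing=False)    # response falls with dose
    fit = iso.fit_transform(np.log10(dose), viability)
    print(fit)                                    # fitted monotone curve
    ```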

  2. Exposure time independent summary statistics for assessment of drug dependent cell line growth inhibition

    PubMed Central

    2014-01-01

    Background In vitro generated dose-response curves of human cancer cell lines are widely used to develop new therapeutics. The curves are summarised by simplified statistics that ignore the conventionally used dose-response curves’ dependency on drug exposure time and growth kinetics. This may lead to suboptimal exploitation of data and biased conclusions on the potential of the drug in question. Therefore, we set out to improve the dose-response assessments by eliminating the impact of time dependency. Results First, a mathematical model for drug induced cell growth inhibition was formulated and used to derive novel dose-response curves and improved summary statistics that are independent of time under the proposed model. Next, a statistical analysis workflow for estimating the improved statistics was suggested, consisting of 1) nonlinear regression models for estimation of cell counts and doubling times, 2) isotonic regression for modelling the suggested dose-response curves, and 3) a resampling-based method for assessing variation of the novel summary statistics. We document that conventionally used summary statistics for dose-response experiments depend on time, so that fast growing cell lines compared to slowly growing ones are considered overly sensitive. The adequacy of the mathematical model is tested for doxorubicin and found to fit real data to an acceptable degree. Dose-response data from the NCI60 drug screen were used to illustrate the time dependency and demonstrate an adjustment correcting for it. The applicability of the workflow was illustrated by simulation and application on a doxorubicin growth inhibition screen. The simulations show that under the proposed mathematical model the suggested statistical workflow results in unbiased estimates of the time independent summary statistics. Variance estimates of the novel summary statistics are used to conclude that the doxorubicin screen covers a significantly diverse range of responses, ensuring it is useful for biological interpretations. Conclusion Time independent summary statistics may aid the understanding of drugs’ mechanism of action on tumour cells and potentially renew previous drug sensitivity evaluation studies. PMID:24902483

  3. Temperature and composition dependence of short-range order and entropy, and statistics of bond length: the semiconductor alloy (GaN)(1-x)(ZnO)(x).

    PubMed

    Liu, Jian; Pedroza, Luana S; Misch, Carissa; Fernández-Serra, Maria V; Allen, Philip B

    2014-07-09

    We present total energy and force calculations for the (GaN)(1-x)(ZnO)(x) alloy. Site-occupancy configurations are generated from Monte Carlo (MC) simulations, on the basis of a cluster expansion model proposed in a previous study. Local atomic coordinate relaxations of surprisingly large magnitude are found via density-functional calculations using a 432-atom periodic supercell, for three representative configurations at x = 0.5. These are used to generate bond-length distributions. The configurationally averaged composition- and temperature-dependent short-range order (SRO) parameters of the alloys are discussed. The entropy is approximated in terms of pair distribution statistics and thus related to SRO parameters. This approximate entropy is compared with accurate numerical values from MC simulations. An empirical model for the dependence of the bond length on the local chemical environments is proposed.

  4. A Climate Statistics Tool and Data Repository

    NASA Astrophysics Data System (ADS)

    Wang, J.; Kotamarthi, V. R.; Kuiper, J. A.; Orr, A.

    2017-12-01

    Researchers at Argonne National Laboratory and collaborating organizations have generated regional-scale, dynamically downscaled climate model output using Weather Research and Forecasting (WRF) version 3.3.1 at a 12 km horizontal spatial resolution over much of North America. The WRF model is driven by boundary conditions obtained from three independent global-scale climate models and two different future greenhouse gas emission scenarios, named representative concentration pathways (RCPs). The repository of results has a temporal resolution of three hours for all the simulations, includes more than 50 variables, is stored in Network Common Data Form (NetCDF) files, and the data volume is nearly 600 TB. A condensed 800 GB set of NetCDF files was made for selected variables most useful for climate-related planning, including daily precipitation, relative humidity, solar radiation, maximum temperature, minimum temperature, and wind. The WRF model simulations are conducted for three 10-year time periods (1995-2004, 2045-2054, and 2085-2094) and two future scenarios (RCP4.5 and RCP8.5). An open-source tool was coded using Python 2.7.8 and ESRI ArcGIS 10.3.1 programming libraries to parse the NetCDF files, compute summary statistics, and output results as GIS layers. Eight sets of summary statistics were generated as examples for the contiguous U.S. states and much of Alaska, including number of days over 90°F, number of days with a heat index over 90°F, heat waves, monthly and annual precipitation, drought, extreme precipitation, multi-model averages, and model bias. This paper will provide an overview of the project to generate the main and condensed data repositories, describe the Python tool and how to use it, present the GIS results of the computed examples, and discuss some of the ways they can be used for planning. The condensed climate data, Python tool, computed GIS results, and documentation of the work are shared on the Internet.
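
    One of the summary statistics described (days per year above 90°F) can be sketched with xarray as below; the file name, variable name and degC units are assumptions for illustration, not the project's actual conventions.

    ```python
    # Counting days over 90 F per grid cell from daily-max temperature (sketch).
    # "wrf_tmax_daily.nc" and the variable "TMAX" (in degC) are hypothetical.
    import xarray as xr

    ds = xr.open_dataset("wrf_tmax_daily.nc")
    tmax_f = ds["TMAX"] * 1.8 + 32.0                      # degC -> degF
    hot_days = (tmax_f > 90.0).groupby("time.year").sum("time")
    hot_days.mean("year").to_netcdf("days_over_90F.nc")   # mean annual count
    ```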

  5. Multivariate statistical approach to estimate mixing proportions for unknown end members

    USGS Publications Warehouse

    Valder, Joshua F.; Long, Andrew J.; Davis, Arden D.; Kenner, Scott J.

    2012-01-01

    A multivariate statistical method is presented, which includes principal components analysis (PCA) and an end-member mixing model to estimate unknown end-member hydrochemical compositions and the relative mixing proportions of those end members in mixed waters. PCA, together with the Hotelling T2 statistic and a conceptual model of groundwater flow and mixing, was used in selecting samples that best approximate end members, which then were used as initial values in optimization of the end-member mixing model. This method was tested on controlled datasets (i.e., true values of estimates were known a priori) and found effective in estimating these end members and mixing proportions. The controlled datasets included synthetically generated hydrochemical data, synthetically generated mixing proportions, and laboratory analyses of sample mixtures, which were used in an evaluation of the effectiveness of this method for potential use in actual hydrological settings. For three different scenarios tested, correlation coefficients (R2) for linear regression between the estimated and known values ranged from 0.968 to 0.993 for mixing proportions and from 0.839 to 0.998 for end-member compositions. The method also was applied to field data from a study of end-member mixing in groundwater as a field example and partial method validation.
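
    The mixing-model step can be illustrated with a small sketch: given candidate end-member compositions, estimate nonnegative mixing proportions that sum to one by constrained least squares. The compositions below are synthetic, and the sum constraint is enforced softly via an appended row, a common trick rather than the authors' exact optimization.

    ```python
    # Estimating mixing proportions from end-member compositions (sketch).
    import numpy as np
    from scipy.optimize import nnls

    # rows of the original array are end members (three species each);
    # transposing puts species in rows and end members in columns
    E = np.array([[120.0, 5.0, 0.3],
                  [ 20.0, 1.0, 0.1],
                  [ 60.0, 9.0, 0.9]]).T
    sample = np.array([70.0, 5.2, 0.45])    # a mixed-water analysis

    w = 100.0                               # weight on the sum-to-one row
    A = np.vstack([E, w * np.ones(E.shape[1])])
    b = np.append(sample, w * 1.0)
    p, _ = nnls(A, b)                       # nonnegative least squares
    print(p / p.sum())                      # mixing proportions
    ```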

  6. Stochastic and Statistical Analysis of Utility Revenues and Weather Data Analysis for Consumer Demand Estimation in Smart Grids

    PubMed Central

    Ali, S. M.; Mehmood, C. A; Khan, B.; Jawad, M.; Farid, U; Jadoon, J. K.; Ali, M.; Tareen, N. K.; Usman, S.; Majid, M.; Anwar, S. M.

    2016-01-01

    In the smart grid paradigm, consumer demands are random and time-dependent, and are therefore described by stochastic probabilities. The stochastically varying consumer demands have put policy makers and supplying agencies in a demanding position for optimal generation management. The utility revenue functions are highly dependent on the consumers' deterministic and stochastic demand models. Sudden drifts in weather parameters affect the living standards of consumers, which in turn influence power demands. Considering the above, we analyzed stochastically and statistically the effect of random consumer demands on the fixed and variable revenues of electrical utilities. Our work presents the Multi-Variate Gaussian Distribution Function (MVGDF) probabilistic model of the utility revenues with time-dependent consumer random demands. Moreover, the Gaussian probability outcome of the utility revenues is based on the varying consumer demand data patterns. Furthermore, Standard Monte Carlo (SMC) simulations are performed that validated the accuracy of the aforesaid probabilistic demand-revenue model. We critically analyzed the effect of weather data parameters on consumer demands using correlation and multi-linear regression schemes. The statistical analysis of consumer demands provided a relationship between the dependent variable (demand) and independent variables (weather data) for utility load management, generation control, and network expansion. PMID:27314229
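
    A toy version of the Monte Carlo element follows: draw correlated consumer demands from a multivariate Gaussian and propagate them through a simple revenue function. Tariffs, means and covariances are invented for illustration.

    ```python
    # Monte Carlo propagation of correlated Gaussian demands into revenue
    # (sketch); every number below is a placeholder.
    import numpy as np

    rng = np.random.default_rng(42)
    mean_demand = np.array([1.2, 0.8, 1.5])        # MW, three consumer classes
    cov = np.array([[0.04, 0.01, 0.00],
                    [0.01, 0.02, 0.01],
                    [0.00, 0.01, 0.06]])
    tariff = np.array([80.0, 95.0, 70.0])          # $/MWh per class
    fixed_revenue = 50.0                           # $/h, demand-independent

    demands = rng.multivariate_normal(mean_demand, cov, size=100_000)
    revenue = fixed_revenue + demands.clip(min=0) @ tariff
    print(revenue.mean(), revenue.std())           # expected revenue and spread
    ```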

  7. Analysis of a Rocket Based Combined Cycle Engine during Rocket Only Operation

    NASA Technical Reports Server (NTRS)

    Smith, T. D.; Steffen, C. J., Jr.; Yungster, S.; Keller, D. J.

    1998-01-01

    The all-rocket mode of operation is a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. However, outside of performing experiments or a full three-dimensional analysis, there are no first-order parametric models to estimate performance. As a result, an axisymmetric RBCC engine was used to analytically determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix, and statistical regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, percent of injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-injector-diameter ratio. A perfect gas computational fluid dynamics analysis was performed to obtain values of vacuum specific impulse. Statistical regression analysis was performed based on both full flow and gas generator engine cycles. Results were also found to be dependent upon the entire cycle assumptions. The statistical regression analysis determined that there were five significant linear effects, six interactions, and one second-order effect. Two parametric models were created to provide performance assessments of an RBCC engine in the all-rocket mode of operation.

  8. Stochastic and Statistical Analysis of Utility Revenues and Weather Data Analysis for Consumer Demand Estimation in Smart Grids.

    PubMed

    Ali, S M; Mehmood, C A; Khan, B; Jawad, M; Farid, U; Jadoon, J K; Ali, M; Tareen, N K; Usman, S; Majid, M; Anwar, S M

    2016-01-01

    In the smart grid paradigm, consumer demands are random and time-dependent, and are therefore described by stochastic probabilities. The stochastically varying consumer demands have put policy makers and supplying agencies in a demanding position for optimal generation management. The utility revenue functions are highly dependent on the consumers' deterministic and stochastic demand models. Sudden drifts in weather parameters affect the living standards of consumers, which in turn influence power demands. Considering the above, we analyzed stochastically and statistically the effect of random consumer demands on the fixed and variable revenues of electrical utilities. Our work presents the Multi-Variate Gaussian Distribution Function (MVGDF) probabilistic model of the utility revenues with time-dependent consumer random demands. Moreover, the Gaussian probability outcome of the utility revenues is based on the varying consumer demand data patterns. Furthermore, Standard Monte Carlo (SMC) simulations are performed that validated the accuracy of the aforesaid probabilistic demand-revenue model. We critically analyzed the effect of weather data parameters on consumer demands using correlation and multi-linear regression schemes. The statistical analysis of consumer demands provided a relationship between the dependent variable (demand) and independent variables (weather data) for utility load management, generation control, and network expansion.

  9. Automated speech understanding: the next generation

    NASA Astrophysics Data System (ADS)

    Picone, J.; Ebel, W. J.; Deshmukh, N.

    1995-04-01

    Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP). Once a field dominated by vector-oriented processors and linear algebra-based mathematics, the current generation of DSP-based systems rely on sophisticated statistical models implemented using a complex software paradigm. Such systems are now capable of understanding continuous speech input for vocabularies of several thousand words in operational environments. The current generation of deployed systems, based on small vocabularies of isolated words, will soon be replaced by a new technology offering natural language access to vast information resources such as the Internet, and provide completely automated voice interfaces for mundane tasks such as travel planning and directory assistance.

  10. Stochastic simulation of karst conduit networks

    NASA Astrophysics Data System (ADS)

    Pardo-Igúzquiza, Eulogio; Dowd, Peter A.; Xu, Chaoshui; Durán-Valsero, Juan José

    2012-01-01

    Karst aquifers have very high spatial heterogeneity. Essentially, they comprise a system of pipes (i.e., the network of conduits) superimposed on rock porosity and on a network of stratigraphic surfaces and fractures. This heterogeneity strongly influences the hydraulic behavior of the karst, and it must be reproduced in any realistic numerical model of the karst system that is used as input to flow and transport modeling. However, the directly observed karst conduits are only a small part of the complete karst conduit system, and knowledge of the complete conduit geometry and topology remains spatially limited and uncertain. Thus, there is a special interest in the stochastic simulation of networks of conduits that can be combined with fracture and rock porosity models to provide a realistic numerical model of the karst system. Furthermore, the simulated model may be of interest per se, and other uses could be envisaged. The purpose of this paper is to present an efficient method for conditional and non-conditional stochastic simulation of karst conduit networks. The method comprises two stages: generation of conduit geometry and generation of topology. The approach adopted is a combination of a resampling method for generating conduit geometries from templates and a modified diffusion-limited aggregation method for generating the network topology. The authors show that the 3D karst conduit networks generated by the proposed method are statistically similar to observed karst conduit networks or to a hypothesized network model. The statistical similarity is in the sense of reproducing the tortuosity index of conduits, the fractal dimension of the network, the rose diagram of conduit directions, the Z-histogram and Ripley's K-function of the bifurcation points (which differs from a random allocation of those bifurcation points). The proposed method (1) is very flexible, (2) incorporates any experimental data (conditioning information) and (3) can easily be modified when implemented in a hydraulic inverse modeling procedure. Several synthetic examples are given to illustrate the methodology, and real conduit network data are used to generate simulated networks that mimic real geometries and topology.
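
    For a flavour of the topology-generation stage, here is a bare-bones 2-D diffusion-limited aggregation (DLA) sketch, the growth mechanism the authors modify; this toy version only grows a cluster from a seed and is not the paper's algorithm.

    ```python
    # Minimal 2-D DLA: random walkers stick when they touch the cluster.
    import numpy as np

    rng = np.random.default_rng(3)
    N = 101
    grid = np.zeros((N, N), bool)
    grid[N // 2, N // 2] = True                    # seed cell
    steps = np.array([(0, 1), (0, -1), (1, 0), (-1, 0)])

    for _ in range(400):                           # particles to aggregate
        x, y = rng.integers(1, N - 1, 2)
        while True:
            dx, dy = steps[rng.integers(4)]
            x = int(np.clip(x + dx, 1, N - 2))
            y = int(np.clip(y + dy, 1, N - 2))
            if grid[x - 1:x + 2, y - 1:y + 2].any():   # touched cluster: stick
                grid[x, y] = True
                break
    print(grid.sum(), "occupied cells")
    ```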

  11. A random spatial network model based on elementary postulates

    USGS Publications Warehouse

    Karlinger, Michael R.; Troutman, Brent M.

    1989-01-01

    A model for generating random spatial networks that is based on elementary postulates comparable to those of the random topology model is proposed. In contrast to the random topology model, this model ascribes a unique spatial specification to generated drainage networks, a distinguishing property of some network growth models. The simplicity of the postulates creates an opportunity for potential analytic investigations of the probabilistic structure of the drainage networks, while the spatial specification enables analyses of spatially dependent network properties. In the random topology model all drainage networks, conditioned on magnitude (number of first-order streams), are equally likely, whereas in this model all spanning trees of a grid, conditioned on area and drainage density, are equally likely. As a result, link lengths in the generated networks are not independent, as usually assumed in the random topology model. For a preliminary model evaluation, scale-dependent network characteristics, such as geometric diameter and link length properties, and topologic characteristics, such as bifurcation ratio, are computed for sets of drainage networks generated on square and rectangular grids. Statistics of the bifurcation and length ratios fall within the range of values reported for natural drainage networks, but geometric diameters tend to be relatively longer than those for natural networks.
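
    The model's core random object, a spanning tree of a grid with every tree equally likely, can be sampled directly with networkx, assuming a recent version that provides random_spanning_tree; the grid size below is arbitrary.

    ```python
    # Uniform sampling of a spanning tree of a grid (sketch).
    import networkx as nx

    G = nx.grid_2d_graph(8, 8)                  # 8 x 8 lattice of cells
    tree = nx.random_spanning_tree(G, seed=42)  # uniform over spanning trees
    print(nx.diameter(tree))                    # longest path (in links)
    ```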

  12. A Statistical Bias Correction Tool for Generating Climate Change Scenarios in Indonesia based on CMIP5 Datasets

    NASA Astrophysics Data System (ADS)

    Faqih, A.

    2017-03-01

    Providing information regarding future climate scenarios is very important in climate change studies. The climate scenario can be used as basic information to support adaptation and mitigation studies. In order to deliver future climate scenarios over a specific region, baseline and projection data from the outputs of global climate models (GCMs) are needed. However, due to their coarse resolution, the data have to be downscaled and bias corrected in order to obtain scenario data with better spatial resolution that match the characteristics of the observed data. Generating these downscaled data is often difficult for scientists who do not have a specific background, experience and skill in dealing with the complex data from the GCM outputs. In this regard, it is necessary to develop a tool that can simplify the downscaling process in order to help scientists, especially in Indonesia, generate future climate scenario data that can be used for their climate change-related studies. In this paper, we introduce a tool called “Statistical Bias Correction for Climate Scenarios (SiBiaS)”. The tool is specially designed to facilitate the use of CMIP5 GCM data outputs and to perform statistical bias correction of those outputs relative to reference observational data. It was prepared to support capacity building in climate modeling in Indonesia as part of the Indonesia 3rd National Communication (TNC) project activities.
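
    A minimal sketch of the statistical bias correction at the heart of such a tool, empirical quantile mapping of GCM output onto the observed distribution, is given below; the arrays are synthetic stand-ins for observed and modeled rainfall, and SiBiaS's actual method may differ.

    ```python
    # Empirical quantile mapping (sketch): map each model value to its quantile
    # in the model climate, then read off the observed value at that quantile.
    import numpy as np

    rng = np.random.default_rng(7)
    obs = rng.gamma(2.0, 4.0, 5000)            # observed daily rainfall (mm)
    gcm_hist = rng.gamma(2.0, 6.0, 5000)       # biased GCM baseline
    gcm_future = rng.gamma(2.2, 6.5, 5000)     # GCM projection to correct

    def quantile_map(x, model_hist, observed):
        q = np.searchsorted(np.sort(model_hist), x) / len(model_hist)
        return np.quantile(observed, q.clip(0, 1))

    corrected = quantile_map(gcm_future, gcm_hist, obs)
    print(gcm_future.mean(), corrected.mean(), obs.mean())
    ```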

  13. On the Effect of Dipole-Dipole Interactions on the Quantum Statistics of Surface Plasmons in Multiparticle Spaser Systems

    NASA Astrophysics Data System (ADS)

    Shesterikov, A. V.; Gubin, M. Yu.; Karpov, S. N.; Prokhorov, A. V.

    2018-04-01

    The problem of controlling the quantum dynamics of localized plasmons has been considered in the model of a four-particle spaser composed of metallic nanoparticles and semiconductor quantum dots. Conditions for the observation of stable steady-state regimes of the formation of surface plasmons in this model have been determined in the mean-field approximation. It has been shown that the presence of strong dipole-dipole interactions between metallic nanoparticles of the spaser system leads to a considerable change in the quantum statistics of plasmons generated on the nanoparticles.

  14. Using a coupled hydro-mechanical fault model to better understand the risk of induced seismicity in deep geothermal projects

    NASA Astrophysics Data System (ADS)

    Abe, Steffen; Krieger, Lars; Deckert, Hagen

    2017-04-01

    The changes of fluid pressures related to the injection of fluids into the deep underground, for example during geothermal energy production, can potentially reactivate faults and thus cause induced seismic events. Therefore, an important aspect in the planning and operation of such projects, in particular in densely populated regions such as the Upper Rhine Graben in Germany, is the estimation and mitigation of the induced seismic risk. The occurrence of induced seismicity depends on a combination of hydraulic properties of the underground, mechanical and geometric parameters of the fault, and the fluid injection regime. In this study we are therefore employing a numerical model to investigate the impact of fluid pressure changes on the dynamics of the faults and the resulting seismicity. The approach combines a model of the fluid flow around a geothermal well based on a 3D finite difference discretisation of the Darcy-equation with a 2D block-slider model of a fault. The models are coupled so that the evolving pore pressure at the relevant locations of the hydraulic model is taken into account in the calculation of the stick-slip dynamics of the fault model. Our modelling approach uses two subsequent modelling steps. Initially, the fault model is run by applying a fixed deformation rate for a given duration and without the influence of the hydraulic model in order to generate the background event statistics. Initial tests have shown that the response of the fault to hydraulic loading depends on the timing of the fluid injection relative to the seismic cycle of the fault. Therefore, multiple snapshots of the fault's stress- and displacement state are generated from the fault model. In a second step, these snapshots are then used as initial conditions in a set of coupled hydro-mechanical model runs including the effects of the fluid injection. This set of models is then compared with the background event statistics to evaluate the change in the probability of seismic events. The event data such as location, magnitude, and source characteristics can be used as input for numerical wave propagation models. This allows the translation of seismic event statistics generated by the model into ground shaking probabilities.

  15. A growing social network model in geographical space

    NASA Astrophysics Data System (ADS)

    Antonioni, Alberto; Tomassini, Marco

    2017-09-01

    In this work we propose a new model for the generation of social networks that includes their often ignored spatial aspects. The model is a growing one and links are created either taking space into account, or disregarding space and only considering the degree of target nodes. These two effects can be mixed linearly in arbitrary proportions through a parameter. We numerically show that for a given range of the combination parameter, and for given mean degree, the generated network class shares many important statistical features with those observed in actual social networks, including the spatial dependence of connections. Moreover, we show that the model provides a good qualitative fit to some measured social networks.
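
    A toy version of the growth rule described follows: each new node links to existing nodes chosen with a probability that linearly mixes degree-based preferential attachment with spatial proximity through a parameter mu. The specific kernel below is an assumption for illustration, not the authors' exact one.

    ```python
    # Growing spatial network with a linear mix of degree and distance effects.
    import numpy as np

    rng = np.random.default_rng(11)
    mu, n_nodes, m_links = 0.5, 300, 2
    pos, deg, edges = [rng.uniform(0, 1, 2)], [0], []

    for new in range(1, n_nodes):
        p_new = rng.uniform(0, 1, 2)                     # new node's location
        d = np.linalg.norm(np.array(pos) - p_new, axis=1)
        space_w = 1.0 / (d + 1e-6)                       # closer -> likelier
        deg_w = np.array(deg) + 1.0                      # higher degree -> likelier
        w = mu * deg_w / deg_w.sum() + (1 - mu) * space_w / space_w.sum()
        targets = rng.choice(new, size=min(m_links, new),
                             replace=False, p=w / w.sum())
        for t in targets:
            edges.append((new, int(t)))
            deg[t] += 1
        pos.append(p_new)
        deg.append(len(targets))
    print(len(edges), "edges")
    ```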

  16. Spatial Pattern Classification for More Accurate Forecasting of Variable Energy Resources

    NASA Astrophysics Data System (ADS)

    Novakovskaia, E.; Hayes, C.; Collier, C.

    2014-12-01

    The accuracy of solar and wind forecasts is becoming increasingly essential as grid operators continue to integrate additional renewable generation onto the electric grid. Forecast errors affect rate payers, grid operators, wind and solar plant maintenance crews and energy traders through increases in prices, project down time or lost revenue. While extensive and beneficial efforts were undertaken in recent years to improve physical weather models for a broad spectrum of applications these improvements have generally not been sufficient to meet the accuracy demands of system planners. For renewables, these models are often used in conjunction with additional statistical models utilizing both meteorological observations and the power generation data. Forecast accuracy can be dependent on specific weather regimes for a given location. To account for these dependencies it is important that parameterizations used in statistical models change as the regime changes. An automated tool, based on an artificial neural network model, has been developed to identify different weather regimes as they impact power output forecast accuracy at wind or solar farms. In this study, improvements in forecast accuracy were analyzed for varying time horizons for wind farms and utility-scale PV plants located in different geographical regions.

  17. The journals are full of great studies but can we believe the statistics? Revisiting the mass privatisation - mortality debate.

    PubMed

    Gerry, Christopher J

    2012-07-01

    Cross-national statistical analyses based on country-level panel data are increasingly popular in social epidemiology. To provide reliable results on the societal determinants of health, analysts must give very careful consideration to conceptual and methodological issues: aggregate (historical) data are typically compatible with multiple alternative stories of the data-generating process. Studies in this field which fail to relate their empirical approach to the true underlying data-generating process are likely to produce misleading results if, for example, they misspecify their models by failing to explore the statistical properties of the longitudinal aspect of their data or by ignoring endogeneity issues. We illustrate the importance of this extra need for care with reference to a recent debate on whether rapid mass privatisation can explain post-communist mortality fluctuations. We demonstrate that the finding that rapid mass privatisation was a "crucial determinant" of male mortality fluctuations in the post-communist world is rejected once better consideration is given to the way in which the data are generated. Copyright © 2012 Elsevier Ltd. All rights reserved.

  18. An epidemiological modeling and data integration framework.

    PubMed

    Pfeifer, B; Wurz, M; Hanser, F; Seger, M; Netzer, M; Osl, M; Modre-Osprian, R; Schreier, G; Baumgartner, C

    2010-01-01

    In this work, a cellular automaton software package for simulating different infectious diseases, storing the simulation results in a data warehouse system, and analyzing the obtained results to generate prediction models as well as contingency plans is proposed. The Brisbane H3N2 flu virus, which spread during the winter season of 2009, was used for simulation in the federal state of Tyrol, Austria. The simulation-modeling framework consists of an underlying cellular automaton. The cellular automaton model is parameterized by known disease parameters, and geographical as well as demographical conditions are included for simulating the spread. The data generated by simulation are stored in the back room of the data warehouse using the Talend Open Studio software package, and subsequent statistical and data mining tasks are performed using the tool termed Knowledge Discovery in Database Designer (KD3). The obtained simulation results were used for generating prediction models for all nine federal states of Austria. The proposed framework provides a powerful and easy-to-handle interface for parameterizing and simulating different infectious diseases in order to generate prediction models and improve contingency plans for future events.
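
    The simulation core can be sketched as a minimal SIR cellular automaton; the grid size and the infection and recovery probabilities below are toy values, not the Brisbane H3N2 parameterization.

    ```python
    # Minimal SIR cellular automaton (sketch); all parameters are invented.
    import numpy as np

    rng = np.random.default_rng(17)
    S, I, R = 0, 1, 2
    grid = np.zeros((100, 100), np.int8)
    grid[50, 50] = I                               # index case
    p_infect, p_recover = 0.25, 0.1

    for step in range(100):
        infected = grid == I
        # count infected 4-neighbours via shifted copies of the grid
        nbrs = sum(np.roll(infected, shift, axis)
                   for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)])
        new_inf = (grid == S) & (rng.uniform(size=grid.shape) <
                                 1 - (1 - p_infect) ** nbrs)
        new_rec = infected & (rng.uniform(size=grid.shape) < p_recover)
        grid[new_inf] = I
        grid[new_rec] = R
    print((grid == R).sum(), "recovered after 100 steps")
    ```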

  19. Evidential evaluation of DNA profiles using a discrete statistical model implemented in the DNA LiRa software.

    PubMed

    Puch-Solis, Roberto; Clayton, Tim

    2014-07-01

    The high sensitivity of the technology for producing profiles means that it has become routine to produce profiles from relatively small quantities of DNA. The profiles obtained from low template DNA (LTDNA) are affected by several phenomena which must be taken into consideration when interpreting and evaluating this evidence. Furthermore, many of the same phenomena affect profiles from higher amounts of DNA (e.g. where complex mixtures have been revealed). In this article we present a statistical model, which forms the basis of the software DNA LiRa, and which is able to calculate likelihood ratios where one to four donors are postulated and for any number of replicates. The model can take into account drop-in and allelic drop-out for different contributors, template degradation and uncertain allele designations. In this statistical model, unknown parameters are treated following the Empirical Bayesian paradigm. The performance of LiRa is tested using examples, and the outputs are compared with those generated using two other statistical software packages, likeLTD and LRmix. The concept of ban efficiency is introduced as a measure for assessing model sensitivity. Copyright © 2014. Published by Elsevier Ireland Ltd.

  20. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments

    PubMed Central

    Avalappampatty Sivasamy, Aneetha; Sundan, Bose

    2015-01-01

    The ever-expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secure communication and data transfer. Providing effective security protocols for any network environment therefore assumes paramount importance. Attempts are continually being made to design more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model, and the necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified as either normal or attack types. The performance of the model, as evaluated through validation and testing using the KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. The accuracy of the model presented in this work has been found to be much better than that of existing models. PMID:26357668
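
    The detection step can be sketched as follows: compute Hotelling's T2 distance of an observed traffic-feature vector from a baseline profile and compare it with an empirical threshold. Features and data below are synthetic.

    ```python
    # Hotelling's T^2 anomaly score against a baseline profile (sketch).
    import numpy as np

    rng = np.random.default_rng(5)
    baseline = rng.normal(0, 1, (2000, 4))          # normal traffic, 4 features
    mu = baseline.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(baseline, rowvar=False))

    def t_squared(x):
        d = x - mu
        return float(d @ S_inv @ d)

    threshold = np.quantile([t_squared(x) for x in baseline], 0.99)
    attack = np.array([4.0, -3.5, 2.0, 0.0])        # anomalous observation
    print(t_squared(attack) > threshold)             # True -> flag as attack
    ```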

  1. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments.

    PubMed

    Sivasamy, Aneetha Avalappampatty; Sundan, Bose

    2015-01-01

    The ever-expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secure communication and data transfer. Providing effective security protocols for any network environment therefore assumes paramount importance. Attempts are continually being made to design more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model, and the necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified as either normal or attack types. The performance of the model, as evaluated through validation and testing using the KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. The accuracy of the model presented in this work has been found to be much better than that of existing models.

  2. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. 1: Theoretical development and application to yearly predictions for selected cities in the United States

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1986-01-01

    A rain attenuation prediction model is described for use in calculating satellite communication link availability for any specific location in the world that is characterized by an extended record of rainfall. Such a formalism is necessary for the accurate assessment of such availability predictions in the case of the small user-terminal concept of the Advanced Communication Technology Satellite (ACTS) Project. The model employs the theory of extreme value statistics to generate the necessary statistical rain-rate parameters from rain data in the form compiled by the National Weather Service. These location-dependent rain statistics are then applied to a rain attenuation model to obtain a yearly prediction of the occurrence of attenuation on any satellite link at that location. The predictions of this model are compared to those of the Crane Two-Component Rain Model and some empirical data and found to be very good. The model is then used to calculate rain attenuation statistics at 59 locations in the United States (including Alaska and Hawaii) for the 20 GHz downlinks and 30 GHz uplinks of the proposed ACTS system. The flexibility of this modeling formalism is such that it allows a complete and unified treatment of the temporal aspects of rain attenuation that leads to the design of an optimum stochastic power control algorithm, the purpose of which is to efficiently counter such rain fades on a satellite link.
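
    The extreme-value step can be illustrated by fitting a Gumbel (extreme value type I) distribution to annual-maximum rain rates and reading off a rarely exceeded level; the data below are invented rather than National Weather Service records.

    ```python
    # Gumbel fit to annual-maximum rain rates (sketch); data are synthetic.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(9)
    annual_max = stats.gumbel_r.rvs(loc=40, scale=12, size=30,
                                    random_state=rng)   # mm/h, invented
    loc, scale = stats.gumbel_r.fit(annual_max)
    # rain rate exceeded in roughly 1 year in 1000, under the fitted model
    print(stats.gumbel_r.ppf(0.999, loc, scale))
    ```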

  3. Landslide susceptibility mapping using GIS-based statistical models and Remote sensing data in tropical environment

    PubMed Central

    Hashim, Mazlan

    2015-01-01

    This research presents the results of GIS-based statistical models for the generation of landslide susceptibility mapping using geographic information system (GIS) and remote-sensing data for the Cameron Highlands area in Malaysia. Ten factors, including slope, aspect, soil, lithology, NDVI, land cover, distance to drainage, precipitation, distance to fault, and distance to road, were extracted from SAR data, SPOT 5 and WorldView-1 images. The relationships between the detected landslide locations and these ten related factors were identified using GIS-based statistical models including analytical hierarchy process (AHP), weighted linear combination (WLC) and spatial multi-criteria evaluation (SMCE) models. The landslide inventory map, which has a total of 92 landslide locations, was created based on numerous resources such as digital aerial photographs, AIRSAR data, WorldView-1 images, and field surveys. Then, 80% of the landslide inventory was used for training the statistical models and the remaining 20% was used for validation purposes. The validation results using the relative landslide density index (R-index) and receiver operating characteristic (ROC) demonstrated that the SMCE model (accuracy of 96%) is better in prediction than the AHP (91%) and WLC (89%) models. These landslide susceptibility maps would be useful for hazard mitigation and regional planning. PMID:25898919

  4. Landslide susceptibility mapping using GIS-based statistical models and Remote sensing data in tropical environment.

    PubMed

    Shahabi, Himan; Hashim, Mazlan

    2015-04-22

    This research presents the results of GIS-based statistical models for the generation of landslide susceptibility mapping using geographic information system (GIS) and remote-sensing data for the Cameron Highlands area in Malaysia. Ten factors, including slope, aspect, soil, lithology, NDVI, land cover, distance to drainage, precipitation, distance to fault, and distance to road, were extracted from SAR data, SPOT 5 and WorldView-1 images. The relationships between the detected landslide locations and these ten related factors were identified using GIS-based statistical models including analytical hierarchy process (AHP), weighted linear combination (WLC) and spatial multi-criteria evaluation (SMCE) models. The landslide inventory map, which has a total of 92 landslide locations, was created based on numerous resources such as digital aerial photographs, AIRSAR data, WorldView-1 images, and field surveys. Then, 80% of the landslide inventory was used for training the statistical models and the remaining 20% was used for validation purposes. The validation results using the relative landslide density index (R-index) and receiver operating characteristic (ROC) demonstrated that the SMCE model (accuracy of 96%) is better in prediction than the AHP (91%) and WLC (89%) models. These landslide susceptibility maps would be useful for hazard mitigation and regional planning.

  5. Classical Statistics and Statistical Learning in Imaging Neuroscience

    PubMed Central

    Bzdok, Danilo

    2017-01-01

    Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-tests and ANOVA. Throughout recent years, statistical learning methods have enjoyed increasing popularity, especially for applications to rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It retraces how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896

  6. Uncertainties of flood frequency estimation approaches based on continuous simulation using data resampling

    NASA Astrophysics Data System (ADS)

    Arnaud, Patrick; Cantet, Philippe; Odry, Jean

    2017-11-01

    Flood frequency analyses (FFAs) are needed for flood risk management. Many methods exist, ranging from classical purely statistical approaches to more complex approaches based on process simulation. The results of these methods are associated with uncertainties that are sometimes difficult to estimate due to the complexity of the approaches or the number of parameters, especially for process simulation. This is the case for the simulation-based FFA approach called SHYREG presented in this paper, in which a rainfall generator is coupled with a simple rainfall-runoff model; here we attempt to estimate the uncertainties due to the estimation of the seven parameters needed to estimate flood frequencies. The six parameters of the rainfall generator are mean values, so their theoretical distribution is known and can be used to estimate the generator uncertainties. In contrast, the theoretical distribution of the single hydrological model parameter is unknown; consequently, a bootstrap method is applied to estimate the calibration uncertainties. The propagation of uncertainty from the rainfall generator to the hydrological model is also taken into account. This method is applied to 1112 basins throughout France. Uncertainties coming from the SHYREG method and from purely statistical approaches are compared, and the results are discussed according to the length of the recorded observations, basin size and basin location. Uncertainties of the SHYREG method decrease as the basin size increases or as the length of the recorded flow increases. Moreover, the results show that the confidence intervals of the SHYREG method are relatively small despite the complexity of the method and the number of parameters (seven). This is due to the stability of the parameters and takes into account the dependence of uncertainties due to the rainfall model and the hydrological calibration. Indeed, the uncertainties on the flow quantiles are of the same order of magnitude as those associated with the use of a statistical law with two parameters (here the generalised extreme value Type I distribution) and clearly lower than those associated with the use of a three-parameter law (here the generalised extreme value Type II distribution). For extreme flood quantiles, the uncertainties are mostly due to the rainfall generator because of the progressive saturation of the hydrological model.
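
    The bootstrap treatment of the single hydrological parameter can be sketched as below, with a stand-in one-parameter runoff model rather than SHYREG itself; the data are synthetic.

    ```python
    # Bootstrap uncertainty for a one-parameter calibration (sketch).
    import numpy as np

    rng = np.random.default_rng(21)
    rain = rng.gamma(2.0, 10.0, 200)                # calibration events (mm)
    flow = 0.35 * rain + rng.normal(0.0, 2.0, 200)  # synthetic observed runoff

    def fit_runoff_coeff(r, q):
        return (r @ q) / (r @ r)                    # one-parameter least squares

    boot = []
    for _ in range(2000):                           # resample with replacement
        idx = rng.integers(0, rain.size, rain.size)
        boot.append(fit_runoff_coeff(rain[idx], flow[idx]))
    print(np.percentile(boot, [5, 95]))             # 90% interval
    ```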

  7. 2-Point microstructure archetypes for improved elastic properties

    NASA Astrophysics Data System (ADS)

    Adams, Brent L.; Gao, Xiang

    2004-01-01

    Rectangular models of material microstructure are described by their 1- and 2-point (spatial) correlation statistics of placement of local state. In the procedure described here, the local state space is described in discrete form; the focus is on placement of local state within a finite number of cells comprising rectangular models. It is illustrated that effective elastic properties (generalized Hashin-Shtrikman bounds) can be obtained that are linear in components of the correlation statistics. Within this framework the concept of an eigen-microstructure within the microstructure hull is useful. Given the practical innumerability of the microstructure hull, however, we introduce a method for generating a sequence of archetypes of eigen-microstructure, from the 2-point correlation statistics of local state, assuming that the 1-point statistics are stationary. The method is illustrated by obtaining an archetype for an imaginary two-phase material where the objective is to maximize the combination C_{xxxx}^{*} + C_{xyxy}^{*}.

  8. Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

    NASA Technical Reports Server (NTRS)

    Wallace, G. R.; Weathers, G. D.; Graf, E. R.

    1973-01-01

    The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
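
    To make the hybrid-sum construction concrete, here is a minimal Python sketch (assuming the primitive polynomials x^4 + x + 1 and x^5 + x^2 + 1, which are not necessarily the sequences analyzed in the report) that generates two maximum-length sequences with linear-feedback shift registers, forms their modulo-two sum, and crudely low-pass filters the result:

        import numpy as np

        def lfsr_bits(low_exps, n, seed, length):
            """Fibonacci LFSR over GF(2) with characteristic polynomial
            x^n + sum(x^e for e in low_exps); output is the register's LSB."""
            state, out = seed, []
            for _ in range(length):
                out.append(state & 1)
                fb = 0
                for e in low_exps:
                    fb ^= (state >> e) & 1
                state = (state >> 1) | (fb << (n - 1))
            return np.array(out, dtype=np.uint8)

        # Two component maximum-length sequences with coprime periods 15 and
        # 31, so the hybrid-sum sequence has period lcm(15, 31) = 465.
        a = lfsr_bits([1, 0], 4, 0b1001, 465)    # x^4 + x + 1
        b = lfsr_bits([2, 0], 5, 0b10011, 465)   # x^5 + x^2 + 1
        hybrid = a ^ b                            # modulo-two sum

        # Map bits to +/-1 and apply a moving average as a crude stand-in for
        # the report's analog filtering, then inspect simple statistics.
        analog = np.convolve(2.0 * hybrid - 1.0, np.ones(8) / 8.0, mode="valid")
        print(hybrid[:16], round(analog.mean(), 3), round(analog.std(), 3))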

  9. The relationship between the C-statistic of a risk-adjustment model and the accuracy of hospital report cards: a Monte Carlo Study.

    PubMed

    Austin, Peter C; Reeves, Mathew J

    2013-03-01

    Hospital report cards, in which outcomes following the provision of medical or surgical care are compared across health care providers, are being published with increasing frequency. Essential to the production of these reports is risk adjustment, which allows investigators to account for differences in the distribution of patient illness severity across different hospitals. Logistic regression models are frequently used for risk adjustment in hospital report cards. Many applied researchers use the c-statistic (equivalent to the area under the receiver operating characteristic curve) of the logistic regression model as a measure of the credibility and accuracy of hospital report cards. Our objective was to determine the relationship between the c-statistic of a risk-adjustment model and the accuracy of hospital report cards. Monte Carlo simulations were used to examine this issue. We examined the influence of three factors on the accuracy of hospital report cards: the c-statistic of the logistic regression model used for risk adjustment, the number of hospitals, and the number of patients treated at each hospital. The parameters used to generate the simulated datasets came from analyses of patients hospitalized with a diagnosis of acute myocardial infarction in Ontario, Canada. The c-statistic of the risk-adjustment model had, at most, a very modest impact on the accuracy of hospital report cards, whereas the number of patients treated at each hospital had a much greater impact. The c-statistic of a risk-adjustment model should not be used to assess the accuracy of a hospital report card.

  10. The relationship between the c-statistic of a risk-adjustment model and the accuracy of hospital report cards: A Monte Carlo study

    PubMed Central

    Austin, Peter C.; Reeves, Mathew J.

    2015-01-01

    Background: Hospital report cards, in which outcomes following the provision of medical or surgical care are compared across health care providers, are being published with increasing frequency. Essential to the production of these reports is risk-adjustment, which allows investigators to account for differences in the distribution of patient illness severity across different hospitals. Logistic regression models are frequently used for risk-adjustment in hospital report cards. Many applied researchers use the c-statistic (equivalent to the area under the receiver operating characteristic curve) of the logistic regression model as a measure of the credibility and accuracy of hospital report cards. Objectives: To determine the relationship between the c-statistic of a risk-adjustment model and the accuracy of hospital report cards. Research Design: Monte Carlo simulations were used to examine this issue. We examined the influence of three factors on the accuracy of hospital report cards: the c-statistic of the logistic regression model used for risk-adjustment, the number of hospitals, and the number of patients treated at each hospital. The parameters used to generate the simulated datasets came from analyses of patients hospitalized with a diagnosis of acute myocardial infarction in Ontario, Canada. Results: The c-statistic of the risk-adjustment model had, at most, a very modest impact on the accuracy of hospital report cards, whereas the number of patients treated at each hospital had a much greater impact. Conclusions: The c-statistic of a risk-adjustment model should not be used to assess the accuracy of a hospital report card. PMID:23295579

  11. Atmospheric Visibility Monitoring for planetary optical communications

    NASA Technical Reports Server (NTRS)

    Cowles, Kelly

    1991-01-01

    The Atmospheric Visibility Monitoring project endeavors to improve current atmospheric models and generate visibility statistics relevant to prospective earth-satellite optical communications systems. Three autonomous observatories are being used to measure atmospheric conditions on the basis of observed starlight; these data will yield clear-sky and transmission statistics for three sites with high clear-sky probabilities. Ground-based data will be compared with satellite imagery to determine the correlation between satellite data and ground-based observations.

  12. ELF/VLF Waves Generated by an Artificially-Modulated Auroral Electrojet Above the HAARP HF Transmitter

    NASA Astrophysics Data System (ADS)

    Moore, R. C.; Inan, U. S.; Bell, T. F.

    2004-12-01

    Naturally forming, global-scale currents, such as the polar electrojet current and the mid-latitude dynamo, have been used as current sources to generate electromagnetic waves in the Extremely Low Frequency (ELF) and Very Low Frequency (VLF) bands since the 1970s. While many short-duration experiments have been performed, no continuous multi-week campaign data sets have been published providing reliable statistics for ELF/VLF wave generation. In this paper, we summarize the experimental data resulting from multiple ELF/VLF wave generation campaigns conducted at the High-frequency Active Auroral Research Project (HAARP) HF transmitter in Gakona, Alaska. For one 14-day period in March 2002 and one 24-day period in November 2002, the HAARP HF transmitter broadcast ELF/VLF wave generation sequences for 10 hours per day, between 0400 and 1400 UT. Five different modulation frequencies, broadcast separately using two HF carrier frequencies, are examined at receivers located 36, 44, 147, and 155 km from the HAARP facility. Additionally, a continuous 24-hour transmission period is analyzed to compare daytime wave generation to nighttime wave generation. Lastly, a power-ramping scheme was employed to investigate possible thresholding effects at the wave-generating altitude. Wave generation statistics are presented along with source-region property calculations performed using a simple model.

  13. Incorporating GIS building data and census housing statistics for sub-block-level population estimation

    USGS Publications Warehouse

    Wu, S.-S.; Wang, L.; Qiu, X.

    2008-01-01

    This article presents a deterministic model for sub-block-level population estimation based on the total building volumes derived from geographic information system (GIS) building data and three census block-level housing statistics. To assess the model, we generated artificial blocks by aggregating census block areas and calculating the respective housing statistics. We then applied the model to estimate populations for sub-artificial-block areas and assessed the estimates with census populations of the areas. Our analyses indicate that the average percent error of population estimation for sub-artificial-block areas is comparable to those for sub-census-block areas of the same size relative to associated blocks. The smaller the sub-block-level areas, the higher the population estimation errors. For example, the average percent error for residential areas is approximately 0.11 percent for 100 percent block areas and 35 percent for 5 percent block areas.

  14. Live Speech Driven Head-and-Eye Motion Generators.

    PubMed

    Le, Binh H; Ma, Xiaohan; Deng, Zhigang

    2012-11-01

    This paper describes a fully automated framework to generate realistic head motion, eye gaze, and eyelid motion simultaneously based on live (or recorded) speech input. Its central idea is to learn separate yet interrelated statistical models for each component (head motion, gaze, or eyelid motion) from a prerecorded facial motion dataset: 1) Gaussian mixture models and a gradient-descent optimization algorithm are employed to generate head motion from speech features; 2) a nonlinear dynamic canonical correlation analysis model is used to synthesize eye gaze from head motion and speech features; and 3) nonnegative linear regression is used to model voluntary eyelid motion, and a log-normal distribution is used to describe involuntary eye blinks. Several user studies were conducted to evaluate the effectiveness of the proposed speech-driven head and eye motion generator using the well-established paired-comparison methodology. Our evaluation results clearly show that this approach significantly outperforms state-of-the-art head and eye motion generation algorithms. In addition, a novel mocap + video hybrid data acquisition technique is introduced to record high-fidelity head movement, eye gaze, and eyelid motion simultaneously.

  15. Comment on "Social Contagion, Adolescent Sexual Behavior, and Pregnancy: A Nonlinear Dynamic EMOSA Model."

    ERIC Educational Resources Information Center

    Stoolmiller, Mike

    1998-01-01

    Examines the Rodgers, Rowe, and Buster (1998) epidemic model of the onset of social activities for adolescent sexuality. Maintains that its strengths include its theoretical potential to generate new hypotheses for further testing at the individual level. Asserts that its limitations include the lack of a well-developed statistical framework and…

  16. Machine Learning Biogeographic Processes from Biotic Patterns: A New Trait-Dependent Dispersal and Diversification Model with Model Choice By Simulation-Trained Discriminant Analysis.

    PubMed

    Sukumaran, Jeet; Economo, Evan P; Lacey Knowles, L

    2016-05-01

    Current statistical biogeographical analysis methods are limited in the ways ecology can be related to the processes of diversification and geographical range evolution, requiring conflation of geography and ecology and/or assuming ecologies that are uniform across all lineages and invariant in time. This precludes the study of a broad class of macroevolutionary biogeographical theories that relate geographical and species histories through lineage-specific ecological and evolutionary dynamics, such as taxon cycle theory. Here we present a new model that generates phylogenies under a complex of superpositioned geographical range evolution, trait evolution, and diversification processes that can communicate with each other. We present a likelihood-free method of inference under our model using discriminant analysis of principal components of summary statistics calculated on phylogenies, with the discriminant functions trained on data generated by simulations under our model. This approach of model selection by classification of empirical data with respect to data generated under training models is shown to be efficient and robust, and it performs well over a broad range of parameter space defined by the relative rates of dispersal, trait evolution, and diversification processes. We apply our method to a case study of the taxon cycle, that is, testing for habitat and trophic-level constraints in the dispersal regimes of the Wallacean avifaunal radiation. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
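
    The model-choice machinery can be sketched generically: train a discriminant classifier on summary statistics simulated under each candidate model, then classify the empirical summaries. The Python snippet below is a minimal scikit-learn illustration in that spirit; `simulate_summaries` is a hypothetical stand-in for the authors' phylogenetic simulations, not their code.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(0)

        def simulate_summaries(model_id, n_sims=500, n_stats=20):
            """Hypothetical stand-in: each candidate model induces a different
            distribution over phylogenetic summary statistics."""
            return rng.normal(loc=model_id, scale=2.0, size=(n_sims, n_stats))

        # Training data: summaries simulated under two candidate models.
        X = np.vstack([simulate_summaries(0), simulate_summaries(1)])
        y = np.repeat([0, 1], 500)

        # DAPC: principal components of the summaries feed the discriminant.
        clf = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
        clf.fit(X, y)

        empirical = rng.normal(loc=1.0, scale=2.0, size=(1, 20))  # observed stats
        print("class probabilities:", clf.predict_proba(empirical))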

  17. An Interactive Tool For Semi-automated Statistical Prediction Using Earth Observations and Models

    NASA Astrophysics Data System (ADS)

    Zaitchik, B. F.; Berhane, F.; Tadesse, T.

    2015-12-01

    We developed a semi-automated statistical prediction tool applicable to concurrent analysis or seasonal prediction of any time series variable in any geographic location. The tool was developed using Shiny, JavaScript, HTML and CSS. A user can extract a predictand by drawing a polygon over a region of interest on the provided user interface (global map). The user can select the Climatic Research Unit (CRU) precipitation or Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) as predictand. They can also upload their own predictand time series. Predictors can be extracted from sea surface temperature, sea level pressure, winds at different pressure levels, air temperature at various pressure levels, and geopotential height at different pressure levels. By default, reanalysis fields are applied as predictors, but the user can also upload their own predictors, including a wide range of compatible satellite-derived datasets. The package generates correlations of the variables selected with the predictand. The user also has the option to generate composites of the variables based on the predictand. Next, the user can extract predictors by drawing polygons over the regions that show strong correlations (composites). Then, the user can select some or all of the statistical prediction models provided. Provided models include Linear Regression models (GLM, SGLM), Tree-based models (bagging, random forest, boosting), Artificial Neural Network, and other non-linear models such as Generalized Additive Model (GAM) and Multivariate Adaptive Regression Splines (MARS). Finally, the user can download the analysis steps they used, such as the region they selected, the time period they specified, the predictand and predictors they chose and preprocessing options they used, and the model results in PDF or HTML format. Key words: Semi-automated prediction, Shiny, R, GLM, ANN, RF, GAM, MARS

  18. Generalized functional linear models for gene-based case-control association studies.

    PubMed

    Fan, Ruzong; Wang, Yifan; Mills, James L; Carter, Tonia C; Lobach, Iryna; Wilson, Alexander F; Bailey-Wilson, Joan E; Weeks, Daniel E; Xiong, Momiao

    2014-11-01

    By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that the Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene region are disease related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease datasets. The Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses. © 2014 WILEY PERIODICALS, INC.

  19. Generalized Functional Linear Models for Gene-based Case-Control Association Studies

    PubMed Central

    Mills, James L.; Carter, Tonia C.; Lobach, Iryna; Wilson, Alexander F.; Bailey-Wilson, Joan E.; Weeks, Daniel E.; Xiong, Momiao

    2014-01-01

    By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that the Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene are disease-related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease data sets. The Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses. PMID:25203683

  20. Construction of estimated flow- and load-duration curves for Kentucky using the Water Availability Tool for Environmental Resources (WATER)

    USGS Publications Warehouse

    Unthank, Michael D.; Newson, Jeremy K.; Williamson, Tanja N.; Nelson, Hugh L.

    2012-01-01

    Flow- and load-duration curves were constructed from the model outputs of the U.S. Geological Survey's Water Availability Tool for Environmental Resources (WATER) application for streams in Kentucky. The WATER application was designed to access multiple geospatial datasets to generate more than 60 years of statistically based streamflow data for Kentucky. The application enables a user to graphically select a site on a stream and generate an estimated hydrograph and flow-duration curve for the watershed upstream of that point. The flow-duration curves are constructed by calculating the exceedance probability of the modeled daily streamflows. User-defined water-quality criteria and (or) sampling results can be loaded into the application to construct load-duration curves based on the modeled streamflow results. Estimates of flow and streamflow statistics were derived from TOPographically Based Hydrological MODEL (TOPMODEL) simulations in the WATER application. A modified TOPMODEL code, SDP-TOPMODEL (Sinkhole Drainage Process-TOPMODEL), was used to simulate daily mean discharges over the period of record for 5 karst and 5 non-karst watersheds in Kentucky in order to verify the calibrated model. A statistical evaluation of the model's verification simulations shows that calibration criteria established by previous WATER application reports were met, thus ensuring the model's ability to provide acceptably accurate estimates of discharge at gaged and ungaged sites throughout Kentucky. The flow-duration intervals are expressed as a percentage, with zero corresponding to the highest stream discharge in the streamflow record. Load-duration curves are constructed by applying the loading equation (Load = Flow * Water-quality criterion) at each flow interval.
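
    The duration-curve constructions reduce to a few lines of code. The following Python sketch uses a synthetic streamflow record and a hypothetical water-quality criterion, not WATER/TOPMODEL output, to compute a flow-duration curve from exceedance probabilities and apply Load = Flow * criterion:

        import numpy as np

        rng = np.random.default_rng(1)
        daily_flow = rng.lognormal(mean=3.0, sigma=1.0, size=60 * 365)  # synthetic

        # Flow-duration curve: exceedance probability of each modeled daily
        # flow, expressed as a percentage (0 ~ highest discharge on record).
        flows = np.sort(daily_flow)[::-1]
        exceedance = 100.0 * np.arange(1, flows.size + 1) / (flows.size + 1)

        # Load-duration curve: Load = Flow * water-quality criterion.
        criterion = 0.25   # hypothetical criterion (mass per unit flow)

        for pct in (10, 50, 90):
            q = np.interp(pct, exceedance, flows)
            print(f"{pct}% exceedance: flow={q:.1f}, allowable load={q * criterion:.1f}")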

  1. Using open source computational tools for predicting human metabolic stability and additional absorption, distribution, metabolism, excretion, and toxicity properties.

    PubMed

    Gupta, Rishi R; Gifford, Eric M; Liston, Ted; Waller, Chris L; Hohman, Moses; Bunin, Barry A; Ekins, Sean

    2010-11-01

    Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source molecular descriptors [e.g., the chemistry development kit (CDK)] and modeling algorithms, because this would negate the requirement for proprietary commercial software. We initially evaluated open source descriptors and model-building algorithms using a training set of approximately 50,000 molecules and a test set of approximately 25,000 molecules with human liver microsomal metabolic stability data. A C5.0 decision tree model demonstrated that CDK descriptors together with a set of SMILES arbitrary target specification (SMARTS) keys had good statistics [κ = 0.43, sensitivity = 0.57, specificity = 0.91, and positive predictive value (PPV) = 0.64], equivalent to those of models built with commercial Molecular Operating Environment 2D (MOE2D) descriptors and the same set of SMARTS keys (κ = 0.43, sensitivity = 0.58, specificity = 0.91, and PPV = 0.63). Extending the dataset to ∼193,000 molecules and generating a continuous model using Cubist with a combination of CDK and SMARTS keys or MOE2D and SMARTS keys confirmed this observation. When the continuous predictions and actual values were binned to get a categorical score, we observed a similar κ statistic (0.42). The same combination of descriptor set and modeling method was applied to passive permeability and P-glycoprotein efflux data with similar model testing statistics. In summary, open source tools demonstrated predictive results comparable to those of commercial software with attendant cost savings. We discuss the advantages and disadvantages of open source descriptors and the opportunity for their use as a tool for organizations to share data precompetitively, avoiding repetition and assisting drug discovery.
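
    The binning-then-kappa evaluation described above can be reproduced generically. Below is a minimal Python sketch with synthetic data (not the paper's dataset), using scikit-learn's implementation of Cohen's kappa:

        import numpy as np
        from sklearn.metrics import cohen_kappa_score

        rng = np.random.default_rng(7)
        actual = rng.normal(size=1000)                         # continuous values
        predicted = actual + rng.normal(scale=0.8, size=1000)  # imperfect model

        # Bin continuous values into categories (hypothetical cut points for
        # stable / moderate / unstable), then score agreement with kappa.
        cuts = [-0.5, 0.5]
        actual_cls = np.digitize(actual, cuts)
        predicted_cls = np.digitize(predicted, cuts)

        print("kappa:", round(cohen_kappa_score(actual_cls, predicted_cls), 2))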

  2. Extracting Hydrologic Understanding from the Unique Space-time Sampling of the Surface Water and Ocean Topography (SWOT) Mission

    NASA Astrophysics Data System (ADS)

    Nickles, C.; Zhao, Y.; Beighley, E.; Durand, M. T.; David, C. H.; Lee, H.

    2017-12-01

    The Surface Water and Ocean Topography (SWOT) satellite mission is jointly developed by NASA and the French space agency (CNES), with participation from the Canadian and UK space agencies, to serve both the hydrology and oceanography communities. The SWOT mission will sample global surface water extents and elevations (lakes/reservoirs, rivers, estuaries, oceans, sea and land ice) at a finer spatial resolution than is currently possible, enabling hydrologic discoveries, model advancements, and new applications that are not currently possible or perhaps even conceivable. Although the mission will provide global coverage, analysis and interpolation of the data generated from the irregular space/time sampling represent a significant challenge. In this study, we explore the applicability of the unique space/time sampling for understanding river discharge dynamics throughout the Ohio River Basin. River network topology, SWOT sampling (i.e., orbit and identified SWOT river reaches) and spatial interpolation concepts are used to quantify the fraction of river reaches effectively sampled on each day of the three-year mission. Streamflow statistics for SWOT-generated river discharge time series are compared to continuous daily river discharge series. Relationships are presented to transform SWOT-generated streamflow statistics into equivalent continuous daily discharge time series statistics, intended to support hydrologic applications using low-flow and annual flow-duration statistics.

  3. Novel pseudo-random number generator based on quantum random walks.

    PubMed

    Yang, Yu-Guang; Zhao, Qian-Qian

    2016-02-04

    In this paper, we investigate the potential application of quantum computation for constructing pseudo-random number generators (PRNGs) and further construct a novel PRNG based on quantum random walks (QRWs), a well-known quantum computation model. The PRNG merely relies on the equations used in the QRWs, so the generation algorithm is simple and the computation speed is fast. The proposed PRNG was subjected to statistical test suites such as NIST's and passed them successfully. Compared with the representative PRNG based on quantum chaotic maps (QCM), the present QRWs-based PRNG has advantages such as better statistical complexity and recurrence. For example, the normalized Shannon entropy and the statistical complexity of the QRWs-based PRNG are 0.999699456771172 and 1.799961178212329e-04, respectively, for 8-bit words in an analyzed sequence of, say, 16 Mbits. By contrast, the corresponding values of the QCM-based PRNG are 0.999448131481064 and 3.701210794388818e-04, respectively. Thus the statistical complexity and the normalized entropy of the QRWs-based PRNG are closer to 0 and 1, respectively, than those of the QCM-based PRNG as the number of words in the analyzed sequence increases. This provides a new avenue for constructing PRNGs and also extends the applications of quantum computation.

  4. Novel pseudo-random number generator based on quantum random walks

    PubMed Central

    Yang, Yu-Guang; Zhao, Qian-Qian

    2016-01-01

    In this paper, we investigate the potential application of quantum computation for constructing pseudo-random number generators (PRNGs) and further construct a novel PRNG based on quantum random walks (QRWs), a well-known quantum computation model. The PRNG merely relies on the equations used in the QRWs, so the generation algorithm is simple and the computation speed is fast. The proposed PRNG was subjected to statistical test suites such as NIST's and passed them successfully. Compared with the representative PRNG based on quantum chaotic maps (QCM), the present QRWs-based PRNG has advantages such as better statistical complexity and recurrence. For example, the normalized Shannon entropy and the statistical complexity of the QRWs-based PRNG are 0.999699456771172 and 1.799961178212329e-04, respectively, for 8-bit words in an analyzed sequence of, say, 16 Mbits. By contrast, the corresponding values of the QCM-based PRNG are 0.999448131481064 and 3.701210794388818e-04, respectively. Thus the statistical complexity and the normalized entropy of the QRWs-based PRNG are closer to 0 and 1, respectively, than those of the QCM-based PRNG as the number of words in the analyzed sequence increases. This provides a new avenue for constructing PRNGs and also extends the applications of quantum computation. PMID:26842402

  5. Estimating municipal solid waste generation by different activities and various resident groups: a case study of Beijing.

    PubMed

    Li, Zhen-shan; Fu, Hui-zhen; Qu, Xiao-yan

    2011-09-15

    Reliable and accurate determinations of the quantities and composition of wastes are required for the planning of municipal solid waste (MSW) management systems. A model based on the interrelationships of expenditure on consumer goods, time distribution, daily activities, resident groups, and waste generation was developed and employed to estimate MSW generation by different activities and resident groups in Beijing. The principle is that MSW is produced through the consumption of consumer goods by residents in their daily activities: 'Maintenance' (meeting the basic needs of food, housing and personal care), 'Subsistence' (providing the financial requirements) and 'Leisure' (social and recreational pursuits). Three sets of important parameters - waste generation per unit of consumer expenditure, consumer expenditure distribution across activities per unit time, and time assignment to activities by different resident groups - were determined using statistical analysis, a sampling survey and the Analytic Hierarchy Process, respectively. Data for the analysis were obtained from the Beijing Statistical Yearbook (2004-2008) and a questionnaire survey. The results reveal that 'Maintenance' activity produced the most MSW, distantly followed by 'Leisure' and 'Subsistence' activities. In 2008, in descending order of MSW generation, the resident groups were floating population, non-civil servants, retired people, civil servants, college students (including both undergraduates and graduates), primary and secondary students, and preschoolers. The new estimation model, which successfully fit waste generation by different activities and resident groups over the investigated years, is amenable to MSW prediction. Copyright © 2011 Elsevier B.V. All rights reserved.
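
    The structure of the estimation model (waste = time assignment x expenditure rate x waste per unit expenditure, summed over activities and resident groups) can be illustrated with a toy calculation. All numbers in this Python sketch are placeholders, not the paper's fitted parameters.

        # All numbers are illustrative placeholders, not the paper's values.
        groups = {  # resident group: (population, daily hours per activity)
            "retired":  (1.0e6, {"Maintenance": 14, "Subsistence": 0, "Leisure": 10}),
            "employed": (3.0e6, {"Maintenance": 10, "Subsistence": 9, "Leisure": 5}),
        }
        spend_per_hour = {"Maintenance": 4.0, "Subsistence": 1.0, "Leisure": 3.0}
        waste_per_yuan = {"Maintenance": 0.02, "Subsistence": 0.005, "Leisure": 0.01}

        total_kg = 0.0
        for name, (population, hours) in groups.items():
            per_capita = sum(hours[a] * spend_per_hour[a] * waste_per_yuan[a]
                             for a in hours)              # kg/person/day
            print(f"{name}: {per_capita:.2f} kg/person/day")
            total_kg += population * per_capita
        print(f"city total: {total_kg / 1e3:,.0f} t/day")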

  6. Level-crossing statistics of the horizontal wind speed in the planetary surface boundary layer

    NASA Astrophysics Data System (ADS)

    Edwards, Paul J.; Hurst, Robert B.

    2001-09-01

    The probability density of the times for which the horizontal wind remains above or below a given threshold speed is of some interest in the fields of renewable energy generation and pollutant dispersal. However, there appear to be no analytic or conceptual models that account for the observed power-law form of the distribution of these episode lengths, which holds over more than three decades, from a few tens of seconds to a day or more. We reanalyze high-resolution wind data and demonstrate the fractal character of the point process generated by the wind speed level crossings. We simulate the fluctuating wind speed by a Markov process that approximates the characteristics of the real (non-Markovian) wind and successfully generates a power-law distribution of episode lengths. However, fundamental questions concerning the physical basis for this behavior and the connection between the properties of a continuous-time stochastic process and the fractal statistics of the point process generated by its level crossings remain unanswered.
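
    A minimal numerical experiment in this spirit: simulate a Markov surrogate for wind speed and extract the distribution of episode lengths above a threshold. The Python sketch below uses a generic AR(1) process, not the authors' fitted Markov model, and a plain AR(1) will not by itself reproduce the observed power law; it only shows the level-crossing bookkeeping.

        import numpy as np
        from scipy.signal import lfilter

        rng = np.random.default_rng(3)

        # AR(1) surrogate for wind-speed fluctuations (unit stationary variance).
        n, phi = 1_000_000, 0.999
        eps = rng.normal(scale=np.sqrt(1.0 - phi**2), size=n)
        x = lfilter([1.0], [1.0, -phi], eps)   # x[t] = phi * x[t-1] + eps[t]

        # Episode lengths: run lengths of consecutive samples above threshold.
        above = x > 0.5
        change = np.diff(above.astype(np.int8))
        starts = np.flatnonzero(change == 1) + 1
        ends = np.flatnonzero(change == -1) + 1
        if above[0]:
            starts = np.r_[0, starts]
        if above[-1]:
            ends = np.r_[ends, above.size]
        lengths = ends - starts

        print(f"{lengths.size} episodes, mean {lengths.mean():.1f}, "
              f"max {lengths.max()}")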

  7. Model Independence in Downscaled Climate Projections: a Case Study in the Southeast United States

    NASA Astrophysics Data System (ADS)

    Gray, G. M. E.; Boyles, R.

    2016-12-01

    Downscaled climate projections are used to deduce how the climate will change in future decades at local and regional scales. It is important to use multiple models to characterize part of the future uncertainty, given its impact on adaptation decision making. Traditionally this is done with an equally weighted ensemble of multiple GCMs downscaled using one technique. Newer practice includes several downscaling techniques in an effort to increase the ensemble's representation of future uncertainty; however, this may add statistically dependent models to the ensemble. Previous research has shown a dependence problem in the GCM ensemble across multiple generations, but such dependence has not been demonstrated in the downscaled ensemble. In this case study, seven downscaled climate projections on the daily time scale are considered: CLAREnCE10, SERAP, BCCA (CMIP5 and CMIP3 versions), Hostetler, CCR, and MACA-LIVNEH. These data represent 83 ensemble members, 44 GCMs, and two generations of GCMs. Baseline periods are compared against the University of Idaho's METDATA gridded observation dataset. Hierarchical agglomerative clustering is applied to the correlated errors to determine dependent clusters. Redundant GCMs across different downscaling techniques show the most dependence, while weaker dependence signals are detected within downscaling datasets and across generations of GCMs. These results indicate that adding downscaled projections to increase the ensemble size must be done with care to avoid redundant GCMs, and that the process of downscaling may increase the dependence of the downscaled GCMs. The two climate model generations do not appear dissimilar enough to be treated as separate statistical populations for ensemble building at the local and regional scales.

  8. A globally calibrated scheme for generating daily meteorology from monthly statistics: Global-WGEN (GWGEN) v1.0

    NASA Astrophysics Data System (ADS)

    Sommer, Philipp S.; Kaplan, Jed O.

    2017-10-01

    While a wide range of Earth system processes occur at daily and even subdaily timescales, many global vegetation and other terrestrial dynamics models historically used monthly meteorological forcing both to reduce computational demand and because global datasets were lacking. Recently, dynamic land surface modeling has moved towards resolving daily and subdaily processes, and global datasets containing daily and subdaily meteorology have become available. These meteorological datasets, however, cover only the instrumental era of the last approximately 120 years at best, are subject to considerable uncertainty, and represent extremely large data files with associated computational costs of data input/output and file transfer. For periods before the recent past or in the future, global meteorological forcing can be provided by climate model output, but the quality of these data at high temporal resolution is low, particularly for daily precipitation frequency and amount. Here, we present GWGEN, a globally applicable statistical weather generator for the temporal downscaling of monthly climatology to daily meteorology. Our weather generator is parameterized using a global meteorological database and simulates daily values of five common variables: minimum and maximum temperature, precipitation, cloud cover, and wind speed. GWGEN is lightweight, modular, and requires a minimal set of monthly mean variables as input. The weather generator may be used in a range of applications, for example, in global vegetation, crop, soil erosion, or hydrological models. While GWGEN does not currently perform spatially autocorrelated multi-point downscaling of daily weather, this additional functionality could be implemented in future versions.
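
    A WGEN-family generator of the kind GWGEN belongs to can be caricatured in a few lines: a first-order two-state Markov chain for precipitation occurrence, a gamma distribution for wet-day amounts, and state-conditional draws for temperature. The Python sketch below uses hypothetical parameter values, not GWGEN's calibrated ones.

        import numpy as np

        rng = np.random.default_rng(11)

        # Hypothetical monthly parameters for one month (GWGEN derives these
        # from its global station database; the values here are placeholders).
        p_wet_dry, p_wet_wet = 0.25, 0.65     # P(wet | yesterday dry/wet)
        gamma_shape, gamma_scale = 0.8, 6.0   # wet-day precipitation (mm)
        t_dry, t_wet, t_sd = 25.0, 21.0, 3.0  # daily max temperature (deg C)

        wet = False
        precip, tmax = [], []
        for _ in range(30):
            # First-order two-state Markov chain for precipitation occurrence.
            wet = rng.random() < (p_wet_wet if wet else p_wet_dry)
            precip.append(rng.gamma(gamma_shape, gamma_scale) if wet else 0.0)
            # Temperature drawn conditionally on the wet/dry state.
            tmax.append(rng.normal(t_wet if wet else t_dry, t_sd))

        print(f"wet days: {sum(p > 0 for p in precip)}, "
              f"total precipitation: {sum(precip):.1f} mm, "
              f"mean tmax: {np.mean(tmax):.1f} C")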

  9. Computational Motion Phantoms and Statistical Models of Respiratory Motion

    NASA Astrophysics Data System (ADS)

    Ehrhardt, Jan; Klinder, Tobias; Lorenz, Cristian

    Breathing motion is not a robust and perfectly reproducible process, and inter- and intra-fractional motion variations form an important problem in radiotherapy of the thorax and upper abdomen. A widespread consensus now exists that prior knowledge about respiratory organ motion and its variability would be useful for improving radiotherapy planning and treatment delivery. This chapter discusses two different approaches to modeling the variability of respiratory motion. In the first part, we review computational motion phantoms, i.e., computerized anatomical and physiological models. Computational phantoms are excellent tools to simulate and investigate the effects of organ motion in radiation therapy and to gain insight into methods for motion management. The second part of this chapter discusses statistical modeling techniques to describe breathing motion and its variability in a population of 4D images. Population-based models can be generated from repeatedly acquired 4D images of the same patient (intra-patient models) and from 4D images of different patients (inter-patient models). The generation of these models is explained, and possible applications for motion prediction in radiotherapy are exemplified. Computational models of respiratory motion and motion variability have numerous applications in radiation therapy, e.g., to understand motion effects in simulation studies, to develop and evaluate treatment strategies, or to introduce prior knowledge into patient-specific treatment planning.

  10. Visual and Statistical Analysis of Digital Elevation Models Generated Using Idw Interpolator with Varying Powers

    NASA Astrophysics Data System (ADS)

    Asal, F. F.

    2012-07-01

    Digital elevation data obtained from different engineering surveying techniques are utilized in generating the Digital Elevation Model (DEM) employed in many engineering and environmental applications. These data usually come in discrete point format, making it necessary to utilize an interpolation approach to create the DEM. Quality assessment of the DEM is a vital issue controlling its use in different applications; however, this assessment relies heavily on statistical methods while neglecting visual methods. This research applies visual analysis to DEMs generated using the IDW interpolator with varying powers in order to examine its potential for assessing the effects of varying the IDW power on DEM quality. Real elevation data were collected in the field with a total station instrument over corrugated terrain. DEMs were generated from the data at a unified cell size using the IDW interpolator with power values ranging from one to ten. Visual analysis was undertaken using 2D and 3D views of the DEM; in addition, statistical analysis was performed to assess the validity of the visual techniques for this purpose. Visual analysis shows that smoothing of the DEM decreases as the power value increases up to a power of four; increasing the power beyond four does not leave noticeable changes in the 2D and 3D views of the DEM. The statistical analysis supports these results: the standard deviation (SD) of the DEM increases with increasing power. More specifically, changing the power from one to two produced 36% of the total increase in SD (the increase due to changing the power from one to ten), and changing to powers of three and four gave 60% and 75%, respectively. This reflects a decrease in DEM smoothing as the IDW power increases. The study also shows that visual methods supported by statistical analysis have good potential for DEM quality assessment.
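
    The effect of the IDW power is easy to reproduce: as the power grows, weights localize around the nearest observations and the interpolated surface becomes less smooth, raising its standard deviation. A self-contained Python sketch with synthetic elevations (not the study's total-station data):

        import numpy as np

        rng = np.random.default_rng(5)
        pts = rng.uniform(0, 100, size=(200, 2))           # surveyed positions
        z = np.sin(pts[:, 0] / 15) * 10 + pts[:, 1] * 0.2  # synthetic elevations

        def idw(grid_xy, pts, z, power):
            """Inverse-distance-weighted interpolation at grid_xy (m, 2)."""
            d = np.linalg.norm(grid_xy[:, None, :] - pts[None, :, :], axis=2)
            d = np.maximum(d, 1e-9)   # avoid division by zero at data points
            w = 1.0 / d**power
            return (w * z).sum(axis=1) / w.sum(axis=1)

        gx, gy = np.meshgrid(np.linspace(0, 100, 50), np.linspace(0, 100, 50))
        grid = np.column_stack([gx.ravel(), gy.ravel()])

        # Higher powers localize the weights, so the DEM's SD grows.
        for p in (1, 2, 4, 10):
            dem = idw(grid, pts, z, p)
            print(f"power={p:2d}  SD={dem.std():.3f}")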

  11. Technical Note: Approximate Bayesian parameterization of a process-based tropical forest model

    NASA Astrophysics Data System (ADS)

    Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.

    2014-02-01

    Inverse parameter estimation of process-based models is a long-standing problem in many scientific disciplines. A key question for inverse parameter estimation is how to define the metric that quantifies how well model predictions fit to the data. This metric can be expressed by general cost or objective functions, but statistical inversion methods require a particular metric, the probability of observing the data given the model parameters, known as the likelihood. For technical and computational reasons, likelihoods for process-based stochastic models are usually based on general assumptions about variability in the observed data, and not on the stochasticity generated by the model. Only in recent years have new methods become available that allow the generation of likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional Markov chain Monte Carlo (MCMC) sampler, performs well in retrieving known parameter values from virtual inventory data generated by the forest model. We analyze the results of the parameter estimation, examine its sensitivity to the choice and aggregation of model outputs and observed data (summary statistics), and demonstrate the application of this method by fitting the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss how this approach differs from approximate Bayesian computation (ABC), another method commonly used to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can be successfully applied to process-based models of high complexity. The methodology is particularly suitable for heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models.
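
    The core idea, a parametric (Gaussian) likelihood approximation estimated from stochastic simulations and embedded in a Metropolis sampler, can be shown on a toy model. The Python sketch below is in that spirit only; the one-parameter Poisson "forest" is a hypothetical stand-in, not FORMIND.

        import numpy as np

        rng = np.random.default_rng(8)

        def simulate(theta, n_rep=100):
            """Toy stochastic model: replicate summary statistics (mean, var)."""
            samples = rng.poisson(lam=np.exp(theta), size=(n_rep, 50))
            return np.column_stack([samples.mean(axis=1), samples.var(axis=1)])

        def synthetic_loglik(theta, s_obs):
            """Gaussian (parametric) likelihood approximation from simulations."""
            S = simulate(theta)
            mu, cov = S.mean(axis=0), np.cov(S, rowvar=False)
            diff = s_obs - mu
            sign, logdet = np.linalg.slogdet(cov)
            return -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet)

        s_obs = np.array([4.48, 4.48])   # observed summaries; exp(1.5) ~ 4.48
        theta, ll = 0.0, synthetic_loglik(0.0, s_obs)
        chain = []
        for _ in range(2000):            # Metropolis sampler with a flat prior
            prop = theta + rng.normal(scale=0.2)
            ll_prop = synthetic_loglik(prop, s_obs)
            if np.log(rng.random()) < ll_prop - ll:
                theta, ll = prop, ll_prop
            chain.append(theta)

        print("posterior mean of theta:", np.mean(chain[500:]), "(true ~1.5)")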

  12. Walking through the statistical black boxes of plant breeding.

    PubMed

    Xavier, Alencar; Muir, William M; Craig, Bruce; Rainey, Katy Martin

    2016-10-01

    The main statistical procedures in plant breeding are based on Gaussian process and can be computed through mixed linear models. Intelligent decision making relies on our ability to extract useful information from data to help us achieve our goals more efficiently. Many plant breeders and geneticists perform statistical analyses without understanding the underlying assumptions of the methods or their strengths and pitfalls. In other words, they treat these statistical methods (software and programs) like black boxes. Black boxes represent complex pieces of machinery with contents that are not fully understood by the user. The user sees the inputs and outputs without knowing how the outputs are generated. By providing a general background on statistical methodologies, this review aims (1) to introduce basic concepts of machine learning and its applications to plant breeding; (2) to link classical selection theory to current statistical approaches; (3) to show how to solve mixed models and extend their application to pedigree-based and genomic-based prediction; and (4) to clarify how the algorithms of genome-wide association studies work, including their assumptions and limitations.
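
    As a pointer to item (3) in the review's aims, solving a mixed linear model comes down to Henderson's mixed model equations. A toy numpy sketch with an identity relationship matrix (a pedigree-based A or genomic G matrix would replace it in practice); the dimensions and variance ratio are arbitrary illustrations:

        import numpy as np

        rng = np.random.default_rng(2)
        n, q = 12, 4                     # phenotype records, genetic groups
        X = np.ones((n, 1))              # fixed effect: overall mean
        Z = np.kron(np.eye(q), np.ones((n // q, 1)))  # records -> groups
        u_true = rng.normal(scale=2.0, size=q)
        y = 10 + Z @ u_true + rng.normal(size=n)

        # Henderson's mixed model equations for y = Xb + Zu + e, with
        # var(u) = I * sigma_u^2 and var(e) = I * sigma_e^2.
        lam = 1.0 / 4.0                  # lambda = sigma_e^2 / sigma_u^2
        lhs = np.block([[X.T @ X, X.T @ Z],
                        [Z.T @ X, Z.T @ Z + lam * np.eye(q)]])
        rhs = np.concatenate([X.T @ y, Z.T @ y])
        sol = np.linalg.solve(lhs, rhs)
        print("BLUE of mean:", sol[0])
        print("BLUPs:", sol[1:], "\ntrue u:", u_true)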

  13. Motifs in triadic random graphs based on Steiner triple systems

    NASA Astrophysics Data System (ADS)

    Winkler, Marco; Reichardt, Jörg

    2013-08-01

    Conventionally, pairwise relationships between nodes are considered to be the fundamental building blocks of complex networks. However, over the last decade, the overabundance of certain subnetwork patterns, i.e., the so-called motifs, has attracted much attention. It has been hypothesized that these motifs, instead of links, serve as the building blocks of network structures. Although the relation between a network's topology and the general properties of the system, such as its function, its robustness against perturbations, or its efficiency in spreading information, is the central theme of network science, there is still a lack of sound generative models needed for testing the functional role of subgraph motifs. Our work aims to overcome this limitation. We employ the framework of exponential random graph models (ERGMs) to define models based on triadic substructures. The fact that only a small portion of triads can actually be set independently poses a challenge for the formulation of such models. To overcome this obstacle, we use Steiner triple systems (STSs). These are partitions of sets of nodes into pair-disjoint triads, which thus can be specified independently. Combining the concepts of ERGMs and STSs, we suggest generative models capable of generating ensembles of networks with nontrivial triadic Z-score profiles. Further, we discover inevitable correlations between the abundance of triad patterns, which occur solely for statistical reasons and need to be taken into account when discussing the functional implications of motif statistics. Moreover, we calculate the degree distributions of our triadic random graphs analytically.
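
    Steiner triple systems themselves are easy to generate for admissible orders. Below is a Python sketch of Bose's classical construction for v = 6k + 3 (one standard construction; the paper does not prescribe a particular one), with a brute-force check of the defining property that every pair of points lies in exactly one triple:

        from itertools import combinations

        def bose_sts(k):
            """Bose construction of a Steiner triple system on v = 6k + 3
            points; points are pairs (x, i) with x in Z_n, n = 2k + 1."""
            n = 2 * k + 1
            half = (n + 1) // 2          # multiplicative inverse of 2 mod n
            triples = [frozenset((x, i) for i in range(3)) for x in range(n)]
            for x, y in combinations(range(n), 2):
                for i in range(3):
                    triples.append(frozenset({(x, i), (y, i),
                                              ((x + y) * half % n, (i + 1) % 3)}))
            return triples

        sts = bose_sts(2)                # v = 15 points
        points = [(x, i) for x in range(5) for i in range(3)]
        # Verify: every pair of points lies in exactly one triple.
        for pair in combinations(points, 2):
            assert sum(set(pair) <= t for t in sts) == 1
        print(len(sts), "triples on", len(points), "points")  # 35 on 15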

  14. Habitat classification modeling with incomplete data: Pushing the habitat envelope

    USGS Publications Warehouse

    Zarnetske, P.L.; Edwards, T.C.; Moisen, Gretchen G.

    2007-01-01

    Habitat classification models (HCMs) are invaluable tools for species conservation, land-use planning, reserve design, and metapopulation assessments, particularly at broad spatial scales. However, species occurrence data are often lacking and typically limited to presence points at broad scales. This lack of absence data precludes the use of many statistical techniques for HCMs. One option is to generate pseudo-absence points so that the many available statistical modeling tools can be used. Traditional techniques generate pseudo-absence points at random across broadly defined species ranges, often failing to include biological knowledge concerning the species-habitat relationship. We incorporated biological knowledge of the species-habitat relationship into pseudo-absence points by creating habitat envelopes that constrain the region from which points were randomly selected. We define a habitat envelope as an ecological representation of a species, or species feature's (e.g., nest) observed distribution (i.e., realized niche) based on a single attribute, or the spatial intersection of multiple attributes. We created HCMs for Northern Goshawk (Accipiter gentilis atricapillus) nest habitat during the breeding season across Utah forests with extant nest presence points and ecologically based pseudo-absence points using logistic regression. Predictor variables were derived from 30-m USDA Landfire and 250-m Forest Inventory and Analysis (FIA) map products. These habitat-envelope-based models were then compared to null envelope models which use traditional practices for generating pseudo-absences. Models were assessed for fit and predictive capability using metrics such as kappa, threshold-independent receiver operating characteristic (ROC) plots, adjusted deviance (D_{adj}^{2}), and cross-validation, and were also assessed for ecological relevance. For all cases, habitat-envelope-based models outperformed null envelope models and were more ecologically relevant, suggesting that incorporating biological knowledge into pseudo-absence point generation is a powerful tool for species habitat assessments. Furthermore, given some a priori knowledge of the species-habitat relationship, ecologically based pseudo-absence points can be applied to any species, ecosystem, data resolution, and spatial extent. © 2007 by the Ecological Society of America.
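
    The envelope-based pseudo-absence idea can be sketched in a few lines: restrict random background points to the attribute ranges observed at presence locations. The attributes and values in this Python sketch are hypothetical, not the Landfire/FIA predictors used in the study.

        import numpy as np

        rng = np.random.default_rng(4)

        # Hypothetical presence points with two habitat attributes
        # (e.g., elevation in m, canopy cover in %).
        presence = np.column_stack([rng.normal(2500, 150, 80),
                                    rng.normal(60, 8, 80)])

        # Habitat envelope: attribute ranges observed at presence locations.
        env_min, env_max = presence.min(axis=0), presence.max(axis=0)

        # Candidate background points across the study region.
        candidates = np.column_stack([rng.uniform(1500, 3500, 5000),
                                      rng.uniform(10, 95, 5000)])
        inside = np.all((candidates >= env_min) & (candidates <= env_max), axis=1)

        # Ecologically based pseudo-absences: random points constrained to the
        # envelope; null pseudo-absences: unconstrained random points.
        pseudo_env = candidates[inside][rng.choice(inside.sum(), 80, replace=False)]
        pseudo_null = candidates[rng.choice(len(candidates), 80, replace=False)]
        print(f"{inside.mean():.0%} of candidates fall inside the envelope")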

  15. Maximum entropy models as a tool for building precise neural controls.

    PubMed

    Savin, Cristina; Tkačik, Gašper

    2017-10-01

    Neural responses are highly structured, with population activity restricted to a small subset of the astronomical range of possible activity patterns. Characterizing these statistical regularities is important for understanding circuit computation, but challenging in practice. Here we review recent approaches based on the maximum entropy principle used for quantifying collective behavior in neural activity. We highlight recent models that capture population-level statistics of neural data, yielding insights into the organization of the neural code and its biological substrate. Furthermore, the MaxEnt framework provides a general recipe for constructing surrogate ensembles that preserve aspects of the data, but are otherwise maximally unstructured. This idea can be used to generate a hierarchy of controls against which rigorous statistical tests are possible. Copyright © 2017 Elsevier Ltd. All rights reserved.
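
    One simple instance of such a surrogate control: shuffle each neuron's spike train independently in time, preserving single-neuron firing rates (first-order statistics) while destroying pairwise correlations. A self-contained Python sketch with synthetic spike rasters, not real recordings:

        import numpy as np

        rng = np.random.default_rng(6)

        # Synthetic raster: 40 neurons x 5000 bins, with a shared slow drive
        # that induces pairwise correlations on top of heterogeneous rates.
        drive = (rng.random(5000) < 0.5).astype(float)
        rates = rng.uniform(0.02, 0.15, size=(40, 1))
        spikes = (rng.random((40, 5000)) < rates * (0.5 + drive)).astype(int)

        def surrogate(spikes, rng):
            """Shuffle each neuron's train independently: firing rates are
            preserved exactly, pairwise correlations are destroyed."""
            return np.array([rng.permutation(row) for row in spikes])

        def mean_abs_corr(s):
            c = np.corrcoef(s)
            return np.abs(c[~np.eye(len(c), dtype=bool)]).mean()

        null = [mean_abs_corr(surrogate(spikes, rng)) for _ in range(100)]
        print(f"data: {mean_abs_corr(spikes):.4f}  "
              f"surrogates: {np.mean(null):.4f} +/- {np.std(null):.4f}")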

  16. Parameter optimization in biased decoy-state quantum key distribution with both source errors and statistical fluctuations

    NASA Astrophysics Data System (ADS)

    Zhu, Jian-Rong; Li, Jian; Zhang, Chun-Mei; Wang, Qin

    2017-10-01

    The decoy-state method has been widely used in commercial quantum key distribution (QKD) systems. In view of practical decoy-state QKD with both source errors and statistical fluctuations, we propose a universal model for full parameter optimization in biased decoy-state QKD with phase-randomized sources. We then adopt this model to carry out simulations for two widely used sources: the weak coherent source (WCS) and the heralded single-photon source (HSPS). Results show that full parameter optimization can significantly improve not only the secure transmission distance but also the final key generation rate. Moreover, when source errors and statistical fluctuations are taken into account, the performance of decoy-state QKD using an HSPS suffers less than that using a WCS.

  17. Statistical model with two order parameters for ductile and soft fiber bundles in nanoscience and biomaterials.

    PubMed

    Rinaldi, Antonio

    2011-04-01

    Traditional fiber bundle models (FBMs) have been an effective tool for understanding brittle heterogeneous systems. However, fiber bundles in modern nano- and bioapplications demand a new generation of FBMs capturing more complex deformation processes in addition to damage. In the context of loose bundle systems, and with reference to time-independent plasticity and soft biomaterials, we formulate a generalized statistical model for ductile fracture and nonlinear elastic problems capable of handling several simultaneous deformation mechanisms by means of two order parameters (as opposed to one). As the first rational FBM for coupled damage problems, it may be the cornerstone for advanced statistical models of heterogeneous systems in nanoscience and materials design, especially for exploring hierarchical and bio-inspired concepts in nanobiotechnology. Finally, applicative examples are provided for illustrative purposes, discussing issues in inverse analysis (i.e., nonlinear elastic polymer fibers and ductile Cu submicron bar arrays) and direct design (i.e., strength prediction).
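
    For background, the traditional brittle FBM that this work generalizes can be simulated in a few lines under equal load sharing. This Python sketch is the classical single-order-parameter (damage-only) model, not the authors' two-order-parameter formulation:

        import numpy as np

        rng = np.random.default_rng(9)
        N = 100_000
        thresholds = np.sort(rng.weibull(2.0, size=N))  # fiber strength disorder

        # Equal-load-sharing FBM: when the k weakest fibers have failed, the
        # remaining N - k fibers carry the load equally, so the bundle holds
        # sigma_k = t_(k+1) * (N - k) / N; strength is the maximum over k.
        sigma = thresholds * (N - np.arange(N)) / N
        print("simulated bundle strength:", sigma.max())

        # Analytic optimum for Weibull(2) disorder: maximize x * exp(-x^2),
        # giving x* = 1/sqrt(2) and sigma_c = x* * exp(-1/2) ~ 0.429.
        print("analytic strength:", (0.5**0.5) * np.exp(-0.5))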

  18. Influence of environmental statistics on inhibition of saccadic return

    PubMed Central

    Farrell, Simon; Ludwig, Casimir J. H.; Ellis, Lucy A.; Gilchrist, Iain D.

    2009-01-01

    Initiating an eye movement is slowed if the saccade is directed to a location that has been fixated in the recent past. We show that this inhibitory effect is modulated by the temporal statistics of the environment: If a return location is likely to become behaviorally relevant, inhibition of return is absent. By fitting an accumulator model of saccadic decision-making, we show that the inhibitory effect and the sensitivity to local statistics can be dissociated in their effects on the rate of accumulation of evidence, and the threshold controlling the amount of evidence needed to generate a saccade. PMID:20080778
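
    A minimal LATER-style accumulator illustrates the dissociation being tested: a similar slowing of median latency can arise from a lower accumulation rate or from a higher threshold. All parameter values in this Python sketch are arbitrary illustrations, not the fitted ones from the study.

        import numpy as np

        rng = np.random.default_rng(10)

        def saccade_latencies(rate_mean, threshold, n=10_000, rate_sd=0.3):
            """LATER-style accumulator: evidence rises linearly at a rate drawn
            per trial from a normal distribution; a saccade fires at threshold."""
            rates = rng.normal(rate_mean, rate_sd, size=n)
            rates = rates[rates > 0]      # discard non-terminating trials
            return threshold / rates      # latency = threshold / rate

        fresh = saccade_latencies(rate_mean=1.0, threshold=1.0)
        slowed_rate = saccade_latencies(rate_mean=0.8, threshold=1.0)
        raised_thresh = saccade_latencies(rate_mean=1.0, threshold=1.25)

        for name, lat in [("novel location", fresh),
                          ("lower rate", slowed_rate),
                          ("higher threshold", raised_thresh)]:
            print(f"{name}: median latency {np.median(lat):.2f} (a.u.)")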

  19. Application of statistical mechanical methods to the modeling of social networks

    NASA Astrophysics Data System (ADS)

    Strathman, Anthony Robert

    With the recent availability of large-scale social data sets, social networks have become open to quantitative analysis via the methods of statistical physics. We examine the statistical properties of a real large-scale social network, generated from cellular phone call-trace logs. We find this network, like many other social networks, to be assortative (r = 0.31) and clustered (i.e., strongly transitive, C = 0.21). We measure fluctuation scaling to identify the presence of internal structure in the network and find that structural inhomogeneity effectively disappears at the scale of a few hundred nodes, though there is no sharp cutoff. We introduce an agent-based model of social behavior designed to model the formation and dissolution of social ties. The model is a modified Metropolis algorithm containing agents operating under the basic sociological constraints of reciprocity, communication need, and transitivity, and it introduces the concept of a social temperature. We go on to show that this simple model reproduces the global statistical features of the real network data (including assortativity, connected fraction, mean degree, clustering, and mean shortest path length) and undergoes two phase transitions as a function of this social temperature: one from a "gas" to a "liquid" state, and a second from a liquid to a glassy state.

  20. A multiplicative process for generating a beta-like survival function with application to the UK 2016 EU referendum results

    NASA Astrophysics Data System (ADS)

    Fenner, Trevor; Kaufmann, Eric; Levene, Mark; Loizou, George

    Human dynamics and sociophysics suggest statistical models that may explain and provide us with better insight into social phenomena. Contextual and selection effects tend to produce extreme values in the tails of rank-ordered distributions of both census data and district-level election outcomes. Models that account for this nonlinearity generally outperform linear models. Fitting nonlinear functions based on rank-ordering census and election data therefore improves the fit of aggregate voting models. This may help improve ecological inference, as well as election forecasting in majoritarian systems. We propose a generative multiplicative decrease model that gives rise to a rank-order distribution and facilitates the analysis of the recent UK EU referendum results. We supply empirical evidence that the beta-like survival function, which can be generated directly from our model, is a close fit to the referendum results, and also may have predictive value when covariate data are available.

  1. A Knowledge Generation Model via the Hypernetwork

    PubMed Central

    Liu, Jian-Guo; Yang, Guang-Yong; Hu, Zhao-Long

    2014-01-01

    The influence of the statistical properties of the network on the knowledge diffusion has been extensively studied. However, the structure evolution and the knowledge generation processes are always integrated simultaneously. By introducing the Cobb-Douglas production function and treating the knowledge growth as a cooperative production of knowledge, in this paper, we present two knowledge-generation dynamic evolving models based on different evolving mechanisms. The first model, named "HDPH model," adopts the hyperedge growth and the hyperdegree preferential attachment mechanisms. The second model, named "KSPH model," adopts the hyperedge growth and the knowledge stock preferential attachment mechanisms. We investigate the effect of the parameters (α, β) on the total knowledge stock of the two models. The hyperdegree distribution of the HDPH model can be theoretically analyzed by the mean-field theory. The analytic result indicates that the hyperdegree distribution of the HDPH model obeys a power-law distribution with exponent γ = 2 + 1/m. Furthermore, we present the distributions of the knowledge stock for different parameters (α, β). The findings indicate that our proposed models could be helpful for deeply understanding the scientific research cooperation. PMID:24626143

  2. A knowledge generation model via the hypernetwork.

    PubMed

    Liu, Jian-Guo; Yang, Guang-Yong; Hu, Zhao-Long

    2014-01-01

    The influence of the statistical properties of the network on the knowledge diffusion has been extensively studied. However, the structure evolution and the knowledge generation processes are always integrated simultaneously. By introducing the Cobb-Douglas production function and treating the knowledge growth as a cooperative production of knowledge, in this paper, we present two knowledge-generation dynamic evolving models based on different evolving mechanisms. The first model, named "HDPH model," adopts the hyperedge growth and the hyperdegree preferential attachment mechanisms. The second model, named "KSPH model," adopts the hyperedge growth and the knowledge stock preferential attachment mechanisms. We investigate the effect of the parameters (α,β) on the total knowledge stock of the two models. The hyperdegree distribution of the HDPH model can be theoretically analyzed by the mean-field theory. The analytic result indicates that the hyperdegree distribution of the HDPH model obeys the power-law distribution and the exponent is γ = 2 + 1/m. Furthermore, we present the distributions of the knowledge stock for different parameters (α,β). The findings indicate that our proposed models could be helpful for deeply understanding the scientific research cooperation.
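
    A schematic sketch of hyperdegree preferential attachment growth; the rule below (each step adds one new node and a hyperedge joining it to m existing nodes drawn proportionally to hyperdegree) is a simplified stand-in for the HDPH mechanism:

    ```python
    import random
    from collections import Counter

    def grow_hypernetwork(steps=20000, m=3, seed=1):
        # hyperdegree-preferential growth: "pool" holds each node repeated
        # once per hyperedge it belongs to, so uniform sampling from it is
        # proportional to hyperdegree
        random.seed(seed)
        hyperdegree = Counter({i: 1 for i in range(m + 1)})  # seed hyperedge
        pool = list(hyperdegree.elements())
        for new_node in range(m + 1, m + 1 + steps):
            chosen = set()
            while len(chosen) < m:
                chosen.add(random.choice(pool))
            for v in chosen:
                hyperdegree[v] += 1
                pool.append(v)
            hyperdegree[new_node] = 1
            pool.append(new_node)
        return hyperdegree

    counts = Counter(grow_hypernetwork().values())
    for k in sorted(counts)[:10]:
        print(k, counts[k])   # log-log slope should be near gamma = 2 + 1/m
    ```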

  3. An experimental study of factors affecting the selective inhibition of sintering process

    NASA Astrophysics Data System (ADS)

    Asiabanpour, Bahram

    Selective Inhibition of Sintering (SIS) is a new rapid prototyping method that builds parts layer by layer. SIS works by joining powder particles through sintering in the part's body and by inhibiting sintering in selected powder areas. The objective of this research has been to improve the SIS process, which was invented at USC. The process improvement is based on statistical design of experiments. Conducting the needed experiments required a working machine and related path-generator software. The machine and its control software were available prior to this research; the path-generator algorithms and software had to be created. This program obtains model geometry from a CAD file and generates an appropriate path file for the printer nozzle, along with a simulation file for path-file inspection using virtual prototyping. The activities related to the path generator constitute the first part of this research, which has resulted in an efficient path generator. In addition, to reach an acceptable level of accuracy, strength, and surface quality in the fabricated parts, all effective factors in the SIS process had to be identified and controlled. Simultaneous analytical and experimental studies were conducted to recognize the effective factors and to control the SIS process. Polystyrene was found to be the most appropriate polymer powder and saturated potassium iodide the most effective inhibitor among the candidate materials. Statistical tools were then applied to improve the desirable properties of the parts fabricated by the SIS process. An investigation of part strength was conducted using Response Surface Methodology (RSM), and a region of acceptable operating conditions for part strength was found. Through analysis of the experimental results, the impact of the factors on final part surface quality and dimensional accuracy was modeled. After developing a desirability function model, process operating conditions for maximum desirability were identified, and the desirability model was validated.

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nose, Y.

    Methods were developed for generating an integrated, statistical model of the anatomical structures within the human thorax relevant to radioisotope powered artificial heart implantation. These methods involve measurement and analysis of anatomy in four areas: chest wall, pericardium, vascular connections, and great vessels. A model for the prediction of thorax outline from radiograms was finalized. These models were combined with 100 radiograms to arrive at a size distribution representing the adult male and female populations. (CH)

  5. Finding the Root Causes of Statistical Inconsistency in Community Earth System Model Output

    NASA Astrophysics Data System (ADS)

    Milroy, D.; Hammerling, D.; Baker, A. H.

    2017-12-01

    Baker et al. (2015) developed the Community Earth System Model Ensemble Consistency Test (CESM-ECT) to provide a metric for software quality assurance by determining statistical consistency between an ensemble of CESM outputs and new test runs. The test has proved useful for detecting statistical differences caused by compiler bugs and errors in physical modules. However, detection is only the necessary first step in finding the causes of statistical difference. The CESM is a vastly complex model comprising millions of lines of code, developed and maintained by a large community of software engineers and scientists; any root cause analysis is correspondingly challenging. We propose a new capability for CESM-ECT: identifying the sections of code that cause statistical distinguishability. The first step is to discover the CESM variables that cause CESM-ECT to classify new runs as statistically distinct, which we achieve via Randomized Logistic Regression. Next we use a tool developed to identify the CESM components that define or compute the variables found in the first step. Finally, we employ the Kernel GENerator (KGEN) created in Kim et al. (2016) to detect fine-grained floating point differences. We demonstrate an example of the procedure and advance a plan to automate this process in our future work.
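
    The Randomized Logistic Regression step can be approximated by stability selection: refit an L1-penalized logistic classifier on random subsamples and record how often each variable receives a nonzero coefficient. A sketch with scikit-learn on synthetic stand-in data (all shapes and values illustrative):

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def stability_selection(X, y, n_rounds=100, frac=0.75, C=0.5, seed=0):
        # selection frequency of each column under L1-penalized refits
        rng = np.random.default_rng(seed)
        n, p = X.shape
        freq = np.zeros(p)
        for _ in range(n_rounds):
            idx = rng.choice(n, size=int(frac * n), replace=False)
            clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
            clf.fit(X[idx], y[idx])
            freq += np.abs(clf.coef_[0]) > 1e-8
        return freq / n_rounds

    # toy stand-in: rows are runs, columns are output variables; the label
    # says whether the consistency test flagged the run as distinct
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 20))
    y = (X[:, 3] + 0.8 * X[:, 7] + 0.3 * rng.normal(size=200) > 0).astype(int)
    print(np.round(stability_selection(X, y), 2))  # columns 3 and 7 stand out
    ```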

  6. Evaluation of the ecological relevance of mysid toxicity tests using population modeling techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuhn-Hines, A.; Munns, W.R. Jr.; Lussier, S.

    1995-12-31

    A number of acute and chronic bioassay statistics are used to evaluate the toxicity and risks of chemical stressors to the mysid shrimp, Mysidopsis bahia. These include LC50s from acute tests, NOECs from 7-day and life-cycle tests, and the US EPA Water Quality Criteria Criterion Continuous Concentrations (CCC). Because these statistics are generated from endpoints which focus upon the responses of individual organisms, their relationships to significant effects at higher levels of ecological organization are unknown. This study was conducted to evaluate the quantitative relationships between toxicity test statistics and a concentration-based statistic derived from exposure-response models describing population growth rate (λ) as a function of stressor concentration. This statistic, C* (the concentration where λ = 1, i.e., zero population growth), describes the concentration above which mysid populations are projected to decline in abundance, as determined using population modeling techniques. An analysis of M. bahia responses to 9 metals and 9 organic contaminants indicated the NOEC from life-cycle tests to be the best predictor of C*, although the acute LC50 predicted population-level response surprisingly well. These analyses provide useful information regarding uncertainties of extrapolation among test statistics in assessments of ecological risk.
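
    A toy illustration of the population-level statistic: λ is the dominant eigenvalue of a stage-structured (Leslie) projection matrix whose vital rates decline with concentration, and C* is found where λ crosses 1. The dose-response form and vital rates below are invented for illustration:

    ```python
    import numpy as np
    from scipy.optimize import brentq

    def growth_rate(conc, ec50=50.0):
        # toy three-stage Leslie matrix; fecundity and survival decline
        # with stressor concentration via a logistic dose-response
        effect = 1.0 / (1.0 + (conc / ec50) ** 2)
        f = 6.0 * effect      # fecundity
        s = 0.8 * effect      # stage-to-stage survival
        L = np.array([[0.0, f, f],
                      [s, 0.0, 0.0],
                      [0.0, s, 0.0]])
        return max(abs(np.linalg.eigvals(L)))

    # C*: concentration of zero population growth, where lambda(C*) = 1
    c_star = brentq(lambda c: growth_rate(c) - 1.0, 1e-6, 1000.0)
    print(f"lambda(0) = {growth_rate(0.0):.2f}, C* = {c_star:.1f}")
    ```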

  7. Multiplicative point process as a model of trading activity

    NASA Astrophysics Data System (ADS)

    Gontis, V.; Kaulakys, B.

    2004-11-01

    Analysis of signals consisting of a sequence of pulses shows that an inherent origin of 1/f noise is Brownian fluctuation of the average interevent time between subsequent pulses of the pulse sequence. In this paper, we generalize the model of interevent time to reproduce a variety of self-affine time series exhibiting power spectral density S(f) scaling as a power of the frequency f. Furthermore, we analyze the relation between the power-law correlations and the origin of the power-law probability distribution of the signal intensity. We introduce a stochastic multiplicative model for the time intervals between point events and analyze the statistical properties of the signal analytically and numerically. Such a model system exhibits power-law spectral density S(f) ∼ 1/f^β for various values of β, including β = 1/2, 1, and 3/2. Explicit expressions for the power spectra in the low-frequency limit and for the distribution density of the interevent time are obtained. The counting statistics of the events are analyzed analytically and numerically as well. The specific interest of our analysis relates to financial markets, where long-range correlations of price fluctuations largely depend on the number of transactions. We analyze the spectral density and counting statistics of the number of transactions. The model reproduces the spectral properties of real markets and explains the mechanism behind the power-law distribution of trading activity. The study provides evidence that the statistical properties of financial markets are encoded in the statistics of the time intervals between trades; a multiplicative point process serves as a consistent model generating these statistics.
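
    A schematic numerical sketch of a multiplicative process for the interevent time and the resulting power spectrum of the event-counting signal (the drift and noise terms and all parameter values are illustrative, not the paper's exact parameterization):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    def interevent_times(n=200000, gamma=1e-4, sigma=0.02,
                         tau_min=1e-3, tau_max=1.0):
        # multiplicative stochastic dynamics for tau, kept inside bounds
        tau = np.empty(n)
        t = 0.1
        for k in range(n):
            t = t + gamma * t + sigma * t * rng.normal()
            t = min(max(t, tau_min), tau_max)
            tau[k] = t
        return tau

    events = np.cumsum(interevent_times())        # event (trade) times
    # trading activity: number of events per unit time window
    counts, _ = np.histogram(events, bins=np.arange(0.0, events[-1], 1.0))
    spec = np.abs(np.fft.rfft(counts - counts.mean())) ** 2
    freq = np.fft.rfftfreq(len(counts), d=1.0)
    lo = (freq > 0) & (freq < 0.05)
    beta = -np.polyfit(np.log(freq[lo]), np.log(spec[lo] + 1e-12), 1)[0]
    print(f"estimated low-frequency spectral exponent beta ~ {beta:.2f}")
    ```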

  8. Causal Models and Exploratory Analysis in Heterogeneous Information Fusion for Detecting Potential Terrorists

    DTIC Science & Technology

    2015-11-01

    issues and made some of the same distinctions (Walker, Lempert, and Kwakkel; Bankes, Lempert, and Popper, 2005), but it did appear that we had more than... Statistics," British Journal of Mathematical Statistics and Psychology, 66, pp. 8-38. Jaynes, Edwin T., and G. Larry Bretthorst (ed.) (2003), Probability... Giroux. Lempert, Robert J., David G. Groves, Steven W. Popper, and Steven C. Bankes (2006), "A General Analytic Method for Generating Robust

  9. A new approach to fracture modelling in reservoirs using deterministic, genetic and statistical models of fracture growth

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rawnsley, K.; Swaby, P.

    1996-08-01

    It is increasingly acknowledged that in order to understand and forecast the behavior of fracture-influenced reservoirs we must attempt to reproduce the fracture system geometry and use this as a basis for fluid flow calculation. This article aims to present a recently developed fracture modelling prototype designed specifically for use in hydrocarbon reservoir environments. The prototype "FRAME" (FRActure Modelling Environment) aims to provide a tool which allows the generation of realistic 3D fracture systems within a reservoir model, constrained to the known geology of the reservoir by both mechanical and statistical considerations, and which can be used as a basis for fluid flow calculation. Two newly developed modelling techniques are used. The first is an interactive tool which allows complex fault surfaces and their associated deformations to be reproduced. The second is a "genetic" model which grows fracture patterns from seeds using conceptual models of fracture development. The user defines the mechanical input and can retrieve all the statistics of the growing fractures to allow comparison to assumed statistical distributions for the reservoir fractures. Input parameters include growth rate, fracture interaction characteristics, orientation maps, and density maps. More traditional statistical stochastic fracture models are also incorporated. FRAME is designed to allow the geologist to input hard or soft data including seismically defined surfaces, well fractures, outcrop models, analogue or numerical mechanical models, or geological "feeling". The geologist is not restricted to "a priori" models of fracture patterns that may not correspond to the data.

  10. Assimilating Flow Data into Complex Multiple-Point Statistical Facies Models Using Pilot Points Method

    NASA Astrophysics Data System (ADS)

    Ma, W.; Jafarpour, B.

    2017-12-01

    We develop a new pilot points method for conditioning discrete multiple-point statistical (MPS) facies simulation on dynamic flow data. While conditioning MPS simulation on static hard data is straightforward, calibration against nonlinear flow data is nontrivial. The proposed method generates conditional models from a conceptual model of geologic connectivity, known as a training image (TI), by strategically placing and estimating pilot points. To place pilot points, a score map is generated based on three sources of information: (i) the uncertainty in facies distribution, (ii) the model response sensitivity information, and (iii) the observed flow data. Once the pilot points are placed, the facies values at these points are inferred from production data and are used, along with available hard data at well locations, to simulate a new set of conditional facies realizations. While facies estimation at the pilot points can be performed using different inversion algorithms, in this study the ensemble smoother (ES) and its multiple data assimilation variant (ES-MDA) are adopted to update permeability maps from production data, which are then used to statistically infer facies types at the pilot point locations. The developed method combines the information in the flow data and the TI by using the former to infer facies values at select locations away from the wells and the latter to ensure consistent facies structure and connectivity away from measurement locations. Several numerical experiments are used to evaluate the performance of the developed method and to discuss its important properties.
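
    The ES-MDA update used to infer values at the pilot points can be written generically; the toy below applies the standard ES-MDA equations to a linear forward model standing in for the flow simulator (the two-parameter example and all values are illustrative):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    def es_mda_update(M, D, d_obs, obs_err_std, alpha):
        # M: parameters (n_param x n_ens); D: predicted data (n_obs x n_ens)
        # alpha inflates the observation error; sum of 1/alpha must be 1
        n_ens = M.shape[1]
        dM = M - M.mean(axis=1, keepdims=True)
        dD = D - D.mean(axis=1, keepdims=True)
        C_md = dM @ dD.T / (n_ens - 1)
        C_dd = dD @ dD.T / (n_ens - 1)
        C_e = np.diag(np.full(len(d_obs), obs_err_std ** 2))
        noise = np.sqrt(alpha) * obs_err_std * rng.normal(size=D.shape)
        K = C_md @ np.linalg.inv(C_dd + alpha * C_e)
        return M + K @ (d_obs[:, None] + noise - D)

    # toy: recover two "pilot point" log-permeabilities from three data
    G = np.array([[1.0, 0.2], [0.4, 1.0], [0.7, 0.7]])  # forward model
    d_obs = G @ np.array([2.0, -1.0])
    M = rng.normal(size=(2, 100))
    for alpha in (4.0, 4.0, 4.0, 4.0):   # four steps, sum(1/alpha) = 1
        M = es_mda_update(M, G @ M, d_obs, obs_err_std=0.1, alpha=alpha)
    print("posterior mean:", M.mean(axis=1))   # close to (2, -1)
    ```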

  11. Forward modeling of gravity data using geostatistically generated subsurface density variations

    USGS Publications Warehouse

    Phelps, Geoffrey

    2016-01-01

    Using geostatistical models of density variations in the subsurface, constrained by geologic data, forward models of gravity anomalies can be generated by discretizing the subsurface and calculating the cumulative effect of each cell (pixel). The results of such stochastically generated forward gravity anomalies can be compared with the observed gravity anomalies to find density models that match the observed data. These models have an advantage over forward gravity anomalies generated using polygonal bodies of homogeneous density because generating numerous realizations explores a larger region of the solution space. The stochastic modeling can be thought of as dividing the forward model into two components: that due to the shape of each geologic unit and that due to the heterogeneous distribution of density within each geologic unit. The modeling demonstrates that the internally heterogeneous distribution of density within each geologic unit can contribute significantly to the resulting calculated forward gravity anomaly. Furthermore, the stochastic models match observed statistical properties of geologic units, the solution space is more broadly explored by producing a suite of successful models, and the likelihood of a particular conceptual geologic model can be compared. The Vaca Fault near Travis Air Force Base, California, can be successfully modeled as a normal or strike-slip fault, with the normal fault model being slightly more probable. It can also be modeled as a reverse fault, although this structural geologic configuration is highly unlikely given the realizations we explored.
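
    A minimal sketch of the forward step, treating each cell of one stochastic density realization as a point mass and summing vertical attractions at surface stations (the grid, density statistics, and station layout are illustrative):

    ```python
    import numpy as np

    G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2

    def forward_gravity(x_obs, rho, x_cells, z_cells, cell_volume):
        # vertical gravity anomaly at each station from point-mass cells:
        # g_z = G * m * z / r^3 with z the cell depth
        gz = np.zeros(len(x_obs))
        for i, x0 in enumerate(x_obs):
            dx = x_cells - x0
            r2 = dx ** 2 + z_cells ** 2
            gz[i] = G * cell_volume * np.sum(rho * z_cells / r2 ** 1.5)
        return gz

    rng = np.random.default_rng(4)
    xc, zc = np.meshgrid(np.arange(0.0, 1000.0, 50.0),
                         np.arange(25.0, 500.0, 50.0))
    rho = 50.0 + 20.0 * rng.normal(size=xc.shape)  # kg/m^3 density contrast
    stations = np.linspace(0.0, 1000.0, 21)
    gz = forward_gravity(stations, rho.ravel(), xc.ravel(), zc.ravel(),
                         cell_volume=50.0 ** 3)
    print(np.round(gz * 1e5, 3))   # in mGal (1 mGal = 1e-5 m/s^2)
    ```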

  12. Wind Power Curve Modeling Using Statistical Models: An Investigation of Atmospheric Input Variables at a Flat and Complex Terrain Wind Farm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wharton, S.; Bulaevskaya, V.; Irons, Z.

    The goal of our FY15 project was to explore the use of statistical models and high-resolution atmospheric input data to develop more accurate prediction models for turbine power generation. We modeled power for two operational wind farms in two regions of the country. The first site is a 235 MW wind farm in Northern Oklahoma with 140 GE 1.68-MW turbines. Our second site is a 38 MW wind farm in the Altamont Pass region of Northern California with 38 Mitsubishi 1-MW turbines. The farms are very different in topography, climatology, and turbine technology; however, both occupy high-wind-resource areas in the U.S. and are representative of typical wind farms found in their respective areas.

  13. E-Area LLWF Vadose Zone Model: Probabilistic Model for Estimating Subsided-Area Infiltration Rates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dyer, J.; Flach, G.

    A probabilistic model employing a Monte Carlo sampling technique was developed in Python to generate statistical distributions of the upslope-intact-area to subsided-area ratio (Area UAi/Area SAi) for closure cap subsidence scenarios that differ in assumed percent subsidence and the total number of intact plus subsided compartments. The plan is to use this model as a component in the probabilistic system model for the E-Area Performance Assessment (PA), contributing uncertainty in infiltration estimates.
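
    A sketch of such a Monte Carlo calculation in Python; the geometry below (a single row of equal-area compartments on a slope) is a guessed stand-in, since the abstract does not describe the actual closure-cap configuration:

    ```python
    import numpy as np

    rng = np.random.default_rng(5)

    def area_ratio_samples(n_compartments=20, percent_subsided=10.0,
                           n_trials=20000):
        # distribution of the upslope-intact-area to subsided-area ratio
        # for randomly located subsided compartments of equal area
        n_sub = max(1, round(n_compartments * percent_subsided / 100.0))
        ratios = np.empty(n_trials)
        for t in range(n_trials):
            subsided = rng.choice(n_compartments, size=n_sub, replace=False)
            intact = np.setdiff1d(np.arange(n_compartments), subsided)
            # intact compartments lying upslope of each subsided one
            upslope = [(intact < s).sum() for s in subsided]
            ratios[t] = np.mean(upslope)   # unit areas: count = area ratio
        return ratios

    r = area_ratio_samples()
    print(f"median ratio {np.median(r):.2f}, "
          f"95th percentile {np.percentile(r, 95):.2f}")
    ```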

  14. Spacecraft software training needs assessment research, appendices

    NASA Technical Reports Server (NTRS)

    Ratcliff, Shirley; Golas, Katharine

    1990-01-01

    The appendices to the previously reported study are presented: statistical data from task rating worksheets; SSD references; survey forms; fourth generation language, a powerful, long-term solution to maintenance cost; task list; methodology; SwRI's instructional systems development model; relevant research; and references.

  15. National Centers for Environmental Prediction

    Science.gov Websites

  16. On-Orbit System Identification

    NASA Technical Reports Server (NTRS)

    Mettler, E.; Milman, M. H.; Bayard, D.; Eldred, D. B.

    1987-01-01

    Information derived from accelerometer readings benefits important engineering and control functions. Report discusses methodology for detection, identification, and analysis of motions within space station. Techniques of vibration and rotation analyses, control theory, statistics, filter theory, and transform methods integrated to form system for generating models and model parameters that characterize total motion of complicated space station, with respect to both control-induced and random mechanical disturbances.

  17. A probabilistic drought forecasting framework: A combined dynamical and statistical approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yan, Hongxiang; Moradkhani, Hamid; Zarekarizi, Mahkameh

    In order to improve drought forecasting skill, this study develops a probabilistic drought forecasting framework composed of dynamical and statistical modeling components. The novelty of this study is the use of data assimilation to quantify initial condition uncertainty with Monte Carlo ensemble members, rather than relying entirely on the hydrologic model or land surface model to generate a single deterministic initial condition, as currently implemented in operational drought forecasting systems. The initial condition uncertainty is quantified through data assimilation and coupled with a newly developed probabilistic drought forecasting model using a copula function. The initial conditions at each forecast start date are sampled from the data assimilation ensembles for forecast initialization. Finally, seasonal drought forecasting products are generated with the updated initial conditions. This study introduces the theory behind the proposed drought forecasting system, with an application in the Columbia River Basin, Pacific Northwest, United States. Results from both synthetic and real case studies suggest that the proposed system significantly improves seasonal drought forecasting skill and can facilitate state drought preparation and declaration at least three months before the official state drought declaration.

  18. Fluvial reservoir characterization using topological descriptors based on spectral analysis of graphs

    NASA Astrophysics Data System (ADS)

    Viseur, Sophie; Chiaberge, Christophe; Rhomer, Jérémy; Audigane, Pascal

    2015-04-01

    Fluvial systems generate highly heterogeneous reservoirs, and these heterogeneities have a major impact on fluid flow behavior. However, the modelling of such reservoirs is mainly performed in under-constrained contexts: they include complex features, yet only sparse and indirect data are available. Stochastic modeling is the common strategy to solve such problems. Multiple 3D models are generated from the available subsurface dataset; the generated models represent a sampling of plausible subsurface structure representations. From this model sampling, statistical analysis of targeted parameters (e.g., reserve estimates, flow behaviors) and a posteriori uncertainties are performed to assess risks. However, on one hand, uncertainties may be large, which requires many models to be generated to scan the space of possibilities. On the other hand, some computations performed on the generated models are time-consuming and cannot, in practice, be applied to all of them. This issue is particularly critical in: 1) geological modeling from outcrop data only, as these data types are generally sparse and mainly distributed in 2D at large scale, though they may locally include high-resolution descriptions (e.g., facies, local strata variability); and 2) CO2 storage studies, as many scales of investigation are required, from meter to regional scale, to estimate storage capacities and associated risks. Recent approaches propose to define distances between models so that sophisticated multivariate statistics can be applied to the space of uncertainties, and only sub-samples, representative of the initial set, are investigated in dynamic, time-consuming studies. This work focuses on defining distances between models that characterize the topology of the reservoir rock network, i.e., its compactness or degree of connectivity. The proposed strategy relies on the study of the reservoir rock skeleton; the skeleton of an object corresponds to its medial feature. A skeleton is computed for each reservoir rock geobody and studied through graph spectral analysis. To achieve this, the skeleton is converted into a graph structure, and the spectral analysis applied to this graph structure allows a distance to be defined between pairs of graphs. This distance is then used as support for clustering analysis to gather models that share the same reservoir rock topology. To show the ability of the defined distances to discriminate different types of reservoir connectivity, a synthetic data set of fluvial models with different geological settings was generated and studied using the proposed approach. The results of the clustering analysis are shown and discussed.
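
    A minimal sketch of the graph-spectral distance: compare the smallest eigenvalues of the normalized Laplacians of two skeleton graphs (the toy graphs and the number of eigenvalues kept are illustrative):

    ```python
    import numpy as np
    import networkx as nx

    def spectral_distance(g1, g2, k=20):
        # compare k smallest normalized-Laplacian eigenvalues, zero-padded
        # so that graphs of different size remain comparable
        spectra = []
        for g in (g1, g2):
            ev = np.sort(nx.normalized_laplacian_spectrum(g))
            v = np.zeros(k)
            v[:min(k, len(ev))] = ev[:k]
            spectra.append(v)
        return np.linalg.norm(spectra[0] - spectra[1])

    # toy skeletons: a well-connected channel network vs. a sparse one
    dense = nx.connected_watts_strogatz_graph(60, 6, 0.2, seed=1)
    sparse = nx.path_graph(60)
    print(f"d(dense, dense)  = {spectral_distance(dense, dense):.3f}")
    print(f"d(dense, sparse) = {spectral_distance(dense, sparse):.3f}")
    ```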

  19. Ensemble Solar Forecasting Statistical Quantification and Sensitivity Analysis: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cheung, WanYin; Zhang, Jie; Florita, Anthony

    2015-12-08

    Uncertainties associated with solar forecasts present challenges to maintaining grid reliability, especially at high solar penetrations. This study aims to quantify the errors associated with the day-ahead solar forecast parameters and the theoretical solar power output for a 51-kW solar power plant in a utility area in the state of Vermont, U.S. Forecasts were generated by three numerical weather prediction (NWP) models, including the Rapid Refresh, the High Resolution Rapid Refresh, and the North American Model, and a machine-learning ensemble model. A photovoltaic (PV) performance model was adopted to calculate theoretical solar power generation using the forecast parameters (e.g., irradiance, cell temperature, and wind speed). Errors of the power outputs were quantified using statistical moments and a suite of metrics, such as the normalized root mean squared error (NRMSE). In addition, the PV model's sensitivity to different forecast parameters was quantified and analyzed. Results showed that the ensemble model yielded forecasts in all parameters with the smallest NRMSE. The NRMSE of solar irradiance forecasts of the ensemble model was reduced by 28.10% compared to the best of the three NWP models. Further, the sensitivity analysis indicated that errors in the forecasted cell temperature contributed only approximately 0.12% to the NRMSE of the power output, as opposed to 7.44% from the forecasted solar irradiance.

  20. Spatial Modeling for Resources Framework (SMRF): A modular framework for developing spatial forcing data in mountainous terrain

    NASA Astrophysics Data System (ADS)

    Havens, S.; Marks, D. G.; Kormos, P.; Hedrick, A. R.; Johnson, M.; Robertson, M.; Sandusky, M.

    2017-12-01

    In the Western US, operational water supply managers rely on statistical techniques to forecast the volume of water left to enter reservoirs. As the climate changes and demand increases for stored water used for irrigation, flood control, power generation, and ecosystem services, water managers have begun to move from statistical techniques towards physically based models. To assist with this transition, a new open source framework was developed, the Spatial Modeling for Resources Framework (SMRF), to automate and simplify the most common forcing data distribution methods. SMRF is computationally efficient and can be implemented for both research and operational applications. Currently, SMRF is able to generate all of the forcing data required to run physically based snow or hydrologic models at 50-100 m resolution over regions of 500-10,000 km2, and has been successfully applied in real-time and historical applications for the Boise River Basin in Idaho, USA, the Tuolumne and San Joaquin River Basins in California, USA, and the Reynolds Creek Experimental Watershed in Idaho, USA. These applications use meteorological station measurements and numerical weather prediction model outputs as input data. SMRF has significantly streamlined the modeling workflow, decreased model setup time from weeks to days, and made near-real-time application of physics-based snow and hydrologic models possible.

  1. Statistical inference of the generation probability of T-cell receptors from sequence repertoires.

    PubMed

    Murugan, Anand; Mora, Thierry; Walczak, Aleksandra M; Callan, Curtis G

    2012-10-02

    Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.

  2. Statistically Qualified Neuro-Analytic system and Method for Process Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vilim, Richard B.; Garcia, Humberto E.; Chen, Frederick W.

    1998-11-04

    An apparatus and method for monitoring a process involves development and application of a statistically qualified neuro-analytic (SQNA) model to accurately and reliably identify process change. The development of the SQNA model is accomplished in two steps: deterministic model adaptation and stochastic model adaptation. Deterministic model adaptation involves formulating an analytic model of the process representing known process characteristics, augmenting the analytic model with a neural network that captures unknown process characteristics, and training the resulting neuro-analytic model by adjusting the neural network weights according to a unique scaled equation error minimization technique. Stochastic model adaptation involves qualifying any remaining uncertainty in the trained neuro-analytic model by formulating a likelihood function, given an error propagation equation, for computing the probability that the neuro-analytic model generates the measured process output. Preferably, the developed SQNA model is validated using known sequential probability ratio tests and applied to the process as an on-line monitoring system.
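
    A minimal numpy sketch of the deterministic adaptation step: a known analytic model augmented by a small neural network trained on the residual. Plain least-squares gradient descent is used here in place of the patent's scaled equation error minimization technique, and the process model is invented:

    ```python
    import numpy as np

    rng = np.random.default_rng(6)

    # "process": a known linear response plus an unmodeled quadratic effect
    u = np.linspace(0.0, 2.0, 200)[:, None]
    y = 3.0 * u + 0.8 * u ** 2 + 0.05 * rng.normal(size=u.shape)

    def analytic_model(u):
        return 3.0 * u          # known process characteristics only

    # one-hidden-layer network fit to the analytic model's residual
    W1 = 0.5 * rng.normal(size=(1, 8)); b1 = np.zeros(8)
    W2 = 0.5 * rng.normal(size=(8, 1)); b2 = np.zeros(1)
    lr = 0.05
    for _ in range(3000):
        h = np.tanh(u @ W1 + b1)
        err = (h @ W2 + b2) - (y - analytic_model(u))
        dW2 = h.T @ err / len(u); db2 = err.mean(axis=0)
        dh = err @ W2.T * (1 - h ** 2)
        dW1 = u.T @ dh / len(u); db1 = dh.mean(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

    y_hat = analytic_model(u) + np.tanh(u @ W1 + b1) @ W2 + b2
    rms = np.sqrt(np.mean((y_hat - y) ** 2))
    print(f"RMS error of the neuro-analytic model: {rms:.3f}")
    ```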

  3. Development of a funding, cost, and spending model for satellite projects

    NASA Technical Reports Server (NTRS)

    Johnson, Jesse P.

    1989-01-01

    The need for a predictive budget/funding model is obvious. The current models used by the Resource Analysis Office (RAO) are used to predict the total costs of satellite projects. An effort was conducted to extend the modeling capabilities from total budget analysis to analysis of budget outlays over time. A statistically based, data-driven methodology was used to derive and develop the model. The budget data for the last 18 GSFC-sponsored satellite projects were analyzed and used to build a funding model describing the historical spending patterns. The raw data consisted of dollars spent in each specific year and their 1989-dollar equivalents. These data were converted to the standard format used by the RAO group and placed in a database. A simple statistical analysis was performed to calculate the gross statistics associated with project length and project cost and the conditional statistics on project length and project cost. The modeling approach used is derived from the theory of embedded statistics, which states that properly analyzed data will produce the underlying generating function. The process of funding large-scale projects over extended periods of time is described by Life Cycle Cost Models (LCCM). The data were analyzed to find a model in the generic form of an LCCM. The model developed is based on a Weibull function whose parameters are found by both nonlinear optimization and nonlinear regression. In order to use this model it is necessary to transform the problem from a dollar/time space to a percentage-of-total-budget/time space. This transformation is equivalent to moving to a probability space. By using the basic rules of probability, the validity of both the optimization and the regression steps is ensured. The statistically significant model is then integrated and inverted. The resulting output represents a project schedule which relates the amount of money spent to the percentage of project completion.
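
    A small sketch of the Weibull-based spending model in percentage-of-budget space, fit by nonlinear regression with scipy (the spending fractions below are invented for illustration):

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def weibull_cdf(t, shape, scale):
        # cumulative fraction of total budget spent by normalized time t
        return 1.0 - np.exp(-(t / scale) ** shape)

    # fraction of schedule elapsed vs. fraction of total budget spent
    t = np.linspace(0.1, 1.0, 10)
    spent = np.array([0.02, 0.08, 0.18, 0.32, 0.48,
                      0.63, 0.76, 0.86, 0.93, 0.97])

    (shape, scale), _ = curve_fit(weibull_cdf, t, spent, p0=(2.0, 0.5))
    print(f"shape = {shape:.2f}, scale = {scale:.2f}")
    # yearly outlay profile = differences of the fitted cumulative curve
    years = np.linspace(0.0, 1.0, 6)
    print(np.round(np.diff(weibull_cdf(years, shape, scale)), 3))
    ```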

  4. Advances in Time Estimation Methods for Molecular Data.

    PubMed

    Kumar, Sudhir; Hedges, S Blair

    2016-04-01

    Molecular dating has become central to placing a temporal dimension on the tree of life. Methods for estimating divergence times have been developed for over 50 years, beginning with the proposal of the molecular clock in 1962. We categorize the chronological development of these methods into four generations based on the timing of their origin. In the first-generation approaches (1960s-1980s), a strict molecular clock was assumed to date divergences. In the second-generation approaches (1990s), the equality of evolutionary rates between species was first tested and then a strict molecular clock applied to estimate divergence times. The third-generation approaches (since ∼2000) account for differences in evolutionary rates across the tree by using a statistical model, obviating the need to assume a clock or to test the equality of evolutionary rates among species. Bayesian methods in the third generation require a specific or uniform prior on the speciation process and enable the inclusion of uncertainty in clock calibrations. The fourth-generation approaches (since 2012) allow rates to vary from branch to branch, but do not need prior selection of a statistical model to describe the rate variation or the specification of a speciation model. With high accuracy, comparable to Bayesian approaches, and speeds that are orders of magnitude faster, fourth-generation methods are able to produce reliable timetrees of thousands of species using genome-scale data. We found that early time estimates from second-generation studies are similar to those of third- and fourth-generation studies, indicating that methodological advances have not fundamentally altered the timetree of life, but rather have facilitated time estimation by enabling the inclusion of more species. Nonetheless, we feel an urgent need for testing the accuracy and precision of third- and fourth-generation methods, including their robustness to misspecification of priors in the analysis of large phylogenies and data sets. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  5. Bayesian Sensitivity Analysis of Statistical Models with Missing Data

    PubMed Central

    ZHU, HONGTU; IBRAHIM, JOSEPH G.; TANG, NIANSHENG

    2013-01-01

    Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the non-ignorable missing at random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures. PMID:24753718

  6. Adaptable state based control system

    NASA Technical Reports Server (NTRS)

    Rasmussen, Robert D. (Inventor); Dvorak, Daniel L. (Inventor); Gostelow, Kim P. (Inventor); Starbird, Thomas W. (Inventor); Gat, Erann (Inventor); Chien, Steve Ankuo (Inventor); Keller, Robert M. (Inventor)

    2004-01-01

    An autonomous controller, comprised of a state knowledge manager, a control executor, hardware proxies and a statistical estimator collaborates with a goal elaborator, with which it shares common models of the behavior of the system and the controller. The elaborator uses the common models to generate from temporally indeterminate sets of goals, executable goals to be executed by the controller. The controller may be updated to operate in a different system or environment than that for which it was originally designed by the replacement of shared statistical models and by the instantiation of a new set of state variable objects derived from a state variable class. The adaptation of the controller does not require substantial modification of the goal elaborator for its application to the new system or environment.

  7. Efficient Generation and Selection of Virtual Populations in Quantitative Systems Pharmacology Models.

    PubMed

    Allen, R J; Rieger, T R; Musante, C J

    2016-03-01

    Quantitative systems pharmacology models mechanistically describe a biological system and the effect of drug treatment on system behavior. Because these models rarely are identifiable from the available data, the uncertainty in physiological parameters may be sampled to create alternative parameterizations of the model, sometimes termed "virtual patients." In order to reproduce the statistics of a clinical population, virtual patients are often weighted to form a virtual population that reflects the baseline characteristics of the clinical cohort. Here we introduce a novel technique to efficiently generate virtual patients and, from this ensemble, demonstrate how to select a virtual population that matches the observed data without the need for weighting. This approach improves confidence in model predictions by mitigating the risk that spurious virtual patients become overrepresented in virtual populations.
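
    A toy sketch of the generate-then-select idea: sample virtual patients from plausible parameter ranges, then accept or reject them so the selected virtual population matches an observed outcome distribution without weighting (the model, ranges, and target are illustrative):

    ```python
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(7)

    # 1) generate virtual patients and their model outputs
    n_vp = 50000
    params = rng.uniform([0.5, 0.1], [2.0, 1.0], size=(n_vp, 2))
    outputs = params[:, 0] * 80.0 + params[:, 1] * 40.0   # toy model

    # 2) acceptance-rejection against the clinical outcome distribution
    target = norm(loc=120.0, scale=10.0)
    edges = np.histogram_bin_edges(outputs, bins=50)
    density, _ = np.histogram(outputs, bins=edges, density=True)
    idx = np.clip(np.digitize(outputs, edges) - 1, 0, 49)
    w = target.pdf(outputs) / np.maximum(density[idx], 1e-12)
    accept = rng.random(n_vp) < w / w.max()
    vpop = outputs[accept]
    print(f"selected {accept.sum()} of {n_vp}; "
          f"mean {vpop.mean():.1f}, sd {vpop.std():.1f}")
    ```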

  8. Dynamic model of the force driving kinesin to move along microtubule-Simulation with a model system

    NASA Astrophysics Data System (ADS)

    Chou, Y. C.; Hsiao, Yi-Feng; To, Kiwing

    2015-09-01

    A dynamic model for the motility of kinesin, including stochastic force generation and step formation, is proposed. The force driving the motion of the kinesin motor is generated by the impulse from collisions between the randomly moving long-chain stalk and the ratchet-shaped outer surface of the microtubule. Most of the dynamical and statistical features of kinesin motility are reproduced in a simulation system with (a) ratchet structures similar to the outer surface of the microtubule, (b) a bead chain connected to two heads, similar to the stalk of the real kinesin motor, and (c) an interaction between the heads of the simulated kinesin and the microtubule. We also propose an experiment to discriminate between the conventional hand-over-hand model and the dynamic model.

  9. Design, development, and evaluation of a second generation interactive Simulator for Engineering Ethics Education (SEEE2).

    PubMed

    Alfred, Michael; Chung, Christopher A

    2012-12-01

    This paper describes a second-generation Simulator for Engineering Ethics Education. Details of the first-generation activities of this overall effort are published in Chung and Alfred (Sci Eng Ethics 15:189-199, 2009). The second-generation research effort represents a major development in the interactive-simulator educational approach. As with the first-generation effort, the simulator places students in first-person-perspective scenarios involving different types of ethical situations. Students must still gather data, assess the situation, and make decisions, and the approach still requires students to develop their own ability to identify and respond to ethical engineering situations. However, whereas the generation-one effort used a dogmatic model based on the National Society of Professional Engineers' Code of Ethics, the new generation-two model is based on a mathematical model of the actual experiences of engineers involved in ethical situations. This approach also allows the use of feedback in the form of decision effectiveness and professional career impact. Statistical comparisons indicate a 59 percent increase in overall knowledge and a 19 percent improvement in teaching effectiveness over an Internet Engineering Ethics resource-based approach.

  10. Statistical downscaling and future scenario generation of temperatures for Pakistan Region

    NASA Astrophysics Data System (ADS)

    Kazmi, Dildar Hussain; Li, Jianping; Rasul, Ghulam; Tong, Jiang; Ali, Gohar; Cheema, Sohail Babar; Liu, Luliu; Gemmer, Marco; Fischer, Thomas

    2015-04-01

    Impact studies require climate change information at a finer spatial scale than that presently provided by global or regional climate models. This is especially true for regions like South Asia, with complex topography, coastal or island locations, and areas of highly heterogeneous land cover. To deal with this situation, an inexpensive method (statistical downscaling) has been adopted. The Statistical DownScaling Model (SDSM) was employed for downscaling daily minimum and maximum temperature data of 44 national stations for the base period (1961-1990), and future scenarios were then generated up to 2099. Observed data as well as predictors (a product of the National Oceanic and Atmospheric Administration) were calibrated and tested on an individual/multiple basis through linear regression. Future scenarios were generated from HadCM3 daily data for the A2 and B2 storylines. The downscaled data have been tested and show a relatively strong relationship with observations in comparison to ECHAM5 data. Generally, the southern half of the country is considered vulnerable in terms of increasing temperatures, but this study projects that, in the future, the northern belt in particular faces a possible threat of increasing air temperature. In particular, the northern areas (hosting the third-largest ice reserves after the polar regions), an important feeding source for the Indus River, are projected to be vulnerable in terms of increasing temperatures. Consequently, not only the hydro-agricultural sector but also the environmental conditions in the area may be at risk in the future.
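
    A minimal sketch of the workflow: calibrate a linear transfer function from large-scale predictors to station temperature, then drive it with GCM predictors plus stochastic noise to generate a scenario (all data below are synthetic placeholders):

    ```python
    import numpy as np

    rng = np.random.default_rng(8)

    # calibration period: standardized large-scale predictors vs. observed
    # station maximum temperature
    n_days = 3000
    predictors = rng.normal(size=(n_days, 3))
    tmax_obs = (28.0 + 4.0 * predictors[:, 0] + 1.5 * predictors[:, 1]
                + 1.0 * rng.normal(size=n_days))

    # least-squares transfer function and residual variance
    X = np.column_stack([np.ones(n_days), predictors])
    coef, *_ = np.linalg.lstsq(X, tmax_obs, rcond=None)
    resid_sd = np.std(tmax_obs - X @ coef)

    # scenario generation from (shifted) GCM predictors plus noise
    gcm = rng.normal(size=(365, 3)) + np.array([0.5, 0.2, 0.0])
    Xg = np.column_stack([np.ones(365), gcm])
    tmax_scen = Xg @ coef + resid_sd * rng.normal(size=365)
    print(f"calibration mean {tmax_obs.mean():.1f} C, "
          f"scenario mean {tmax_scen.mean():.1f} C")
    ```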

  11. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

    PubMed

    Rivas, Elena; Lang, Raymond; Eddy, Sean R

    2012-02-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.

  12. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more

    PubMed Central

    Rivas, Elena; Lang, Raymond; Eddy, Sean R.

    2012-01-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308

  13. RAD-ADAPT: Software for modelling clonogenic assay data in radiation biology.

    PubMed

    Zhang, Yaping; Hu, Kaiqiang; Beumer, Jan H; Bakkenist, Christopher J; D'Argenio, David Z

    2017-04-01

    We present a comprehensive software program, RAD-ADAPT, for the quantitative analysis of clonogenic assays in radiation biology. Two commonly used models for clonogenic assay analysis, the linear-quadratic model and the single-hit multi-target model, are included in the software. RAD-ADAPT uses a maximum likelihood estimation method to obtain parameter estimates under the assumption that cell colony count data follow a Poisson distribution. The program has an intuitive interface, generates model prediction plots, tabulates model parameter estimates, and allows automatic statistical comparison of parameters between different groups. The RAD-ADAPT interface is written using the statistical software R, and the underlying computations are accomplished by the ADAPT software system for pharmacokinetic/pharmacodynamic systems analysis. The use of RAD-ADAPT is demonstrated with an example that examines the impact of pharmacologic ATM and ATR kinase inhibition on the human lung cancer cell line A549 after ionizing radiation. Copyright © 2017 Elsevier B.V. All rights reserved.
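
    A sketch of the underlying estimation problem: a Poisson maximum-likelihood fit of the linear-quadratic survival model to colony counts. RAD-ADAPT itself is built on R and ADAPT; the Python below and the counts are only illustrative:

    ```python
    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import gammaln

    # illustrative assay: cells plated and colonies counted per dose (Gy)
    dose = np.array([0.0, 1.0, 2.0, 4.0, 6.0, 8.0])
    plated = np.array([100, 100, 200, 500, 2000, 10000])
    colonies = np.array([55, 42, 58, 64, 60, 40])

    def neg_loglik(theta):
        # expected colonies = plated * PE * exp(-(alpha*D + beta*D^2)),
        # counts assumed Poisson
        pe, alpha, beta = theta
        if pe <= 0 or alpha < 0 or beta < 0:
            return np.inf
        mu = plated * pe * np.exp(-(alpha * dose + beta * dose ** 2))
        return -np.sum(colonies * np.log(mu) - mu - gammaln(colonies + 1))

    res = minimize(neg_loglik, x0=(0.5, 0.3, 0.03), method="Nelder-Mead")
    pe, alpha, beta = res.x
    print(f"PE = {pe:.2f}, alpha = {alpha:.3f}/Gy, beta = {beta:.4f}/Gy^2")
    ```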

  14. Linking Statistically- and Physically-Based Models for Improved Streamflow Simulation in Gaged and Ungaged Areas

    NASA Astrophysics Data System (ADS)

    Lafontaine, J.; Hay, L.; Archfield, S. A.; Farmer, W. H.; Kiang, J. E.

    2014-12-01

    The U.S. Geological Survey (USGS) has developed a National Hydrologic Model (NHM) to support coordinated, comprehensive and consistent hydrologic model development, and facilitate the application of hydrologic simulations within the continental US. The portion of the NHM located within the Gulf Coastal Plains and Ozarks Landscape Conservation Cooperative (GCPO LCC) is being used to test the feasibility of improving streamflow simulations in gaged and ungaged watersheds by linking statistically- and physically-based hydrologic models. The GCPO LCC covers part or all of 12 states and 5 sub-geographies, totaling approximately 726,000 km2, and is centered on the lower Mississippi Alluvial Valley. A total of 346 USGS streamgages in the GCPO LCC region were selected to evaluate the performance of this new calibration methodology for the period 1980 to 2013. Initially, the physically-based models are calibrated to measured streamflow data to provide a baseline for comparison. An enhanced calibration procedure then is used to calibrate the physically-based models in the gaged and ungaged areas of the GCPO LCC using statistically-based estimates of streamflow. For this application, the calibration procedure is adjusted to address the limitations of the statistically generated time series to reproduce measured streamflow in gaged basins, primarily by incorporating error and bias estimates. As part of this effort, estimates of uncertainty in the model simulations are also computed for the gaged and ungaged watersheds.

  15. Development of a model of the coronary arterial tree for the 4D XCAT phantom

    NASA Astrophysics Data System (ADS)

    Fung, George S. K.; Segars, W. Paul; Gullberg, Grant T.; Tsui, Benjamin M. W.

    2011-09-01

    A detailed three-dimensional (3D) model of the coronary artery tree with cardiac motion has great potential for applications in a wide variety of medical imaging research areas. In this work, we first developed a computer-generated 3D model of the coronary arterial tree for the heart in the extended cardiac-torso (XCAT) phantom, thereby creating a realistic computer model of the human anatomy. The coronary arterial tree model was based on two datasets: (1) a gated cardiac dual-source computed tomography (CT) angiographic dataset obtained from a normal human subject and (2) statistical morphometric data of porcine hearts. The initial proximal segments of the vasculature and the anatomical details of the boundaries of the ventricles were defined by segmenting the CT data. An iterative rule-based generation method was developed and applied to extend the coronary arterial tree beyond the initial proximal segments. The algorithm was governed by three factors: (1) statistical morphometric measurements of the connectivity, lengths and diameters of the arterial segments; (2) avoidance forces from other vessel segments and the boundaries of the myocardium, and (3) optimality principles which minimize the drag force at the bifurcations of the generated tree. Using this algorithm, the 3D computational model of the largest six orders of the coronary arterial tree was generated, which spread across the myocardium of the left and right ventricles. The 3D coronary arterial tree model was then extended to 4D to simulate different cardiac phases by deforming the original 3D model according to the motion vector map of the 4D cardiac model of the XCAT phantom at the corresponding phases. As a result, a detailed and realistic 4D model of the coronary arterial tree was developed for the XCAT phantom by imposing constraints of anatomical and physiological characteristics of the coronary vasculature. This new 4D coronary artery tree model provides a unique simulation tool that can be used in the development and evaluation of instrumentation and methods for imaging normal and pathological hearts with myocardial perfusion defects.

  16. Big data to smart data in Alzheimer's disease: Real-world examples of advanced modeling and simulation.

    PubMed

    Haas, Magali; Stephenson, Diane; Romero, Klaus; Gordon, Mark Forrest; Zach, Neta; Geerts, Hugo

    2016-09-01

    Many disease-modifying clinical development programs in Alzheimer's disease (AD) have failed to date, and development of new and advanced preclinical models that generate actionable knowledge is desperately needed. This review reports on computer-based modeling and simulation as a powerful tool in AD research. Statistical data-analysis techniques can identify associations between certain data and phenotypes, such as diagnosis or disease progression. Other approaches integrate domain expertise in a formalized mathematical way to understand how specific components of pathology integrate into complex brain networks. Private-public partnerships focused on data sharing, causal inference and pathway-based analysis, crowdsourcing, and mechanism-based quantitative systems modeling represent successful real-world modeling examples with substantial impact on CNS diseases. As in other disease indications, successful real-world examples of advanced simulation can generate actionable support for drug discovery and development in AD, illustrating the value that can be generated for different stakeholders. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  17. Genetic Programming Transforms in Linear Regression Situations

    NASA Astrophysics Data System (ADS)

    Castillo, Flor; Kordon, Arthur; Villa, Carlos

    The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with an optimal trade-off between prediction accuracy and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated, and some recommendations for transform selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing awareness of the potential of GP in statistical model building by MLR.

  18. Development of uncertainty-based work injury model using Bayesian structural equation modelling.

    PubMed

    Chatterjee, Snehamoy

    2014-01-01

    This paper proposes a Bayesian method-based structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of the SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distributions for the factor loading parameters and structural parameters of the SEM. In the first approach, the prior distributions were considered as fixed distribution functions with specific parameter values, whereas, in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayes' rule. Markov chain Monte Carlo sampling, in the form of Gibbs sampling, was applied for sampling from the posterior distribution. The results revealed that all coefficients of the structural and measurement model parameters are statistically significant with the experts' opinion-based priors, whereas two coefficients are not statistically significant when the fixed prior-based distributions are applied. The error statistics reveal that the Bayesian structural model provides a reasonably good fit for work injury, with a high coefficient of determination (0.91) and a smaller mean squared error compared to traditional SEM.

  19. Generalized background error covariance matrix model (GEN_BE v2.0)

    NASA Astrophysics Data System (ADS)

    Descombes, G.; Auligné, T.; Vandenberghe, F.; Barker, D. M.; Barré, J.

    2015-03-01

    The specification of state background error statistics is a key component of data assimilation since it affects the impact observations will have on the analysis. In the variational data assimilation approach, applied in geophysical sciences, the dimensions of the background error covariance matrix (B) are usually too large to be explicitly determined and B needs to be modeled. Recent efforts to include new variables in the analysis such as cloud parameters and chemical species have required the development of the code to GENerate the Background Errors (GEN_BE) version 2.0 for the Weather Research and Forecasting (WRF) community model. GEN_BE allows for a simpler, flexible, robust, and community-oriented framework that gathers methods used by some meteorological operational centers and researchers. We present the advantages of this new design for the data assimilation community by performing benchmarks of different modeling of B and showing some of the new features in data assimilation test cases. As data assimilation for clouds remains a challenge, we present a multivariate approach that includes hydrometeors in the control variables and new correlated errors. In addition, the GEN_BE v2.0 code is employed to diagnose error parameter statistics for chemical species, which shows that it is a tool flexible enough to implement new control variables. While the generation of the background errors statistics code was first developed for atmospheric research, the new version (GEN_BE v2.0) can be easily applied to other domains of science and chosen to diagnose and model B. Initially developed for variational data assimilation, the model of the B matrix may be useful for variational ensemble hybrid methods as well.

  20. Implementation and Research on the Operational Use of the Mesoscale Prediction Model COAMPS in Poland

    DTIC Science & Technology

    2007-09-30

    COAMPS model. Bogumil Jakubiak, University of Warsaw – participated in EGU General Assembly, Vienna, Austria, 15-20 April 2007, giving one oral and two...conditional forecast (background) error probability density function using an ensemble of the model forecast to generate background error statistics...COAMPS system on ICM machines at Warsaw University for the purpose of providing operational support to the general public using the ICM meteorological

  1. Analysis and generation of groundwater concentration time series

    NASA Astrophysics Data System (ADS)

    Crăciun, Maria; Vamoş, Călin; Suciu, Nicolae

    2018-01-01

    Concentration time series are provided by simulated concentrations of a nonreactive solute transported in groundwater, integrated over the transverse direction of a two-dimensional computational domain and recorded at the plume center of mass. The analysis of a statistical ensemble of time series reveals subtle features that are not captured by the first two moments which characterize the approximate Gaussian distribution of the two-dimensional concentration fields. The concentration time series exhibit a complex preasymptotic behavior driven by a nonstationary trend and correlated fluctuations with time-variable amplitude. Time series with almost the same statistics are generated by successively adding, to a time-dependent trend, a sum of linear regression terms accounting for correlations between fluctuations around the trend and their increments in time, and terms of an amplitude-modulated autoregressive noise of order one with a time-varying parameter. The algorithm generalizes mixing models used in probability density function approaches. The well-known interaction-by-exchange-with-the-mean mixing model is a special case consisting of a linear regression with constant coefficients.
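
    A minimal sketch of the generating recipe described here, with invented trend, amplitude, and phi(t) functions: a time-dependent trend plus amplitude-modulated AR(1) noise with a time-varying parameter. The regression terms of the full algorithm are omitted.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
t = np.arange(n)

trend = 1.0 - np.exp(-t / 300.0)                 # assumed nonstationary trend
amplitude = 0.2 * (1.0 + 0.5 * np.sin(2 * np.pi * t / 500.0))
phi = 0.9 - 0.3 * t / n                          # time-varying AR(1) parameter

noise = np.zeros(n)
for k in range(1, n):
    noise[k] = phi[k] * noise[k - 1] + rng.normal()

series = trend + amplitude * noise               # synthetic concentration series
print(series[:5])
```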

  2. Statistical analysis and modeling of intermittent transport events in the tokamak scrape-off layer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Johan, E-mail: anderson.johan@gmail.com; Halpern, Federico D.; Ricci, Paolo

    The turbulence observed in the scrape-off-layer of a tokamak is often characterized by intermittent events of bursty nature, a feature which raises concerns about the prediction of heat loads on the physical boundaries of the device. It appears thus necessary to delve into the statistical properties of turbulent physical fields such as density, electrostatic potential, and temperature, focusing on the mathematical expression of tails of the probability distribution functions. The method followed here is to generate statistical information from time-traces of the plasma density stemming from Braginskii-type fluid simulations and check this against a first-principles theoretical model. The analysis of the numerical simulations indicates that the probability distribution function of the intermittent process contains strong exponential tails, as predicted by the analytical theory.

  3. Meteor localization via statistical analysis of spatially temporal fluctuations in image sequences

    NASA Astrophysics Data System (ADS)

    Kukal, Jaromír.; Klimt, Martin; Šihlík, Jan; Fliegel, Karel

    2015-09-01

    Meteor detection is one of the most important procedures in astronomical imaging. A meteor path in Earth's atmosphere is traditionally reconstructed from a double-station video observation system generating 2D image sequences. However, atmospheric turbulence and other factors cause spatio-temporal fluctuations of the image background, which make the localization of the meteor path more difficult. Our approach is based on nonlinear preprocessing of image intensity using the Box-Cox transform, with the logarithmic transform as a particular case. The transformed image sequences are then differentiated along discrete coordinates to obtain a statistical description of sky background fluctuations, which can be modeled by a multivariate normal distribution. After verification and hypothesis testing, we use the statistical model for outlier detection. While isolated outlier points are ignored, a compact cluster of outliers indicates the presence of meteoroids after ignition.
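
    The pipeline can be sketched in a few lines: Box-Cox-transform the intensities, summarize each pixel's temporal fluctuations, fit a multivariate normal, and flag Mahalanobis outliers against a chi-square threshold. The synthetic image stack and the choice of per-pixel features below are our assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
frames = rng.gamma(shape=5.0, scale=10.0, size=(30, 64, 64))  # fake sky stack

flat, _ = stats.boxcox(frames.ravel())           # variance-stabilizing transform
cube = flat.reshape(frames.shape)

# Two features per pixel: temporal mean and temporal std of the fluctuations.
features = np.stack([cube.mean(axis=0).ravel(), cube.std(axis=0).ravel()], axis=1)
mu = features.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(features.T))
d2 = np.einsum('ij,jk,ik->i', features - mu, cov_inv, features - mu)

threshold = stats.chi2.ppf(0.999, df=2)          # outliers = candidate meteor pixels
print(np.count_nonzero(d2 > threshold))
```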

  4. A Three Dimensional Kinematic and Kinetic Study of the Golf Swing

    PubMed Central

    Nesbit, Steven M.

    2005-01-01

    This paper discusses the three-dimensional kinematics and kinetics of the golf swing as performed by 85 amateur subjects (84 male, one female) of various skill levels. The analysis was performed using a variable full-body computer model of a human coupled with a flexible model of a golf club. Data to drive the model were obtained from subject swings recorded using a multi-camera motion analysis system. Model output included club trajectories, golfer/club interaction forces and torques, work and power, and club deflections. These data formed the basis for a statistical analysis of all subjects and a detailed analysis and comparison of the swing characteristics of four of the subjects. The analysis generated much new data concerning the mechanics of the golf swing. It revealed that a golf swing is a highly coordinated and individual motion, and subject-to-subject variations were significant. The study highlighted the importance of the wrists in generating club head velocity and orienting the club face. The trajectory of the hands and the ability to do work were the factors most closely related to skill level. Key points: full-body model of the golf swing; mechanical description of the golf swing; statistical analysis of golf swing mechanics; comparisons of subject swing mechanics. PMID:24627665

  5. A three dimensional kinematic and kinetic study of the golf swing.

    PubMed

    Nesbit, Steven M

    2005-12-01

    This paper discusses the three-dimensional kinematics and kinetics of the golf swing as performed by 85 amateur subjects (84 male, one female) of various skill levels. The analysis was performed using a variable full-body computer model of a human coupled with a flexible model of a golf club. Data to drive the model were obtained from subject swings recorded using a multi-camera motion analysis system. Model output included club trajectories, golfer/club interaction forces and torques, work and power, and club deflections. These data formed the basis for a statistical analysis of all subjects and a detailed analysis and comparison of the swing characteristics of four of the subjects. The analysis generated much new data concerning the mechanics of the golf swing. It revealed that a golf swing is a highly coordinated and individual motion, and subject-to-subject variations were significant. The study highlighted the importance of the wrists in generating club head velocity and orienting the club face. The trajectory of the hands and the ability to do work were the factors most closely related to skill level. Key points: full-body model of the golf swing; mechanical description of the golf swing; statistical analysis of golf swing mechanics; comparisons of subject swing mechanics.

  6. Summary goodness-of-fit statistics for binary generalized linear models with noncanonical link functions.

    PubMed

    Canary, Jana D; Blizzard, Leigh; Barry, Ronald P; Hosmer, David W; Quinn, Stephen J

    2016-05-01

    Generalized linear models (GLM) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for logistic GLM. Their properties make the development of GOF statistics relatively straightforward, but it can be more difficult under noncanonical links. Although GOF tests for logistic GLM with continuous covariates (GLMCC) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of the link function chosen. We generalize the Tsiatis GOF statistic (TG), originally developed for logistic GLMCCs, so that it can be applied under any link function. Further, we show that the algebraically related Hosmer-Lemeshow (HL) and Pigeon-Heyse (J²) statistics can be applied directly. In a simulation study, TG, HL, and J² were used to evaluate the fit of probit, log-log, complementary log-log, and log models, all calculated with a common grouping method. The TG statistic consistently maintained Type I error rates, while those of HL and J² were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC; in this case, TG had more power than HL or J². © 2015 John Wiley & Sons Ltd/London School of Economics.
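
    The HL statistic mentioned here has a simple generic form worth seeing once: compare observed and expected event counts across deciles of predicted risk. The sketch below is the textbook construction with an assumed decile grouping, not the paper's generalized TG statistic.

```python
import numpy as np
from scipy import stats

def hosmer_lemeshow(y, p, groups=10):
    """HL statistic: observed vs expected events in groups sorted by risk."""
    order = np.argsort(p)
    y, p = y[order], p[order]
    chi2 = 0.0
    for chunk_y, chunk_p in zip(np.array_split(y, groups), np.array_split(p, groups)):
        obs, exp = chunk_y.sum(), chunk_p.sum()
        n = chunk_y.size
        # (O - E)^2 / (n * pbar * (1 - pbar)), with E = n * pbar
        chi2 += (obs - exp) ** 2 / (exp * (1 - exp / n) + 1e-12)
    dof = groups - 2
    return chi2, stats.chi2.sf(chi2, dof)

rng = np.random.default_rng(5)
x = rng.normal(size=500)
p_true = 1 / (1 + np.exp(-x))          # a correctly specified logistic model
y = rng.binomial(1, p_true)
print(hosmer_lemeshow(y, p_true))      # large p-value: no lack of fit detected
```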

  7. Research on cloud background infrared radiation simulation based on fractal and statistical data

    NASA Astrophysics Data System (ADS)

    Liu, Xingrun; Xu, Qingshan; Li, Xia; Wu, Kaifeng; Dong, Yanbing

    2018-02-01

    Cloud is an important natural phenomenon, and its radiation causes serious interference to infrared detectors. Based on fractals and statistical data, a method is proposed to realize cloud background simulation, with the cloud infrared radiation data field assigned using satellite radiation measurements of clouds. A cloud infrared radiation simulation model is established in MATLAB; it can generate cloud background infrared images for different cloud types (low, middle, and high cloud) in different months, bands, and sensor zenith angles.

  8. Statistical tests of simple earthquake cycle models

    NASA Astrophysics Data System (ADS)

    DeVries, Phoebe M. R.; Evans, Eileen L.

    2016-12-01

    A central goal of observing and modeling the earthquake cycle is to forecast when a particular fault may generate an earthquake: a fault late in its earthquake cycle may be more likely to generate an earthquake than a fault early in its earthquake cycle. Models that can explain geodetic observations throughout the entire earthquake cycle may be required to gain a more complete understanding of relevant physics and phenomenology. Previous efforts to develop unified earthquake models for strike-slip faults have largely focused on explaining both preseismic and postseismic geodetic observations available across a few faults in California, Turkey, and Tibet. An alternative approach leverages the global distribution of geodetic and geologic slip rate estimates on strike-slip faults worldwide. Here we use the Kolmogorov-Smirnov test for similarity of distributions to infer, in a statistically rigorous manner, viscoelastic earthquake cycle models that are inconsistent with 15 sets of observations across major strike-slip faults. We reject a large subset of two-layer models incorporating Burgers rheologies at a significance level of α = 0.05 (those with long-term Maxwell viscosities ηM < 4.0 × 10^19 Pa s and ηM > 4.6 × 10^20 Pa s) but cannot reject models on the basis of transient Kelvin viscosity ηK. Finally, we examine the implications of these results for the predicted earthquake cycle timing of the 15 faults considered and compare these predictions to the geologic and historical record.
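
    The two-sample Kolmogorov-Smirnov test used here is available directly in SciPy. The sketch below shows the comparison pattern under invented data: an observed sample against one candidate model's predicted sample, rejecting at alpha = 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
observed = rng.normal(loc=1.0, scale=0.3, size=15)    # e.g., 15 fault observations
predicted = rng.normal(loc=1.4, scale=0.3, size=500)  # one candidate cycle model

stat, p_value = stats.ks_2samp(observed, predicted)
print(f"D = {stat:.3f}, p = {p_value:.3f}, reject: {p_value < 0.05}")
```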

  9. Beyond logistic regression: structural equations modelling for binary variables and its application to investigating unobserved confounders.

    PubMed

    Kupek, Emil

    2006-03-15

    Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle to its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of the odds ratio (OR) into the Q-metric, (OR-1)/(OR+1), to approximate Pearson's correlation coefficients between binary variables, whose covariance structure can then be analysed by SEM. The percentage of correctly classified events and non-events was compared with the classification obtained by logistic regression. The performance of SEM based on the Q-metric was also checked on a small (N = 100) random sample of the generated data and on a real data set. SEM successfully recovered the generated model structure. SEM of the real data suggested a significant influence of a latent confounding variable which would not have been detectable by standard logistic regression. SEM classification performance was broadly similar to that of logistic regression. The analysis of binary data can be greatly enhanced by Yule's transformation of odds ratios into an estimated correlation matrix that can be further analysed by SEM. The interpretation of results is aided by expressing them as odds ratios, the most frequently used measure of effect in medical statistics.
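
    The Yule transform quoted in the abstract is one line of arithmetic; the sketch below computes Q = (OR - 1)/(OR + 1) from a 2x2 contingency table. The example counts are made up.

```python
import numpy as np

def yule_q(table):
    """table = [[a, b], [c, d]]: counts for two binary variables."""
    a, b = table[0]
    c, d = table[1]
    odds_ratio = (a * d) / (b * c)
    return (odds_ratio - 1.0) / (odds_ratio + 1.0)

table = np.array([[40, 10],
                  [15, 35]])
print(yule_q(table))   # approximates the Pearson correlation of the two binaries
```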

  10. How many studies are necessary to compare niche-based models for geographic distributions? Inductive reasoning may fail at the end.

    PubMed

    Terribile, L C; Diniz-Filho, J A F; De Marco, P

    2010-05-01

    The use of ecological niche models (ENM) to generate potential geographic distributions of species has rapidly increased in ecology, conservation, and evolutionary biology. Many methods are available, and the most used are the Maximum Entropy Method (MAXENT) and the Genetic Algorithm for Rule Set Production (GARP). Recent studies have shown that MAXENT performs better than GARP. Here we used ROC-AUC (area under the Receiver Operating Characteristic curve) statistics and the bootstrap to evaluate the performance of GARP and MAXENT in generating potential distribution models for 39 species of New World coral snakes. We found that AUC values for GARP ranged from 0.923 to 0.999, whereas those for MAXENT ranged from 0.877 to 0.999. On the whole, the differences in AUC were very small, but for 10 species GARP outperformed MAXENT. Means and standard deviations for 100 bootstrapped samples with sample sizes ranging from 3 to 30 species did not show any trend towards deviations from a zero difference between the AUC values of GARP and MAXENT. Our results suggest that further studies are still necessary to establish under which circumstances the statistical performance of the methods varies. However, it is also important to consider the possibility that this empirical inductive reasoning may fail in the end, because we almost certainly could not establish all potential scenarios generating variation in the relative performance of the models.
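
    A sketch of the comparison style used here: bootstrap the difference in AUC between two models' scores for the same outcomes. The scores below are synthetic stand-ins; in the study they would come from GARP and MAXENT predictions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
y = rng.binomial(1, 0.3, size=300)                      # presence/absence
score_a = y + rng.normal(scale=0.9, size=300)           # "model A" output
score_b = y + rng.normal(scale=1.0, size=300)           # "model B" output

diffs = []
for _ in range(1000):
    idx = rng.integers(0, y.size, size=y.size)          # resample with replacement
    if y[idx].min() == y[idx].max():
        continue                                        # AUC needs both classes
    diffs.append(roc_auc_score(y[idx], score_a[idx]) -
                 roc_auc_score(y[idx], score_b[idx]))

lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"AUC(A) - AUC(B): 95% CI [{lo:.3f}, {hi:.3f}]")
```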

  11. A Study on Grid-Square Statistics Based Estimation of Regional Electricity Demand and Regional Potential Capacity of Distributed Generators

    NASA Astrophysics Data System (ADS)

    Kato, Takeyoshi; Sugimoto, Hiroyuki; Suzuoki, Yasuo

    We established a procedure for estimating regional electricity demand and the regional potential capacity of distributed generators (DGs) using a grid-square statistics data set. A photovoltaic power system (PV system) for residential use and a co-generation system (CGS) for both residential and commercial use were taken into account. As an example, results for Aichi prefecture are presented in this paper. Statistical data on the number of households by family type and the number of employees by business category for about 4000 grid squares of 1 km × 1 km area were used to estimate the floor space and the electricity demand distribution. The rooftop area available for installing PV systems was also estimated from the grid-square statistics data set. Considering the relation between the capacity of an existing CGS and a scale index of the building where it is installed, the potential capacity of CGS was estimated for three business categories: hotels, hospitals, and stores. In some regions, the potential capacity of PV systems was estimated to be about 10,000 kW/km², which corresponds to the density of existing areas with intensive installation of PV systems. Finally, we discuss the ratio of the regional potential capacity of DGs to the regional maximum electricity demand, in order to deduce the appropriate capacity of DGs in a model of a future electricity distribution system.

  12. A short-term ensemble wind speed forecasting system for wind power applications

    NASA Astrophysics Data System (ADS)

    Baidya Roy, S.; Traiteur, J. J.; Callicutt, D.; Smith, M.

    2011-12-01

    This study develops an adaptive, blended forecasting system to provide accurate wind speed forecasts 1 hour ahead of time for wind power applications. The system consists of an ensemble of 21 forecasts with different configurations of the Weather Research and Forecasting Single Column Model (WRFSCM) and a persistence model. The ensemble is calibrated against observations for a 2 month period (June-July, 2008) at a potential wind farm site in Illinois using the Bayesian Model Averaging (BMA) technique. The forecasting system is evaluated against observations for August 2008 at the same site. The calibrated ensemble forecasts significantly outperform the forecasts from the uncalibrated ensemble while significantly reducing forecast uncertainty under all environmental stability conditions. The system also generates significantly better forecasts than persistence, autoregressive (AR) and autoregressive moving average (ARMA) models during the morning transition and the diurnal convective regimes. This forecasting system is computationally more efficient than traditional numerical weather prediction models and can generate a calibrated forecast, including model runs and calibration, in approximately 1 minute. Currently, hour-ahead wind speed forecasts are almost exclusively produced using statistical models. However, numerical models have several distinct advantages over statistical models including the potential to provide turbulence forecasts. Hence, there is an urgent need to explore the role of numerical models in short-term wind speed forecasting. This work is a step in that direction and is likely to trigger a debate within the wind speed forecasting community.

  13. The optimization of high resolution topographic data for 1D hydrodynamic models

    NASA Astrophysics Data System (ADS)

    Ales, Ronovsky; Michal, Podhoranyi

    2016-06-01

    The main focus of the research presented in this paper is to optimize and use high resolution topographic data (HRTD) for hydrological modelling. Optimization of HRTD is done by generating an adaptive mesh: the distance between the coarse mesh and the surface of the dataset is measured, and the mesh is adapted so as to keep the geometry as close to the initial resolution as possible. The technique described in this paper enables computation of very accurate 1-D hydrodynamic models. In the paper, we use the HEC-RAS software as a solver. For comparison, we have chosen the number of generated cells/grid elements (in the whole discretization domain and in selected cross sections) with respect to preservation of the accuracy of the computational domain. Generation of the mesh for hydrodynamic modelling is strongly reliant on domain size and domain resolution. The topographic dataset used in this paper was created using the LiDAR method and captures a 5.9 km long section of the catchment of the river Olše. We studied crucial changes in topography for the generated mesh. Assessment was done by commonly used statistical and visualization methods.

  14. Comparison of algorithms to generate event times conditional on time-dependent covariates.

    PubMed

    Sylvestre, Marie-Pierre; Abrahamowicz, Michal

    2008-06-30

    The Cox proportional hazards model with time-dependent covariates (TDC) is now part of the standard statistical analysis toolbox in medical research. As new methods involving more complex modeling of time-dependent variables are developed, simulations can be used to systematically assess the performance of these models. Yet generating event times conditional on TDC requires well-designed and efficient algorithms. We compare two classes of such algorithms: permutational algorithms (PAs) and algorithms based on a binomial model. We also propose a modification of the PA to incorporate a rejection sampler. We performed a simulation study to assess the accuracy, stability, and speed of these algorithms in several scenarios. Both classes of algorithms generated data sets that, once analyzed, provided virtually unbiased estimates with comparable variances. In terms of computational efficiency, the PA with the rejection sampler reduced the time necessary to generate data by more than 50 per cent relative to alternative methods. The PAs also allowed more flexibility in the specification of the marginal distributions of event times and required less calibration.
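
    A minimal sketch of the binomial-model class of generators compared here: step through discrete time, with a per-interval event probability driven by the current covariate value under a proportional hazards form. The baseline hazard, beta, and the covariate path are assumed values, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(8)

def generate_event_time(x_path, h0=0.01, beta=0.7, max_time=1000):
    """x_path(t) -> covariate value at time t; returns (time, event indicator)."""
    for t in range(1, max_time + 1):
        hazard = h0 * np.exp(beta * x_path(t))
        p_event = 1.0 - np.exp(-hazard)      # event probability in (t-1, t]
        if rng.random() < p_event:
            return t, 1
    return max_time, 0                       # censored at max_time

# Example time-dependent covariate: a treatment switched on at t = 100.
x = lambda t: 1.0 if t >= 100 else 0.0
print([generate_event_time(x) for _ in range(3)])
```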

  15. The optimization of high resolution topographic data for 1D hydrodynamic models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ales, Ronovsky, E-mail: ales.ronovsky@vsb.cz; Michal, Podhoranyi

    2016-06-08

    The main focus of the research presented in this paper is to optimize and use high resolution topographic data (HRTD) for hydrological modelling. Optimization of HRTD is done by generating an adaptive mesh: the distance between the coarse mesh and the surface of the dataset is measured, and the mesh is adapted so as to keep the geometry as close to the initial resolution as possible. The technique described in this paper enables computation of very accurate 1-D hydrodynamic models. In the paper, we use the HEC-RAS software as a solver. For comparison, we have chosen the number of generated cells/grid elements (in the whole discretization domain and in selected cross sections) with respect to preservation of the accuracy of the computational domain. Generation of the mesh for hydrodynamic modelling is strongly reliant on domain size and domain resolution. The topographic dataset used in this paper was created using the LiDAR method and captures a 5.9 km long section of the catchment of the river Olše. We studied crucial changes in topography for the generated mesh. Assessment was done by commonly used statistical and visualization methods.

  16. Statistical hadronization and microcanonical ensemble

    DOE PAGES

    Becattini, F.; Ferroni, L.

    2004-01-01

    We present a Monte Carlo calculation of the microcanonical ensemble of the ideal hadron-resonance gas including all known states up to a mass of 1.8 GeV, taking into account quantum statistics. The computing method is a development of a previous one based on a Metropolis Monte Carlo algorithm, with the grand-canonical limit of the multi-species multiplicity distribution as the proposal matrix. The microcanonical average multiplicities of the various hadron species are found to converge to the canonical ones for moderately low values of the total energy. This algorithm opens the way for event generators based on the statistical hadronization model.
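
    For orientation, the Metropolis scheme underlying such calculations can be shown on a one-dimensional toy target; the hadron-gas version replaces this target with the microcanonical weight and the proposal with the grand-canonical multiplicity distribution. Everything below is generic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(15)

def target(x):
    """Unnormalized toy density (bimodal), standing in for the physical weight."""
    return np.exp(-0.5 * (x - 3.0) ** 2) + 0.5 * np.exp(-0.5 * (x + 2.0) ** 2)

x, samples = 0.0, []
for _ in range(20000):
    proposal = x + rng.normal(scale=1.0)            # symmetric random-walk proposal
    if rng.random() < min(1.0, target(proposal) / target(x)):
        x = proposal                                # accept; otherwise keep x
    samples.append(x)

print(np.mean(samples[5000:]))                      # average over post-burn-in draws
```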

  17. Mine Burial Expert System for Change of MIW Doctrine

    DTIC Science & Technology

    2011-09-01

    allowed the mine to move vertically and horizontally, as well as rotate about the y axis. The first of these second generation impact models was...bearing strength and use multilayered sediments. Although they improve the knowledge of mine movement in two dimensions and rotation in one direction...conditional independence. Bayesian networks were originally developed to handle uncertainty in a quantitative manner. They are statistical models

  18. A MULTIPLE TESTING OF THE ABC METHOD AND THE DEVELOPMENT OF A SECOND-GENERATION MODEL. PART II, TEST RESULTS AND AN ANALYSIS OF RECALL RATIO.

    ERIC Educational Resources Information Center

    ALTMANN, BERTHOLD

    AFTER A BRIEF SUMMARY OF THE TEST PROGRAM (DESCRIBED MORE FULLY IN LI 000 318), THE STATISTICAL RESULTS TABULATED AS OVERALL "ABC (APPROACH BY CONCEPT)-RELEVANCE RATIOS" AND "ABC-RECALL FIGURES" ARE PRESENTED AND REVIEWED. AN ABSTRACT MODEL DEVELOPED IN ACCORDANCE WITH MAX WEBER'S "IDEALTYPUS" ("DIE OBJEKTIVITAET…

  19. Modeling property evolution of container materials used in nuclear waste storage

    NASA Astrophysics Data System (ADS)

    Li, Dongsheng; Garmestani, Hamid; Khaleel, Moe; Sun, Xin

    2010-03-01

    Container materials under long-term irradiation accumulate high energy in their structure, generating critical structural damage. This study investigated which kinds of mesoscale microstructure are more resistant to radiation damage. The evolution of mechanical properties during irradiation was modeled using statistical continuum mechanics. Preliminary results also showed how to achieve a desired microstructure with higher resistance to radiation.

  20. Virtual reality and consciousness inference in dreaming

    PubMed Central

    Hobson, J. Allan; Hong, Charles C.-H.; Friston, Karl J.

    2014-01-01

    This article explores the notion that the brain is genetically endowed with an innate virtual reality generator that – through experience-dependent plasticity – becomes a generative or predictive model of the world. This model, which is most clearly revealed in rapid eye movement (REM) sleep dreaming, may provide the theater for conscious experience. Functional neuroimaging evidence for brain activations that are time-locked to rapid eye movements (REMs) endorses the view that waking consciousness emerges from REM sleep – and dreaming lays the foundations for waking perception. In this view, the brain is equipped with a virtual model of the world that generates predictions of its sensations. This model is continually updated and entrained by sensory prediction errors in wakefulness to ensure veridical perception, but not in dreaming. In contrast, dreaming plays an essential role in maintaining and enhancing the capacity to model the world by minimizing model complexity and thereby maximizing both statistical and thermodynamic efficiency. This perspective suggests that consciousness corresponds to the embodied process of inference, realized through the generation of virtual realities (in both sleep and wakefulness). In short, our premise or hypothesis is that the waking brain engages with the world to predict the causes of sensations, while in sleep the brain’s generative model is actively refined so that it generates more efficient predictions during waking. We review the evidence in support of this hypothesis – evidence that grounds consciousness in biophysical computations whose neuronal and neurochemical infrastructure has been disclosed by sleep research. PMID:25346710

  1. Virtual reality and consciousness inference in dreaming.

    PubMed

    Hobson, J Allan; Hong, Charles C-H; Friston, Karl J

    2014-01-01

    This article explores the notion that the brain is genetically endowed with an innate virtual reality generator that - through experience-dependent plasticity - becomes a generative or predictive model of the world. This model, which is most clearly revealed in rapid eye movement (REM) sleep dreaming, may provide the theater for conscious experience. Functional neuroimaging evidence for brain activations that are time-locked to rapid eye movements (REMs) endorses the view that waking consciousness emerges from REM sleep - and dreaming lays the foundations for waking perception. In this view, the brain is equipped with a virtual model of the world that generates predictions of its sensations. This model is continually updated and entrained by sensory prediction errors in wakefulness to ensure veridical perception, but not in dreaming. In contrast, dreaming plays an essential role in maintaining and enhancing the capacity to model the world by minimizing model complexity and thereby maximizing both statistical and thermodynamic efficiency. This perspective suggests that consciousness corresponds to the embodied process of inference, realized through the generation of virtual realities (in both sleep and wakefulness). In short, our premise or hypothesis is that the waking brain engages with the world to predict the causes of sensations, while in sleep the brain's generative model is actively refined so that it generates more efficient predictions during waking. We review the evidence in support of this hypothesis - evidence that grounds consciousness in biophysical computations whose neuronal and neurochemical infrastructure has been disclosed by sleep research.

  2. Inferring gene regression networks with model trees

    PubMed Central

    2010-01-01

    Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes by building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity, but they do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether each pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results with those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth- and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and the others is calculated simultaneously. Model trees are very useful techniques for estimating the numerical values of the target genes by linear regression functions. They are often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, favoring localized similarities over a single global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
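
    A hedged sketch of the core REGNET idea (not the authors' code): for each gene, fit a regression tree from all other genes' expression, then keep candidate edges to the predictors the tree actually uses. The data are synthetic, and the significance/FDR control step of the real method is omitted.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(9)
n_samples, n_genes = 80, 12
expr = rng.normal(size=(n_samples, n_genes))
# Plant a known relationship: gene 0 depends on genes 3 and 7.
expr[:, 0] = 2.0 * expr[:, 3] - expr[:, 7] + 0.1 * rng.normal(size=n_samples)

edges = []
for g in range(n_genes):
    predictors = np.delete(np.arange(n_genes), g)
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(expr[:, predictors], expr[:, g])
    # Internal nodes have feature index >= 0; leaves are marked negative.
    used = predictors[np.unique(tree.tree_.feature[tree.tree_.feature >= 0])]
    edges.extend((int(p), g) for p in used)

print(edges[:10])   # candidate regulator -> target pairs
```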

  3. Testing Pairwise Association between Spatially Autocorrelated Variables: A New Approach Using Surrogate Lattice Data

    PubMed Central

    Deblauwe, Vincent; Kennel, Pol; Couteron, Pierre

    2012-01-01

    Background Independence between observations is a standard prerequisite of traditional statistical tests of association. This condition is, however, violated when autocorrelation is present within the data. In the case of variables that are regularly sampled in space (i.e. lattice data or images), such as those provided by remote-sensing or geographical databases, this problem is particularly acute. Because analytic derivation of the null probability distribution of the test statistic (e.g. Pearson's r) is not always possible when autocorrelation is present, we propose instead the use of a Monte Carlo simulation with surrogate data. Methodology/Principal Findings The null hypothesis that two observed mapped variables are the result of independent pattern generating processes is tested here by generating sets of random image data while preserving the autocorrelation function of the original images. Surrogates are generated by matching the dual-tree complex wavelet spectra (and hence the autocorrelation functions) of white noise images with the spectra of the original images. The generated images can then be used to build the probability distribution function of any statistic of association under the null hypothesis. We demonstrate the validity of a statistical test of association based on these surrogates with both actual and synthetic data and compare it with a corrected parametric test and three existing methods that generate surrogates (randomization, random rotations and shifts, and iterative amplitude adjusted Fourier transform). Type I error control was excellent, even with strong and long-range autocorrelation, which is not the case for alternative methods. Conclusions/Significance The wavelet-based surrogates are particularly appropriate in cases where autocorrelation appears at all scales or is direction-dependent (anisotropy). We explore the potential of the method for association tests involving a lattice of binary data and discuss its potential for validation of species distribution models. An implementation of the method in Java for the generation of wavelet-based surrogates is available online as supporting material. PMID:23144961
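
    The simplest member of the surrogate family compared in this paper can be sketched directly: a Fourier surrogate that keeps the image's amplitude spectrum (hence its autocorrelation) while randomizing phases. This is not the authors' wavelet construction; borrowing the phases of a real white-noise image keeps the required Hermitian symmetry so the result stays real.

```python
import numpy as np

def phase_randomized_surrogate(image, rng):
    """Random field with the same amplitude spectrum as `image`."""
    amplitude = np.abs(np.fft.fft2(image))
    noise = rng.normal(size=image.shape)                     # real white noise
    noise_phase = np.exp(1j * np.angle(np.fft.fft2(noise)))  # Hermitian phases
    return np.fft.ifft2(amplitude * noise_phase).real

rng = np.random.default_rng(10)
img = rng.normal(size=(64, 64)).cumsum(axis=0).cumsum(axis=1)  # autocorrelated field
surr = phase_randomized_surrogate(img, rng)
print(img.std(), surr.std())   # second-order structure is preserved
```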

  4. Influence of El Niño Southern Oscillation on global hydropower production

    NASA Astrophysics Data System (ADS)

    Ng, Jia Yi; Turner, Sean W. D.; Galelli, Stefano

    2017-03-01

    El Niño Southern Oscillation (ENSO) strongly influences the global climate system, affecting hydrology in many of the world's river basins. This raises the prospect of ENSO-driven variability in global and regional hydroelectric power generation. Here we study these effects by generating time series of power production for 1593 hydropower dams, which collectively represent more than half of the world's existing installed hydropower capacity. The time series are generated by forcing a detailed dam model with monthly-resolution, 20th century inflows—the model includes plant specifications, storage dynamics and realistic operating schemes, and runs irrespective of the dam construction year. More than one third of simulated dams exhibit statistically significant annual energy production anomalies in at least one of the two ENSO phases of El Niño and La Niña. For most dams, the variability of relative anomalies in power production tends to be less than that of the forcing inflows—a consequence of dam design specifications, namely maximum turbine release rate and reservoir storage, which allows inflows to accumulate for power generation in subsequent dry years. Production is affected most prominently in Northwest United States, South America, Central America, the Iberian Peninsula, Southeast Asia and Southeast Australia. When aggregated globally, positive and negative energy production anomalies effectively cancel each other out, resulting in a weak and statistically insignificant net global anomaly for both ENSO phases.

  5. Development and application of GIS-based PRISM integration through a plugin approach

    NASA Astrophysics Data System (ADS)

    Lee, Woo-Seop; Chun, Jong Ahn; Kang, Kwangmin

    2014-05-01

    A PRISM (Parameter-elevation Regressions on Independent Slopes Model) QGIS plugin was developed on the Quantum GIS platform in this study. The plugin provides user-friendly graphical user interfaces (GUIs) so that users can obtain gridded meteorological data at high resolution (1 km × 1 km). The software is designed to run on a personal computer, requiring neither internet access nor a sophisticated computer system, and lets users generate PRISM data with ease. The proposed PRISM QGIS plugin is a hybrid statistical-geographic model system that uses coarse-resolution datasets (APHRODITE datasets in this study) together with digital elevation data to generate fine-resolution gridded precipitation. To validate the performance of the software, the Prek Thnot River Basin in Kandal, Cambodia was selected for application. Overall, statistical analysis shows promising outputs generated by the proposed plugin. Error measures, namely RMSE (Root Mean Square Error) and MAPE (Mean Absolute Percentage Error), were used to evaluate performance; the evaluation results were 2.76 mm and 4.2%, respectively. This study suggests that the plugin can be used to generate high-resolution precipitation datasets for hydrological and climatological studies in watersheds where observed weather datasets are limited.

  6. Implications of an electroweak triplet scalar leptoquark on the ultra-high energy neutrino events at IceCube

    NASA Astrophysics Data System (ADS)

    Mileo, Nicolas; de la Puente, Alejandro; Szynkman, Alejandro

    2016-11-01

    We study the production of scalar leptoquarks at IceCube, in particular a particle transforming as a triplet under the weak interaction. The existence of electroweak-triplet scalars is highly motivated by models of grand unification and also within radiative seesaw models for neutrino mass generation. In our framework, we extend the Standard Model by a single colored electroweak-triplet scalar leptoquark and analyze its implications for the excess of ultra-high energy neutrino events observed by the IceCube collaboration. We consider only couplings of the leptoquark to the first generation of quarks and the first and second generations of leptons, and carry out a statistical analysis to determine the parameters that best describe the IceCube data, as well as to set 95% CL upper bounds. We analyze whether this study remains consistent with the most up-to-date LHC data and various low-energy observables.

  7. Guidelines for Genome-Scale Analysis of Biological Rhythms.

    PubMed

    Hughes, Michael E; Abruzzi, Katherine C; Allada, Ravi; Anafi, Ron; Arpat, Alaaddin Bulak; Asher, Gad; Baldi, Pierre; de Bekker, Charissa; Bell-Pedersen, Deborah; Blau, Justin; Brown, Steve; Ceriani, M Fernanda; Chen, Zheng; Chiu, Joanna C; Cox, Juergen; Crowell, Alexander M; DeBruyne, Jason P; Dijk, Derk-Jan; DiTacchio, Luciano; Doyle, Francis J; Duffield, Giles E; Dunlap, Jay C; Eckel-Mahan, Kristin; Esser, Karyn A; FitzGerald, Garret A; Forger, Daniel B; Francey, Lauren J; Fu, Ying-Hui; Gachon, Frédéric; Gatfield, David; de Goede, Paul; Golden, Susan S; Green, Carla; Harer, John; Harmer, Stacey; Haspel, Jeff; Hastings, Michael H; Herzel, Hanspeter; Herzog, Erik D; Hoffmann, Christy; Hong, Christian; Hughey, Jacob J; Hurley, Jennifer M; de la Iglesia, Horacio O; Johnson, Carl; Kay, Steve A; Koike, Nobuya; Kornacker, Karl; Kramer, Achim; Lamia, Katja; Leise, Tanya; Lewis, Scott A; Li, Jiajia; Li, Xiaodong; Liu, Andrew C; Loros, Jennifer J; Martino, Tami A; Menet, Jerome S; Merrow, Martha; Millar, Andrew J; Mockler, Todd; Naef, Felix; Nagoshi, Emi; Nitabach, Michael N; Olmedo, Maria; Nusinow, Dmitri A; Ptáček, Louis J; Rand, David; Reddy, Akhilesh B; Robles, Maria S; Roenneberg, Till; Rosbash, Michael; Ruben, Marc D; Rund, Samuel S C; Sancar, Aziz; Sassone-Corsi, Paolo; Sehgal, Amita; Sherrill-Mix, Scott; Skene, Debra J; Storch, Kai-Florian; Takahashi, Joseph S; Ueda, Hiroki R; Wang, Han; Weitz, Charles; Westermark, Pål O; Wijnen, Herman; Xu, Ying; Wu, Gang; Yoo, Seung-Hee; Young, Michael; Zhang, Eric Erquan; Zielinski, Tomasz; Hogenesch, John B

    2017-10-01

    Genome biology approaches have made enormous contributions to our understanding of biological rhythms, particularly in identifying outputs of the clock, including RNAs, proteins, and metabolites, whose abundance oscillates throughout the day. These methods hold significant promise for future discovery, particularly when combined with computational modeling. However, genome-scale experiments are costly and laborious, yielding "big data" that are conceptually and statistically difficult to analyze. There is no obvious consensus regarding design or analysis. Here we discuss the relevant technical considerations to generate reproducible, statistically sound, and broadly useful genome-scale data. Rather than suggest a set of rigid rules, we aim to codify principles by which investigators, reviewers, and readers of the primary literature can evaluate the suitability of different experimental designs for measuring different aspects of biological rhythms. We introduce CircaInSilico, a web-based application for generating synthetic genome biology data to benchmark statistical methods for studying biological rhythms. Finally, we discuss several unmet analytical needs, including applications to clinical medicine, and suggest productive avenues to address them.

  8. Guidelines for Genome-Scale Analysis of Biological Rhythms

    PubMed Central

    Hughes, Michael E.; Abruzzi, Katherine C.; Allada, Ravi; Anafi, Ron; Arpat, Alaaddin Bulak; Asher, Gad; Baldi, Pierre; de Bekker, Charissa; Bell-Pedersen, Deborah; Blau, Justin; Brown, Steve; Ceriani, M. Fernanda; Chen, Zheng; Chiu, Joanna C.; Cox, Juergen; Crowell, Alexander M.; DeBruyne, Jason P.; Dijk, Derk-Jan; DiTacchio, Luciano; Doyle, Francis J.; Duffield, Giles E.; Dunlap, Jay C.; Eckel-Mahan, Kristin; Esser, Karyn A.; FitzGerald, Garret A.; Forger, Daniel B.; Francey, Lauren J.; Fu, Ying-Hui; Gachon, Frédéric; Gatfield, David; de Goede, Paul; Golden, Susan S.; Green, Carla; Harer, John; Harmer, Stacey; Haspel, Jeff; Hastings, Michael H.; Herzel, Hanspeter; Herzog, Erik D.; Hoffmann, Christy; Hong, Christian; Hughey, Jacob J.; Hurley, Jennifer M.; de la Iglesia, Horacio O.; Johnson, Carl; Kay, Steve A.; Koike, Nobuya; Kornacker, Karl; Kramer, Achim; Lamia, Katja; Leise, Tanya; Lewis, Scott A.; Li, Jiajia; Li, Xiaodong; Liu, Andrew C.; Loros, Jennifer J.; Martino, Tami A.; Menet, Jerome S.; Merrow, Martha; Millar, Andrew J.; Mockler, Todd; Naef, Felix; Nagoshi, Emi; Nitabach, Michael N.; Olmedo, Maria; Nusinow, Dmitri A.; Ptáček, Louis J.; Rand, David; Reddy, Akhilesh B.; Robles, Maria S.; Roenneberg, Till; Rosbash, Michael; Ruben, Marc D.; Rund, Samuel S.C.; Sancar, Aziz; Sassone-Corsi, Paolo; Sehgal, Amita; Sherrill-Mix, Scott; Skene, Debra J.; Storch, Kai-Florian; Takahashi, Joseph S.; Ueda, Hiroki R.; Wang, Han; Weitz, Charles; Westermark, Pål O.; Wijnen, Herman; Xu, Ying; Wu, Gang; Yoo, Seung-Hee; Young, Michael; Zhang, Eric Erquan; Zielinski, Tomasz; Hogenesch, John B.

    2017-01-01

    Genome biology approaches have made enormous contributions to our understanding of biological rhythms, particularly in identifying outputs of the clock, including RNAs, proteins, and metabolites, whose abundance oscillates throughout the day. These methods hold significant promise for future discovery, particularly when combined with computational modeling. However, genome-scale experiments are costly and laborious, yielding “big data” that are conceptually and statistically difficult to analyze. There is no obvious consensus regarding design or analysis. Here we discuss the relevant technical considerations to generate reproducible, statistically sound, and broadly useful genome-scale data. Rather than suggest a set of rigid rules, we aim to codify principles by which investigators, reviewers, and readers of the primary literature can evaluate the suitability of different experimental designs for measuring different aspects of biological rhythms. We introduce CircaInSilico, a web-based application for generating synthetic genome biology data to benchmark statistical methods for studying biological rhythms. Finally, we discuss several unmet analytical needs, including applications to clinical medicine, and suggest productive avenues to address them. PMID:29098954

  9. Relationship between accident severity and full-scale crash test. Volume II, Appendices

    DOT National Transportation Integrated Search

    1984-08-01

    Available accident files are used to generate a 412-accident database of guardrail impacts. This base is analyzed to develop a statistical model for predicting accident severity index (ASI) as a function of vehicle type or weight, impact speed, and ...

  10. Integration of Technology into the Classroom: Case Studies.

    ERIC Educational Resources Information Center

    Johnson, D. LaMont, Ed.; Maddux, Cleborne D., Ed.; Liu, Leping, Ed.

    This book contains the following case studies on the integration of technology in education: (1) "First Steps toward a Statistically Generated Information Technology Integration Model" (D. LaMont Johnson and Leping Liu); (2) "Case Studies: Are We Rejecting Rigor or Rediscovering Richness?" (Cleborne D. Maddux); (3)…

  11. National Centers for Environmental Prediction

    Science.gov Websites


  12. Statistical analysis of large simulated yield datasets for studying climate effects

    USDA-ARS?s Scientific Manuscript database

    Ensembles of process-based crop models are now commonly used to simulate crop growth and development for climate scenarios of temperature and/or precipitation changes corresponding to different projections of atmospheric CO2 concentrations. This approach generates large datasets with thousands of de...

  13. Multivariate space - time analysis of PRE-STORM precipitation

    NASA Technical Reports Server (NTRS)

    Polyak, Ilya; North, Gerald R.; Valdes, Juan B.

    1994-01-01

    This paper presents the methodologies and results of the multivariate modeling and two-dimensional spectral and correlation analysis of PRE-STORM rainfall gauge data. Estimated parameters of the models for the specific spatial averages clearly indicate the eastward and southeastward wave propagation of rainfall fluctuations. A relationship between the coefficients of the diffusion equation and the parameters of the stochastic model of rainfall fluctuations is derived that leads directly to the exclusive use of rainfall data to estimate advection speed (about 12 m/s) as well as other coefficients of the diffusion equation of the corresponding fields. The statistical methodology developed here can be used for confirmation of physical models by comparison of the corresponding second-moment statistics of the observed and simulated data, for generating multiple samples of any size, for solving the inverse problem of the hydrodynamic equations, and for application in some other areas of meteorological and climatological data analysis and modeling.

  14. A method to encapsulate model structural uncertainty in ensemble projections of future climate: EPIC v1.0

    NASA Astrophysics Data System (ADS)

    Lewis, Jared; Bodeker, Greg E.; Kremser, Stefanie; Tait, Andrew

    2017-12-01

    A method, based on climate pattern scaling, has been developed to expand a small number of projections of fields of a selected climate variable (X) into an ensemble that encapsulates a wide range of indicative model structural uncertainties. The method described in this paper is referred to as the Ensemble Projections Incorporating Climate model uncertainty (EPIC) method. Each ensemble member is constructed by adding contributions from (1) a climatology derived from observations that represents the time-invariant part of the signal; (2) a contribution from forced changes in X, where those changes can be statistically related to changes in global mean surface temperature (Tglobal); and (3) a contribution from unforced variability that is generated by a stochastic weather generator. The patterns of unforced variability are also allowed to respond to changes in Tglobal. The statistical relationships between changes in X (and its patterns of variability) and Tglobal are obtained in a training phase. Then, in an implementation phase, 190 simulations of Tglobal are generated using a simple climate model tuned to emulate 19 different global climate models (GCMs) and 10 different carbon cycle models. Using the generated Tglobal time series and the correlation between the forced changes in X and Tglobal, obtained in the training phase, the forced change in the X field can be generated many times using Monte Carlo analysis. A stochastic weather generator is used to generate realistic representations of weather which include spatial coherence. Because GCMs and regional climate models (RCMs) are less likely to correctly represent unforced variability compared to observations, the stochastic weather generator takes as input measures of variability derived from observations, but also responds to forced changes in climate in a way that is consistent with the RCM projections. This approach to generating a large ensemble of projections is many orders of magnitude more computationally efficient than running multiple GCM or RCM simulations. Such a large ensemble of projections permits a description of a probability density function (PDF) of future climate states rather than a small number of individual story lines within that PDF, which may not be representative of the PDF as a whole; the EPIC method largely corrects for such potential sampling biases. The method is useful for providing projections of changes in climate to users wishing to investigate the impacts and implications of climate change in a probabilistic way. A web-based tool, using the EPIC method to provide probabilistic projections of changes in daily maximum and minimum temperatures for New Zealand, has been developed and is described in this paper.
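
    The first two ingredients of the EPIC recipe can be condensed into a few lines under assumed numbers: each ensemble member is an observed climatology, plus a pattern-scaled forced change driven by a Tglobal anomaly, plus stochastic variability. The real method additionally uses a spatially coherent weather generator and Tglobal paths from an emulated simple climate model; the fields below are invented placeholders.

```python
import numpy as np

rng = np.random.default_rng(11)
ny, nx = 20, 30
climatology = 15.0 + rng.normal(scale=2.0, size=(ny, nx))   # observed baseline
pattern = 1.2 + 0.3 * rng.normal(size=(ny, nx))             # degC per degC of Tglobal

def epic_member(delta_t_global):
    forced = pattern * delta_t_global                        # pattern-scaled change
    unforced = rng.normal(scale=0.8, size=(ny, nx))          # stand-in variability
    return climatology + forced + unforced

# 190 Tglobal anomalies stand in for the emulated simple-climate-model runs.
ensemble = np.stack([epic_member(dt) for dt in rng.normal(2.0, 0.5, size=190)])
print(ensemble.shape, ensemble.mean())                       # samples of the PDF
```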

  15. Selection of climate change scenario data for impact modelling.

    PubMed

    Sloth Madsen, M; Maule, C Fox; MacKellar, N; Olesen, J E; Christensen, J Hesselbjerg

    2012-01-01

    Impact models investigating climate change effects on food safety often need detailed climate data. The aim of this study was to select climate change projection data for selected crop phenology and mycotoxin impact models. Using the ENSEMBLES database of climate model output, this study illustrates how the projected climate change signal of important variables as temperature, precipitation and relative humidity depends on the choice of the climate model. Using climate change projections from at least two different climate models is recommended to account for model uncertainty. To make the climate projections suitable for impact analysis at the local scale a weather generator approach was adopted. As the weather generator did not treat all the necessary variables, an ad-hoc statistical method was developed to synthesise realistic values of missing variables. The method is presented in this paper, applied to relative humidity, but it could be adopted to other variables if needed.

  16. Stochastic Modeling and Generation of Partially Polarized or Partially Coherent Electromagnetic Waves

    NASA Technical Reports Server (NTRS)

    Davis, Brynmor; Kim, Edward; Piepmeier, Jeffrey; Hildebrand, Peter H. (Technical Monitor)

    2001-01-01

    Many new Earth remote-sensing instruments are embracing both the advantages and the added complexity that result from interferometric or fully polarimetric operation. To increase instrument understanding and functionality, a model of the signals these instruments measure is presented. A stochastic model is used, as it recognizes the non-deterministic nature of any real-world measurement while also providing a tractable mathematical framework. A stationary, Gaussian-distributed model structure is proposed. Temporal and spectral correlation measures provide a statistical description of the physical properties of coherence and polarization state. From this relationship the model is mathematically defined. The model is shown to be unique for any set of physical parameters. A method of realizing the model (necessary for applications such as synthetic calibration-signal generation) is given, and computer simulation results are presented. The signals are constructed using the output of a multi-input multi-output linear filter system driven with white noise.
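
    A reduced sketch of the filtered-white-noise idea: two complex Gaussian channels with a prescribed coherence, obtained by passing independent white streams through a (here memoryless) linear transform, the Cholesky factor of the target coherency matrix. The coherence value is an assumed example; the full model adds temporal filtering.

```python
import numpy as np

rng = np.random.default_rng(14)
n = 100000
coherence = 0.6 * np.exp(1j * 0.3)                 # assumed complex correlation
R = np.array([[1.0, coherence],
              [np.conj(coherence), 1.0]])          # target coherency matrix
L_chol = np.linalg.cholesky(R)

# Two independent circular complex Gaussian streams with unit variance...
w = (rng.normal(size=(2, n)) + 1j * rng.normal(size=(2, n))) / np.sqrt(2)
v, h = L_chol @ w                                  # ...mixed into correlated channels

sample_coh = np.mean(v * np.conj(h))
print(abs(sample_coh), np.angle(sample_coh))       # recovers ~0.6 and ~0.3
```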

  17. The epistemological status of general circulation models

    NASA Astrophysics Data System (ADS)

    Loehle, Craig

    2018-03-01

    Forecasts of both likely anthropogenic effects on climate and consequent effects on nature and society are based on large, complex software tools called general circulation models (GCMs). Forecasts generated by GCMs have been used extensively in policy decisions related to climate change. However, the relation between underlying physical theories and results produced by GCMs is unclear. In the case of GCMs, many discretizations and approximations are made, and simulating Earth system processes is far from simple and currently leads to some results with unknown energy balance implications. Statistical testing of GCM forecasts for degree of agreement with data would facilitate assessment of fitness for use. If model results need to be put on an anomaly basis due to model bias, then both visual and quantitative measures of model fit depend strongly on the reference period used for normalization, making testing problematic. Epistemology is here applied to problems of statistical inference during testing, the relationship between the underlying physics and the models, the epistemic meaning of ensemble statistics, problems of spatial and temporal scale, the existence or not of an unforced null for climate fluctuations, the meaning of existing uncertainty estimates, and other issues. Rigorous reasoning entails carefully quantifying levels of uncertainty.

  18. Sandpile-based model for capturing magnitude distributions and spatiotemporal clustering and separation in regional earthquakes

    NASA Astrophysics Data System (ADS)

    Batac, Rene C.; Paguirigan, Antonino A., Jr.; Tarun, Anjali B.; Longjas, Anthony G.

    2017-04-01

    We propose a cellular automata model for earthquake occurrences patterned after the sandpile model of self-organized criticality (SOC). By incorporating a single parameter describing the probability to target the most susceptible site, the model successfully reproduces the statistical signatures of seismicity. The energy distributions closely follow power-law probability density functions (PDFs) with a scaling exponent of around -1.6, consistent with the expectations of the Gutenberg-Richter (GR) law, for a wide range of targeted triggering probability values. Additionally, for targeted triggering probabilities within the range 0.004-0.007, we observe spatiotemporal distributions that show bimodal behavior, which was not observed previously for the original sandpile. For this critical range of probability values, model statistics show remarkable agreement with long-period empirical data from earthquakes in different seismogenic regions. The proposed model has key advantages, the foremost of which is that it simultaneously captures the energy, space, and time statistics of earthquakes while introducing only a single parameter on top of the sandpile's simple rules. We believe that the critical targeting probability parameterizes the memory that is inherently present in earthquake-generating regions.
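
    A hedged re-implementation of the key rule change to the classic BTW sandpile: with probability p, the next grain targets the most susceptible (highest) site; otherwise a random site. Avalanche sizes play the role of earthquake energies. The grid size and p below are assumed, and details may differ from the authors' model.

```python
import numpy as np

rng = np.random.default_rng(12)
L, p_target = 32, 0.005
grid = rng.integers(0, 4, size=(L, L))
sizes = []

for _ in range(5000):
    if rng.random() < p_target:
        i, j = np.unravel_index(np.argmax(grid), grid.shape)  # most susceptible site
    else:
        i, j = rng.integers(0, L, size=2)                     # random site
    grid[i, j] += 1
    size = 0
    while (grid >= 4).any():                     # relax until stable
        for a, b in np.argwhere(grid >= 4):
            grid[a, b] -= 4
            size += 1
            for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if 0 <= a + da < L and 0 <= b + db < L:
                    grid[a + da, b + db] += 1    # open boundaries dissipate grains
    sizes.append(size)

print(max(sizes), np.mean(sizes))                # heavy-tailed avalanche sizes
```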

  19. Statistical Models of Fracture Relevant to Nuclear-Grade Graphite: Review and Recommendations

    NASA Technical Reports Server (NTRS)

    Nemeth, Noel N.; Bratton, Robert L.

    2011-01-01

    The nuclear-grade (low-impurity) graphite needed for the fuel element and moderator material for next-generation (Gen IV) reactors displays large scatter in strength and a nonlinear stress-strain response from damage accumulation. This response can be characterized as quasi-brittle. In this expanded review, relevant statistical failure models for various brittle and quasi-brittle material systems are discussed with regard to strength distribution, size effect, multiaxial strength, and damage accumulation. This includes descriptions of the Weibull, Batdorf, and Burchell models as well as models that describe the strength response of composite materials, which involves distributed damage. Results from lattice simulations are included for a physics-based description of material breakdown. Consideration is given to the predicted transition between brittle and quasi-brittle damage behavior versus the density of damage (level of disorder) within the material system. The literature indicates that weakest-link-based failure modeling approaches appear to be reasonably robust in that they can be applied to materials that display distributed damage, provided that the level of disorder in the material is not too large. The Weibull distribution is argued to be the most appropriate statistical distribution to model the stochastic-strength response of graphite.
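
    The two-parameter Weibull weakest-link form discussed in the review is compact enough to state directly: under uniaxial stress sigma and stressed volume V, Pf = 1 - exp(-(V/V0)(sigma/sigma0)^m). The parameter values below are illustrative, not graphite data.

```python
import numpy as np

def weibull_failure_probability(sigma, m=10.0, sigma0=25.0, v_ratio=1.0):
    """m: Weibull modulus; sigma0: characteristic strength; v_ratio: V/V0."""
    return 1.0 - np.exp(-v_ratio * (sigma / sigma0) ** m)

stress = np.linspace(10.0, 40.0, 7)   # e.g., MPa
print(weibull_failure_probability(stress))
# Size effect: a larger specimen (V/V0 = 10) fails more readily at equal stress.
print(weibull_failure_probability(stress, v_ratio=10.0))
```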

  20. Gridded Calibration of Ensemble Wind Vector Forecasts Using Ensemble Model Output Statistics

    NASA Astrophysics Data System (ADS)

    Lazarus, S. M.; Holman, B. P.; Splitt, M. E.

    2017-12-01

    A computationally efficient method is developed to perform gridded post-processing of ensemble wind vector forecasts. An expansive set of idealized WRF model simulations is generated to provide physically consistent high-resolution winds over a coastal domain characterized by an intricate land/water mask. Ensemble model output statistics (EMOS) is used to calibrate the ensemble wind vector forecasts at observation locations. The local EMOS predictive parameters (mean and variance) are then spread throughout the grid utilizing flow-dependent statistical relationships extracted from the downscaled WRF winds. Using data withdrawal and 28 east-central Florida stations, the method is applied to one year of 24 h wind forecasts from the Global Ensemble Forecast System (GEFS). Compared to the raw GEFS, the approach improves both deterministic and probabilistic forecast skill. Analysis of multivariate rank histograms indicates the post-processed forecasts are calibrated. Two downscaling case studies are presented, a quiescent easterly flow event and a frontal passage. Strengths and weaknesses of the approach are presented and discussed.
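
    A minimal sketch of Gaussian EMOS at a single location follows; it fits the predictive mean and variance as affine functions of the ensemble mean and variance. For simplicity it maximizes the Gaussian likelihood, whereas operational EMOS typically minimizes the CRPS; all data in the demo are synthetic.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_emos(ens_mean, ens_var, obs):
    """Fit a Gaussian EMOS model  y ~ N(a + b*ens_mean, c + d*ens_var)
    by maximum likelihood (a simplified stand-in for CRPS minimization)."""
    def nll(theta):
        a, b, log_c, log_d = theta
        mu = a + b * ens_mean
        var = np.exp(log_c) + np.exp(log_d) * ens_var  # keep variance positive
        return -np.sum(norm.logpdf(obs, loc=mu, scale=np.sqrt(var)))
    res = minimize(nll, x0=np.array([0.0, 1.0, 0.0, 0.0]), method="Nelder-Mead")
    return res.x

# Synthetic demo: a biased, underdispersive 20-member ensemble.
rng = np.random.default_rng(1)
truth = rng.normal(5.0, 2.0, size=500)
ens = truth[:, None] + 1.0 + rng.normal(0.0, 0.8, size=(500, 20))
params = fit_emos(ens.mean(axis=1), ens.var(axis=1), truth)
print("a, b, log c, log d =", np.round(params, 2))
```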

  1. Accounting for patient variability in finite element analysis of the intact and implanted hip and knee: a review.

    PubMed

    Taylor, Mark; Bryan, Rebecca; Galloway, Francis

    2013-02-01

    It is becoming increasingly difficult to differentiate the performance of new joint replacement designs using available preclinical test methods. Finite element analysis is commonly used, and the majority of published studies are performed on representative anatomy, assuming optimal implant placement, subjected to idealised loading conditions. There are significant differences between patients, and accounting for this variability will lead to better assessment of the risk of failure. This review paper provides a comprehensive overview of the techniques available to account for patient variability. There is a brief overview of patient-specific model generation techniques, followed by a review of multisubject patient-specific studies performed on the intact and implanted femur and tibia. In particular, the challenges and limitations of manually generating models for such studies are discussed. To efficiently account for patient variability, statistical shape and intensity models (SSIM) are being developed. Such models have the potential to synthetically generate thousands of representative models from a much smaller training set. Combined with the automation of the prosthesis implantation process, SSIM provide a potentially powerful tool for assessing the next generation of implant designs. The potential applications of SSIM are discussed along with their limitations. Copyright © 2012 John Wiley & Sons, Ltd.
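
    To make the statistical shape model idea concrete, the sketch below builds a point-distribution-style model with PCA and samples synthetic instances from it. It assumes landmark correspondence has already been established; the training matrix here is a random placeholder.

```python
import numpy as np

def build_shape_model(shapes):
    """shapes: (n_subjects, n_points*dim) matrix of corresponding landmarks.
    Returns the mean shape and principal modes for synthetic generation."""
    mean = shapes.mean(axis=0)
    X = shapes - mean
    # Economy SVD: rows of Vt are the shape modes (eigenvectors of the covariance).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigvals = s**2 / (shapes.shape[0] - 1)   # variance explained by each mode
    return mean, Vt, eigvals

def sample_shape(mean, modes, eigvals, rng, n_modes=5):
    """Generate one synthetic instance: mean + sum_i b_i * phi_i with
    b_i ~ N(0, lambda_i), the usual point-distribution-model assumption."""
    b = rng.normal(0.0, np.sqrt(eigvals[:n_modes]))
    return mean + b @ modes[:n_modes]

rng = np.random.default_rng(0)
training = rng.normal(size=(30, 300))          # placeholder training set
mean, modes, lam = build_shape_model(training)
synthetic = sample_shape(mean, modes, lam, rng)
```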

  2. Study of Hydrokinetic Turbine Arrays with Large Eddy Simulation

    NASA Astrophysics Data System (ADS)

    Sale, Danny; Aliseda, Alberto

    2014-11-01

    Marine renewable energy is advancing towards commercialization, including electrical power generation from ocean, river, and tidal currents. The focus of this work is to develop numerical simulations capable of predicting the power generation potential of hydrokinetic turbine arrays; this includes analysis of unsteady and averaged flow fields, turbulence statistics, and unsteady loadings on turbine rotors and support structures due to interaction with rotor wakes and ambient turbulence. The governing equations of large-eddy simulation (LES) are solved using a finite-volume method, and the presence of turbine blades is approximated by the actuator-line method, in which hydrodynamic forces are projected to the flow field as a body force. The actuator-line approach captures helical wake formation including vortex shedding from individual blades, and the effects of drag and vorticity generation from the rough seabed surface are accounted for by wall models. This LES framework was used to replicate a previous flume experiment consisting of three hydrokinetic turbines tested under various operating conditions and array layouts. Predictions of the power generation, velocity deficit and turbulence statistics in the wakes are compared between the LES and experimental datasets.
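
    The body-force projection at the heart of the actuator-line method is commonly done with a Gaussian regularization kernel; the sketch below shows that step in isolation, under assumptions of our own (a single actuator point, a toy uniform grid, and an illustrative smoothing length eps), not the authors' solver.

```python
import numpy as np

def project_blade_force(grid_pts, actuator_pt, force_vec, eps):
    """Spread a blade-element force onto the grid as a body-force density
    using the standard Gaussian regularization kernel:
        eta(d) = exp(-(d/eps)^2) / (eps^3 * pi^(3/2))
    where eps is a smoothing length of the order of the grid spacing."""
    d = np.linalg.norm(grid_pts - actuator_pt, axis=1)
    eta = np.exp(-(d / eps) ** 2) / (eps**3 * np.pi**1.5)
    return eta[:, None] * force_vec          # (n_points, 3) body-force density

# Toy grid around one actuator point carrying a unit thrust in -x.
grid = np.stack(np.meshgrid(*[np.linspace(-1, 1, 11)] * 3), axis=-1).reshape(-1, 3)
body_force = project_blade_force(grid, np.zeros(3), np.array([-1.0, 0.0, 0.0]), eps=0.2)
# The kernel integrates to ~1, so the total projected force approximates the input:
cell_vol = (2 / 10) ** 3
print(body_force.sum(axis=0) * cell_vol)
```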

  3. Estimating Preferential Flow in Karstic Aquifers Using Statistical Mixed Models

    PubMed Central

    Anaya, Angel A.; Padilla, Ingrid; Macchiavelli, Raul; Vesper, Dorothy J.; Meeker, John D.; Alshawabkeh, Akram N.

    2013-01-01

    Karst aquifers are highly productive groundwater systems often associated with conduit flow. These systems can be highly vulnerable to contamination, resulting in a high potential for contaminant exposure to humans and ecosystems. This work develops statistical models to spatially characterize flow and transport patterns in karstified limestone and determines the effect of aquifer flow rates on these patterns. A laboratory-scale Geo-HydroBed model is used to simulate flow and transport processes in a karstic limestone unit. The model consists of stainless-steel tanks containing a karstified limestone block collected from a karst aquifer formation in northern Puerto Rico. Experimental work involves making a series of flow and tracer injections, while monitoring hydraulic and tracer response spatially and temporally. Statistical mixed models are applied to hydraulic data to determine likely pathways of preferential flow in the limestone units. The models indicate a highly heterogeneous system with dominant, flow-dependent preferential flow regions. Results indicate that regions of preferential flow tend to expand at higher groundwater flow rates, suggesting a greater volume of the system being flushed by flowing water at higher rates. The spatial and temporal distribution of tracer concentrations indicates the presence of both conduit-like and diffuse flow transport in the system, supporting the notion of combined transport mechanisms in the limestone unit. The temporal response of tracer concentrations at different locations in the model coincides with, and confirms, the preferential flow distribution generated with the statistical mixed models used in the study. PMID:23802921
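
    A statistical mixed model of this kind can be fit in a few lines; the sketch below uses statsmodels with entirely hypothetical column names (head, flow_rate, zone, port) and file name, since the paper's actual data layout is not given.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical layout: hydraulic head response per sampling port, with
# flow rate and block zone as fixed effects; random intercepts per port
# capture port-to-port heterogeneity in the limestone block.
df = pd.read_csv("geohydrobed_heads.csv")   # columns assumed: head, flow_rate, zone, port
model = smf.mixedlm("head ~ flow_rate * zone", df, groups=df["port"])
fit = model.fit()
print(fit.summary())
```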

  4. Uniting statistical and individual-based approaches for animal movement modelling.

    PubMed

    Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel

    2014-01-01

    The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, because field work to access internal states is often impracticable and current statistical models cannot produce dynamic outputs. To address these issues, we developed a robust method that combines statistical and individual-based modelling (IBM). Using a statistical technique for forward modelling of the IBM has the advantage of being faster to parameterize than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM was validated using both k-fold cross-validation and emergent-pattern validation, and was tested for two scenarios with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method captured the complexity of the natural system and adequately provided projections of future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems.

  5. Uniting Statistical and Individual-Based Approaches for Animal Movement Modelling

    PubMed Central

    Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel

    2014-01-01

    The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, because field work to access internal states is often impracticable and current statistical models cannot produce dynamic outputs. To address these issues, we developed a robust method that combines statistical and individual-based modelling (IBM). Using a statistical technique for forward modelling of the IBM has the advantage of being faster to parameterize than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM was validated using both k-fold cross-validation and emergent-pattern validation, and was tested for two scenarios with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method captured the complexity of the natural system and adequately provided projections of future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems. PMID:24979047

  6. Generation of Multivariate Surface Weather Series with Use of the Stochastic Weather Generator Linked to Regional Climate Model

    NASA Astrophysics Data System (ADS)

    Dubrovsky, M.; Farda, A.; Huth, R.

    2012-12-01

    The regional-scale simulations of weather-sensitive processes (e.g. hydrology, agriculture and forestry) for the present and/or future climate often require high-resolution meteorological inputs in terms of time series of selected surface weather characteristics (typically temperature, precipitation, solar radiation, humidity, wind) for a set of stations or on a regular grid. As even the latest Global and Regional Climate Models (GCMs and RCMs) do not provide a realistic representation of the statistical structure of surface weather, the model outputs must be postprocessed (downscaled) to achieve the desired statistical structure of the weather data before being used as input to the follow-up simulation models. One of the downscaling approaches, which is employed also here, is based on a weather generator (WG), which is calibrated using the observed weather series and then modified (in the case of simulations for the future climate) according to the GCM- or RCM-based climate change scenarios. The present contribution uses the parametric daily weather generator M&Rfi to pursue two aims: (1) Validation of the new simulations of the present climate (1961-1990) made by the ALADIN-Climate/CZ (v.2) Regional Climate Model at 25 km resolution. The WG parameters will be derived from the RCM-simulated surface weather series and compared to those derived from observational data at the Czech meteorological stations. The set of WG parameters will include selected statistics of surface temperature and precipitation (characteristics of the mean, variability, interdiurnal variability and extremes). (2) Testing the potential of RCM output for calibrating the WG for ungauged locations. The methodology being examined consists of using a WG whose parameters are interpolated from the surrounding stations and then corrected based on RCM-simulated spatial variability. The quality of the weather series produced by the WG calibrated in this way will be assessed in terms of selected climatic characteristics, focusing on extreme precipitation and temperature characteristics (including characteristics of dry/wet/hot/cold spells). Acknowledgements: The present experiment is made within the frame of projects ALARO (project P209/11/2405 sponsored by the Czech Science Foundation), WG4VALUE (project LD12029 sponsored by the Ministry of Education, Youth and Sports) and VALUE (COST ES 1102 action).
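
    The core of a parametric daily WG of this type can be sketched compactly: a first-order Markov chain for wet/dry occurrence with exponentially distributed wet-day amounts. The parameter values below are illustrative, not the M&Rfi parameters.

```python
import numpy as np

def generate_precip(n_days, p_wd, p_ww, mean_amount, rng):
    """First-order Markov chain for occurrence (wet/dry) with exponential
    amounts on wet days -- the classic core of a parametric daily weather
    generator. p_wd: P(wet | dry yesterday); p_ww: P(wet | wet yesterday)."""
    wet = False
    series = np.zeros(n_days)
    for t in range(n_days):
        p_wet = p_ww if wet else p_wd
        wet = rng.random() < p_wet
        if wet:
            series[t] = rng.exponential(mean_amount)
    return series

rng = np.random.default_rng(42)
precip = generate_precip(365, p_wd=0.25, p_ww=0.6, mean_amount=4.0, rng=rng)
print(f"wet-day fraction: {np.mean(precip > 0):.2f}, total: {precip.sum():.0f} mm")
```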

  7. Semantic attributes based texture generation

    NASA Astrophysics Data System (ADS)

    Chi, Huifang; Gan, Yanhai; Qi, Lin; Dong, Junyu; Madessa, Amanuel Hirpa

    2018-04-01

    Semantic attributes are commonly used for texture description. They can be used to describe the information of a texture, such as patterns, textons, distributions, brightness, and so on. Generally speaking, semantic attributes are more concrete descriptors than perceptual features. Therefore, it is practical to generate texture images from semantic attributes. In this paper, we propose to generate high-quality texture images from semantic attributes. Over the last two decades, several works have been done on texture synthesis and generation, most of them focusing on example-based texture synthesis and procedural texture generation; texture generation from semantic attributes has received comparatively little attention. Gan et al. proposed a useful joint model for perception-driven texture generation. However, perceptual features are nonobjective spatial statistics used by humans to distinguish different textures in pre-attentive situations. To give more descriptive information about texture appearance, semantic attributes, which are more in line with human description habits, are desired. In this paper, we use a sigmoid cross entropy loss in an auxiliary model to provide enough information to the generator. Consequently, the discriminator is released from the relatively intractable task of figuring out the joint distribution of condition vectors and samples. To demonstrate the validity of our method, we compare it with Gan et al.'s method in texture-generation experiments on the PTD and DTD datasets. All experimental results show that our model can generate textures from semantic attributes.
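
    The auxiliary loss itself is simple; the numpy sketch below shows the numerically stable sigmoid cross entropy over multi-hot attribute vectors. The array values are toy inputs, and the real model would compute the logits with a learned auxiliary head.

```python
import numpy as np

def sigmoid_cross_entropy(logits, attributes):
    """Per-attribute sigmoid cross entropy, of the kind used as an auxiliary
    loss to feed semantic-attribute information to a generator. Uses the
    numerically stable form  max(z,0) - z*a + log(1 + exp(-|z|))."""
    z, a = logits, attributes
    return np.mean(np.maximum(z, 0) - z * a + np.log1p(np.exp(-np.abs(z))))

logits = np.array([[2.0, -1.0, 0.5], [0.3, 1.5, -2.0]])   # auxiliary-head outputs
attrs = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])      # multi-hot semantic attributes
print(f"auxiliary loss: {sigmoid_cross_entropy(logits, attrs):.3f}")
```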

  8. A resampling procedure for generating conditioned daily weather sequences

    USGS Publications Warehouse

    Clark, Martyn P.; Gangopadhyay, Subhrendu; Brandon, David; Werner, Kevin; Hay, Lauren E.; Rajagopalan, Balaji; Yates, David

    2004-01-01

    A method is introduced to generate conditioned daily precipitation and temperature time series at multiple stations. The method resamples data from the historical record “nens” times for the period of interest (nens = number of ensemble members) and reorders the ensemble members to reconstruct the observed spatial (intersite) and temporal correlation statistics. The weather generator model is applied to 2307 stations in the contiguous United States and is shown to reproduce the observed spatial correlation between neighboring stations, the observed correlation between variables (e.g., between precipitation and temperature), and the observed temporal correlation between subsequent days in the generated weather sequence. The weather generator model is extended to produce sequences of weather that are conditioned on climate indices (in this case the Niño 3.4 index). Example illustrations of conditioned weather sequences are provided for a station in Arizona (Petrified Forest, 34.8°N, 109.9°W), where El Niño and La Niña conditions have a strong effect on winter precipitation. The conditioned weather sequences generated using the methods described in this paper are appropriate for use as input to hydrologic models to produce multiseason forecasts of streamflow.
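
    The reordering step is the heart of the method; the sketch below shows a Schaake-shuffle-style rank reordering in which generated values are rearranged to inherit the rank structure of a historical template. It compresses the paper's date-resampling scheme into the single reordering idea, with toy gamma-distributed data.

```python
import numpy as np

def rank_reorder(generated, template):
    """Reorder generated values so their rank order matches a historical
    template; applied station by station with a common template, this
    restores the observed intersite and temporal correlation structure."""
    out = np.empty_like(generated)
    ranks = np.argsort(np.argsort(template))    # rank of each template value
    out[:] = np.sort(generated)[ranks]
    return out

rng = np.random.default_rng(0)
template = rng.gamma(2.0, 2.0, size=10)         # observed historical sequence
generated = rng.gamma(2.0, 2.0, size=10)        # independently generated values
shuffled = rank_reorder(generated, template)
# shuffled now has the template's temporal rank structure:
assert (np.argsort(shuffled) == np.argsort(template)).all()
```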

  9. Approaches in highly parameterized inversion: TSPROC, a general time-series processor to assist in model calibration and result summarization

    USGS Publications Warehouse

    Westenbroek, Stephen M.; Doherty, John; Walker, John F.; Kelson, Victor A.; Hunt, Randall J.; Cera, Timothy B.

    2012-01-01

    The TSPROC (Time Series PROCessor) computer software uses a simple scripting language to process and analyze time series. It was developed primarily to assist in the calibration of environmental models. The software is designed to perform calculations on time-series data commonly associated with surface-water models, including calculation of flow volumes, transformation by means of basic arithmetic operations, and generation of seasonal and annual statistics and hydrologic indices. TSPROC can also be used to generate some of the key input files required to perform parameter optimization by means of the PEST (Parameter ESTimation) computer software. Through the use of TSPROC, the objective function for use in the model-calibration process can be focused on specific components of a hydrograph.

  10. Feedbacks Between Shallow Groundwater Dynamics and Surface Topography on Runoff Generation in Flat Fields

    NASA Astrophysics Data System (ADS)

    Appels, Willemijn M.; Bogaart, Patrick W.; van der Zee, Sjoerd E. A. T. M.

    2017-12-01

    In winter, saturation excess (SE) ponding is observed regularly in temperate lowland regions. Surface runoff dynamics are controlled by small topographical features that are unaccounted for in hydrological models. To better understand storage and routing effects of small-scale topography and their interaction with shallow groundwater under SE conditions, we developed a model of reduced complexity to investigate SE runoff generation, emphasizing feedbacks between shallow groundwater dynamics and mesotopography. The dynamic specific yield affected unsaturated zone water storage, causing rapid switches between negative and positive head and a flatter groundwater mound than predicted by analytical agrohydrological models. Accordingly, saturated areas were larger and local groundwater fluxes smaller than predicted, leading to surface runoff generation. Mesotopographic features routed water over larger distances, providing a feedback mechanism that amplified changes to the shape of the groundwater mound. This in turn enhanced runoff generation, but whether it also resulted in runoff events depended on the geometry and location of the depressions. Whereas conditions favorable to runoff generation may abound during winter, these feedbacks profoundly reduce the predictability of SE runoff: statistically identical rainfall series may result in completely different runoff generation. The model results indicate that waterlogged areas in any given rainfall event are larger than those predicted by current analytical groundwater models used for drainage design. This change in the groundwater mound extent has implications for crop growth and damage assessments.

  11. The effects of duration of exposure to the REAPS model in developing students' general creativity and creative problem solving in science

    NASA Astrophysics Data System (ADS)

    Alhusaini, Abdulnasser Alashaal F.

    The Real Engagement in Active Problem Solving (REAPS) model was developed in 2004 by C. June Maker and colleagues as an intervention for gifted students to develop creative problem solving ability through the use of real-world problems. The primary purpose of this study was to examine the effects of the REAPS model on developing students' general creativity and creative problem solving in science with two durations as independent variables. The long duration of the REAPS model implementation lasted five academic quarters or approximately 10 months; the short duration lasted two quarters or approximately four months. The dependent variables were students' general creativity and creative problem solving in science. The second purpose of the study was to explore which aspects of creative problem solving (i.e., generating ideas, generating different types of ideas, generating original ideas, adding details to ideas, generating ideas with social impact, finding problems, generating and elaborating on solutions, and classifying elements) were most affected by the long duration of the intervention. The REAPS model in conjunction with Amabile's (1983; 1996) model of creative performance provided the theoretical framework for this study. The study was conducted using data from the Project of Differentiation for Diverse Learners in Regular Classrooms (i.e., the Australian Project) in which one public elementary school in the eastern region of Australia cooperated with the DISCOVER research team at the University of Arizona. All students in the school from first to sixth grade participated in the study. The total sample was 360 students, of which 115 were exposed to a long duration and 245 to a short duration of the REAPS model. The principal investigators used a quasi-experimental research design in which all students in the school received the treatment for different durations. Students in both groups completed pre- and posttests using the Test of Creative Thinking-Drawing Production (TCT-DP) and the Test of Creative Problem Solving in Science (TCPS-S). A one-way analysis of covariance (ANCOVA) was conducted to control for differences between the two groups on pretest results. Statistically significant differences were not found between posttest scores on the TCT-DP for the two durations of REAPS model implementation. However, statistically significant differences were found between posttest scores on the TCPS-S. These findings are consistent with Amabile's (1983; 1996) model of creative performance, particularly her explanation that domain-specific creativity requires knowledge such as specific content and technical skills that must be learned prior to being applied creatively. The findings are also consistent with literature in which researchers have found that longer interventions typically result in expected positive growth in domain-specific creativity, while both longer and shorter interventions have been found effective in improving domain-general creativity. Change scores were also calculated between pre- and posttest scores on the 8 aspects of creativity (Maker, Jo, Alfaiz, & Alhusaini, 2015a), and a binary logistic regression was conducted to assess which were the most affected by the long duration of the intervention. The regression model was statistically significant, with aspects of generating ideas, adding details to ideas, and finding problems being the most affected by the long duration of the intervention. 
    Based on these findings, the researcher believes that the REAPS model is a useful intervention to develop students' creativity. Future researchers should implement the model for longer durations if they are interested in developing students' domain-specific creative problem solving ability.

  12. A hydrologic network supporting spatially referenced regression modeling in the Chesapeake Bay watershed

    USGS Publications Warehouse

    Brakebill, J.W.; Preston, S.D.

    2003-01-01

    The U.S. Geological Survey has developed a methodology for statistically relating nutrient sources and land-surface characteristics to nutrient loads of streams. The methodology is referred to as SPAtially Referenced Regressions On Watershed attributes (SPARROW), and relates measured stream nutrient loads to nutrient sources using nonlinear statistical regression models. A spatially detailed digital hydrologic network of stream reaches, stream-reach characteristics such as mean streamflow, water velocity, reach length, and travel time, and their associated watersheds supports the regression models. This network serves as the primary framework for spatially referencing potential nutrient source information such as atmospheric deposition, septic systems, point sources, land use, land cover, and agricultural sources, and land-surface characteristics such as land use, land cover, average-annual precipitation and temperature, slope, and soil permeability. In the Chesapeake Bay watershed, which covers parts of Delaware, Maryland, Pennsylvania, New York, Virginia, West Virginia, and Washington D.C., SPARROW was used to generate models estimating loads of total nitrogen and total phosphorus representing 1987 and 1992 land-surface conditions. The 1987 models used a hydrologic network derived from an enhanced version of the U.S. Environmental Protection Agency's digital River Reach File and coarse-resolution Digital Elevation Models (DEMs). A new hydrologic network was created to support the 1992 models by generating stream reaches representing surface-water pathways defined by flow direction and flow accumulation algorithms from higher resolution DEMs. On a reach-by-reach basis, stream reach characteristics essential to the modeling were transferred to the newly generated pathways or reaches from the enhanced River Reach File used to support the 1987 models. To complete the new network, watersheds for each reach were generated using the direction of surface-water flow derived from the DEMs. This network improves upon existing digital stream data by increasing the level of spatial detail and providing consistency between the reach locations and topography. The hydrologic network also aids in illustrating the spatial patterns of predicted nutrient loads and sources contributed locally to each stream, and the percentages of nutrient load that reach Chesapeake Bay.

  13. Topological model of composite fermions in the cyclotron band generator picture: New insights

    NASA Astrophysics Data System (ADS)

    Staśkiewicz, Beata

    2018-03-01

    A combinatorial group theory in the braid groups is correlated with the unusual "anyon" statistics of particles in the 2D Hall system in the fractional quantum regime. On this background, a cyclotron band generator has been derived as a modification and generalization of the band generator first established to solve the word and conjugacy problems in braid group terms. The topological commensurability condition has been expressed through canonical-factor-like terms, based on the concept of parallel descending cycles. Owing to this, we can mathematically capture the general hierarchy of correlated states in the lowest Landau level, describing the fractional quantum Hall effect hierarchy in terms of cyclotron band generators, especially for states beyond the conventional composite fermion model. It has also been shown that the cyclotron braid subgroups, developed for the interpretation of Laughlin correlations, are a special case of the right-angled Artin groups.

  14. Markov switching multinomial logit model: An application to accident-injury severities.

    PubMed

    Malyshkina, Nataliya V; Mannering, Fred L

    2009-07-01

    In this study, two-state Markov switching multinomial logit models are proposed for statistical modeling of accident-injury severities. These models assume Markov switching over time between two unobserved states of roadway safety as a means of accounting for potential unobserved heterogeneity. The states are distinct in the sense that in different states accident-severity outcomes are generated by separate multinomial logit processes. To demonstrate the applicability of the approach, two-state Markov switching multinomial logit models are estimated for severity outcomes of accidents occurring on Indiana roads over a four-year time period. Bayesian inference methods and Markov Chain Monte Carlo (MCMC) simulations are used for model estimation. The estimated Markov switching models result in a superior statistical fit relative to the standard (single-state) multinomial logit models for a number of roadway classes and accident types. It is found that the more frequent state of roadway safety is correlated with better weather conditions and that the less frequent state is correlated with adverse weather conditions.
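
    As a generative illustration of the model class, the sketch below simulates severity outcomes from a two-state Markov switching multinomial logit; all coefficients and the transition matrix are made up, and the paper's actual estimation is Bayesian via MCMC rather than simulation.

```python
import numpy as np

def simulate_ms_mnl(n_obs, X, betas, P, rng):
    """Simulate two-state Markov switching multinomial logit outcomes.

    betas: (2, n_outcomes, n_features) coefficients, one MNL per state;
    P: 2x2 state transition matrix (rows sum to one)."""
    state = 0
    y = np.empty(n_obs, dtype=int)
    states = np.empty(n_obs, dtype=int)
    for t in range(n_obs):
        state = rng.choice(2, p=P[state])        # Markov switching step
        util = betas[state] @ X[t]               # linear utilities per outcome
        prob = np.exp(util - util.max())
        prob /= prob.sum()                       # multinomial logit probabilities
        y[t] = rng.choice(len(prob), p=prob)
        states[t] = state
    return y, states

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 4))
betas = rng.normal(scale=0.5, size=(2, 3, 4))    # 3 severity outcomes, 4 covariates
P = np.array([[0.95, 0.05], [0.30, 0.70]])       # state 0 is the more frequent one
y, s = simulate_ms_mnl(1000, X, betas, P, rng)
```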

  15. Efficient Generation and Selection of Virtual Populations in Quantitative Systems Pharmacology Models

    PubMed Central

    Rieger, TR; Musante, CJ

    2016-01-01

    Quantitative systems pharmacology models mechanistically describe a biological system and the effect of drug treatment on system behavior. Because these models rarely are identifiable from the available data, the uncertainty in physiological parameters may be sampled to create alternative parameterizations of the model, sometimes termed “virtual patients.” In order to reproduce the statistics of a clinical population, virtual patients are often weighted to form a virtual population that reflects the baseline characteristics of the clinical cohort. Here we introduce a novel technique to efficiently generate virtual patients and, from this ensemble, demonstrate how to select a virtual population that matches the observed data without the need for weighting. This approach improves confidence in model predictions by mitigating the risk that spurious virtual patients become overrepresented in virtual populations. PMID:27069777
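
    One simple way to realize selection-without-weighting is acceptance/rejection against the target distribution; the sketch below is a stand-in under that assumption (the paper's actual selection technique may differ), with a hypothetical scalar biomarker as the matched characteristic.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def select_virtual_population(vpatients, target_mu, target_sd, rng):
    """Select (rather than weight) virtual patients so the retained
    ensemble matches an observed baseline distribution, via accept/reject
    with probability proportional to target density / proposal density."""
    proposal = gaussian_kde(vpatients)           # density of the generated ensemble
    target = norm(target_mu, target_sd).pdf(vpatients)
    ratio = target / proposal(vpatients)
    accept = rng.random(len(vpatients)) < ratio / ratio.max()
    return vpatients[accept]

rng = np.random.default_rng(0)
vps = rng.uniform(50, 150, size=5000)            # e.g. a baseline biomarker per patient
vpop = select_virtual_population(vps, target_mu=100.0, target_sd=10.0, rng=rng)
print(f"kept {vpop.size} of {vps.size} virtual patients")
```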

  16. Brain-computer interface with language model-electroencephalography fusion for locked-in syndrome.

    PubMed

    Oken, Barry S; Orhan, Umut; Roark, Brian; Erdogmus, Deniz; Fowler, Andrew; Mooney, Aimee; Peters, Betts; Miller, Meghan; Fried-Oken, Melanie B

    2014-05-01

    Some noninvasive brain-computer interface (BCI) systems are currently available for locked-in syndrome (LIS), but none have incorporated a statistical language model during text generation. Our aim was to begin to address the communication needs of individuals with LIS using a noninvasive BCI that involves rapid serial visual presentation (RSVP) of symbols and a unique classifier with electroencephalography (EEG) and language model fusion. The RSVP Keyboard was developed with several unique features. Individual letters are presented at 2.5 per second. Computer classification of letters as targets or nontargets based on EEG is performed using machine learning that incorporates a language model for letter prediction via Bayesian fusion, enabling targets to be presented only 1 to 4 times. Nine participants with LIS and 9 healthy controls were enrolled. After screening, subjects first calibrated the system and then completed a series of balanced word generation mastery tasks designed with 5 incremental levels of difficulty, where difficulty increased by selecting phrases for which the utility of the language model naturally decreased. Six participants with LIS and 9 controls completed the experiment. All LIS participants successfully mastered spelling at level 1 and one subject achieved level 5. Six of 9 control participants achieved level 5. Individuals who have incomplete LIS may benefit from an EEG-based BCI system that relies on EEG classification and a statistical language model. Steps to further improve the system are discussed.
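
    The fusion step reduces to Bayes' rule over the symbol set; the toy sketch below multiplies EEG classifier likelihoods by language model probabilities and renormalizes. The three-letter alphabet and all numbers are illustrative.

```python
import numpy as np

def fuse_letter_posterior(eeg_likelihood, lm_prior):
    """Bayesian fusion of the EEG classifier and the language model:
    posterior(letter) is proportional to
    P(EEG evidence | letter) * P_LM(letter | context)."""
    post = eeg_likelihood * lm_prior
    return post / post.sum()

letters = list("abc")                     # toy 3-symbol alphabet
eeg_like = np.array([0.2, 0.5, 0.3])      # classifier likelihoods for one flash
lm_prior = np.array([0.7, 0.1, 0.2])      # language model prediction in context
posterior = fuse_letter_posterior(eeg_like, lm_prior)
print(dict(zip(letters, posterior.round(3))))
```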

  17. Kaman 40 kW wind turbine generator - control system dynamics

    NASA Technical Reports Server (NTRS)

    Perley, R.

    1981-01-01

    The generator design incorporates an induction generator for application where a utility line is present and a synchronous generator for standalone applications. A combination of feed forward and feedback control is used to achieve synchronous speed prior to connecting the generator to the load, and to control the power level once the generator is connected. The dynamics of the drive train affect several aspects of the system operation. These were analyzed to arrive at the required shaft stiffness. The rotor parameters that affect the stability of the feedback control loop vary considerably over the wind speed range encountered. Therefore, the controller gain was made a function of wind speed in order to maintain consistent operation over the whole wind speed range. The velocity requirement for the pitch control mechanism is related to the nature of the wind gusts to be encountered, the dynamics of the system, and the acceptable power fluctuations and generator dropout rate. A model was developed that allows the probable dropout rate to be determined from a statistical model of wind gusts and the various system parameters, including the acceptable power fluctuation.

  18. Emerging Concepts of Data Integration in Pathogen Phylodynamics.

    PubMed

    Baele, Guy; Suchard, Marc A; Rambaut, Andrew; Lemey, Philippe

    2017-01-01

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.]

  19. Emerging Concepts of Data Integration in Pathogen Phylodynamics

    PubMed Central

    Baele, Guy; Suchard, Marc A.; Rambaut, Andrew; Lemey, Philippe

    2017-01-01

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.] PMID:28173504

  20. Performance and Architecture Lab Modeling Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2014-06-19

    Analytical application performance models are critical for diagnosing performance-limiting resources, optimizing systems, and designing machines. Creating models, however, is difficult. Furthermore, models are frequently expressed in forms that are hard to distribute and validate. The Performance and Architecture Lab Modeling tool, or Palm, is designed to make application modeling easier. Palm provides a source code modeling annotation language. Not only does the modeling language divide the modeling task into subproblems, it formally links an application's source code with its model. This link is important because a model's purpose is to capture application behavior. Furthermore, this link makes it possible to define rules for generating models according to source code organization. Palm generates hierarchical models according to well-defined rules. Given an application, a set of annotations, and a representative execution environment, Palm will generate the same model. A generated model is an executable program whose constituent parts directly correspond to the modeled application. Palm generates models by combining top-down (human-provided) semantic insight with bottom-up static and dynamic analysis. A model's hierarchy is defined by static and dynamic source code structure. Because Palm coordinates models and source code, Palm's models are 'first-class' and reproducible. Palm automates common modeling tasks. For instance, Palm incorporates measurements to focus attention, represent constant behavior, and validate models. Palm's workflow is as follows. The workflow's input is source code annotated with Palm modeling annotations. The most important annotation models an instance of a block of code. Given annotated source code, the Palm Compiler produces executables and the Palm Monitor collects a representative performance profile. The Palm Generator synthesizes a model based on the static and dynamic mapping of annotations to program behavior. The model -- an executable program -- is a hierarchical composition of annotation functions, synthesized functions, statistics for runtime values, and performance measurements.

  1. A multimodal wave spectrum-based approach for statistical downscaling of local wave climate

    USGS Publications Warehouse

    Hegermiller, Christie; Antolinez, Jose A A; Rueda, Ana C.; Camus, Paula; Perez, Jorge; Erikson, Li; Barnard, Patrick; Mendez, Fernando J.

    2017-01-01

    Characterization of wave climate by bulk wave parameters is insufficient for many coastal studies, including those focused on assessing coastal hazards and long-term wave climate influences on coastal evolution. This issue is particularly relevant for studies using statistical downscaling of atmospheric fields to local wave conditions, which are often multimodal in large ocean basins (e.g. the Pacific). Swell may be generated in vastly different wave generation regions, yielding complex wave spectra that are inadequately represented by a single set of bulk wave parameters. Furthermore, the relationship between atmospheric systems and local wave conditions is complicated by variations in arrival time of wave groups from different parts of the basin. Here, we address these two challenges by improving upon the spatiotemporal definition of the atmospheric predictor used in statistical downscaling of local wave climate. The improved methodology separates the local wave spectrum into “wave families,” defined by spectral peaks and discrete generation regions, and relates atmospheric conditions in distant regions of the ocean basin to local wave conditions by incorporating travel times computed from effective energy flux across the ocean basin. When applied to locations with multimodal wave spectra, including Southern California and Trujillo, Peru, the new methodology improves the ability of the statistical model to project significant wave height, peak period, and direction for each wave family, retaining more information from the full wave spectrum. This work forms the basis of statistical downscaling by weather types, which has recently been applied to coastal flooding and morphodynamic applications.
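
    The travel-time idea can be illustrated with the deep-water group velocity, c_g = gT/(4*pi); the sketch below is a simplified stand-in for the paper's effective-energy-flux computation, with illustrative distance and period values.

```python
import math

def deep_water_travel_time(distance_km, peak_period_s):
    """Arrival delay of swell from a remote generation region, using the
    deep-water group velocity c_g = g*T / (4*pi). Returns days in transit."""
    g = 9.81
    c_g = g * peak_period_s / (4.0 * math.pi)     # m/s
    return distance_km * 1000.0 / c_g / 86400.0   # days

# A 15 s Southern Ocean swell reaching Southern California (~9000 km):
print(f"{deep_water_travel_time(9000, 15):.1f} days in transit")
```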

  2. Anisotropic Bispectrum of Curvature Perturbations from Primordial Non-Abelian Vector Fields

    NASA Astrophysics Data System (ADS)

    Bartolo, Nicola; Dimastrogiovanni, Emanuela; Matarrese, Sabino; Riotto, Antonio

    2009-10-01

    We consider a primordial SU(2) vector multiplet during inflation in models where quantum fluctuations of vector fields are involved in producing the curvature perturbation. Recently, a lot of attention has been paid to models populated by vector fields, given the interesting possibility of generating some level of statistical anisotropy in the cosmological perturbations. The scenario we propose is strongly motivated by the fact that, for non-Abelian gauge fields, self-interactions are responsible for generating extra terms in the cosmological correlation functions, which are naturally absent in the Abelian case. We compute these extra contributions to the bispectrum of the curvature perturbation, using the δN formula and the Schwinger-Keldysh formalism. The primordial violation of rotational invariance (due to the introduction of the SU(2) gauge multiplet) leaves its imprint on the correlation functions, introducing, as expected, some degree of statistical anisotropy in our results. We calculate the non-Gaussianity parameter fNL, proving that the new contributions derived from gauge boson self-interactions can be important, and in some cases the dominant ones. We study the shape of the bispectrum and find that it peaks in the local configuration, with an amplitude that is modulated by the preferred directions that break statistical isotropy.

  3. Statistics of cosmic density profiles from perturbation theory

    NASA Astrophysics Data System (ADS)

    Bernardeau, Francis; Pichon, Christophe; Codis, Sandrine

    2014-11-01

    The joint probability distribution function (PDF) of the density within multiple concentric spherical cells is considered. It is shown how its cumulant generating function can be obtained at tree order in perturbation theory as the Legendre transform of a function directly built in terms of the initial moments. In the context of the upcoming generation of large-scale structure surveys, it is conjectured that this result correctly models such a function for finite values of the variance. Detailed consequences of this assumption are explored. In particular, the corresponding one-cell density probability distribution at finite variance is computed for realistic power spectra, taking into account its scale variation. It is found to be in agreement with Λ-cold dark matter simulations at the few percent level for a wide range of density values and parameters. Related explicit analytic expansions at the low and high density tails are given. The conditional (at fixed density) and marginal probability of the slope—the density difference between adjacent cells—and its fluctuations are also computed from the two-cell joint PDF; they also compare very well to simulations. It is emphasized that this could prove useful when studying the statistical properties of voids, as it can serve as a statistical indicator to test gravity models and/or probe key cosmological parameters.

  4. Multi-level methods and approximating distribution functions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wilson, D., E-mail: daniel.wilson@dtc.ox.ac.uk; Baker, R. E.

    2016-07-15

    Biochemical reaction networks are often modelled using discrete-state, continuous-time Markov chains. System statistics of these Markov chains usually cannot be calculated analytically, and therefore estimates must be generated via simulation techniques. There is a well-documented class of simulation techniques known as exact stochastic simulation algorithms, an example of which is Gillespie's direct method. These algorithms often come with high computational costs, therefore approximate stochastic simulation algorithms such as the tau-leap method are used. However, in order to minimise the bias in the estimates generated using them, a relatively small value of tau is needed, rendering the computational costs comparable to Gillespie's direct method. The multi-level Monte Carlo method (Anderson and Higham, Multiscale Model. Simul. 10:146–179, 2012) provides a reduction in computational costs whilst minimising or even eliminating the bias in the estimates of system statistics. This is achieved by first crudely approximating required statistics with many sample paths of low accuracy. Then correction terms are added until a required level of accuracy is reached. Recent literature has primarily focussed on implementing the multi-level method efficiently to estimate a single system statistic. However, it is clearly also of interest to be able to approximate entire probability distributions of species counts. We present two novel methods that combine known techniques for distribution reconstruction with the multi-level method. We demonstrate the potential of our methods using a number of examples.
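
    For reference, a minimal version of Gillespie's direct method for a single decay channel looks like this; the rate constant and initial count are arbitrary, and real networks would loop over multiple reaction channels chosen in proportion to their propensities.

```python
import numpy as np

def gillespie_decay(k=0.1, x0=100, t_end=50.0, seed=0):
    """Gillespie's direct method for the simple decay reaction X -> 0,
    an exact stochastic simulation algorithm of the kind the multi-level
    method seeks to accelerate (minimal single-reaction sketch)."""
    rng = np.random.default_rng(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while x > 0:
        a = k * x                        # total propensity
        t += rng.exponential(1.0 / a)    # exponential waiting time to next event
        if t > t_end:
            break
        x -= 1                           # the only reaction channel fires
        times.append(t)
        counts.append(x)
    return np.array(times), np.array(counts)

times, counts = gillespie_decay()
print(f"{len(times) - 1} reaction events; final count {counts[-1]}")
```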

  5. Statistical, economic and other tools for assessing natural aggregate

    USGS Publications Warehouse

    Bliss, J.D.; Moyle, P.R.; Bolm, K.S.

    2003-01-01

    Quantitative aggregate resource assessment provides resource estimates useful for explorationists, land managers and those who make decisions about land allocation, which may have long-term implications for the cost and availability of aggregate resources. Aggregate assessment needs to be systematic and consistent, yet flexible enough to allow updating without invalidating other parts of the assessment. Evaluators need to use standard or consistent aggregate classifications and statistical distributions; in other words, models with geological, geotechnical and economic variables and the interrelationships between them. These models can be used with subjective estimates, if needed, to estimate how much aggregate may be present in a region or country using distributions generated by Monte Carlo computer simulations.
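
    A Monte Carlo tonnage estimate of this kind can be sketched in a few lines; every distribution and parameter below is invented for illustration, whereas a real assessment would use calibrated deposit models.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Hypothetical deposit model: area (m^2), thickness (m) and in-place density
# (tonnes per m^3) drawn from lognormal/uniform distributions chosen only
# for illustration.
area_m2 = rng.lognormal(mean=np.log(4e6), sigma=0.5, size=n)
thickness = rng.lognormal(mean=np.log(8.0), sigma=0.4, size=n)
density = rng.uniform(2.1, 2.4, size=n)
tonnage = area_m2 * thickness * density
print("P10, P50, P90 tonnage (Mt):",
      np.round(np.percentile(tonnage, [10, 50, 90]) / 1e6, 0))
```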

  6. Evolution of Precipitation Particle Size Distributions within MC3E Systems and its Impact on Aerosol-Cloud-Precipitation Interactions: Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kollias, Pavlos

    2017-08-08

    This is a multi-institutional, collaborative project using observations and modeling to study the evolution (e.g. formation and growth) of hydrometeors in continental convective clouds. Our contribution was in data analysis for the generation of high-value cloud and precipitation products and the derivation of cloud statistics for model validation. We contributed in two areas of data analysis: i) the development of novel, state-of-the-art dual-wavelength radar algorithms for the retrieval of cloud microphysical properties and ii) the evaluation of large-domain, high-resolution models using comprehensive multi-sensor observations. Our research group developed statistical summaries from numerous sensors and developed retrievals of vertical air motion in deep convection.

  7. Sensitivity of Regional Hydropower Generation to the Projected Changes in Future Watershed Hydrology

    NASA Astrophysics Data System (ADS)

    Kao, S. C.; Naz, B. S.; Gangrade, S.

    2015-12-01

    Hydropower is a key contributor to the renewable energy portfolio due to its established development history and the diverse benefits it provides to electric power systems. With the projected change in future watershed hydrology, including a shift in snowmelt timing, increasing occurrence of extreme precipitation, and changing drought frequencies, there is a need to investigate how regional hydropower generation may change correspondingly. To evaluate the sensitivity of watershed storage and hydropower generation to future climate change, a lumped Watershed Runoff-Energy Storage (WRES) model is developed to simulate the annual and seasonal hydropower generation at various hydropower areas in the United States. For each hydropower study area, the WRES model uses the monthly precipitation and naturalized (unregulated) runoff as inputs to perform a runoff mass balance calculation for the total monthly runoff storage in all reservoirs and retention facilities in the watershed, and simulates the monthly regulated runoff release and hydropower generation through the system. The WRES model is developed and calibrated using the historic (1980-2009) monthly precipitation, runoff, and generation data, and then driven by a large set of dynamically- and statistically-downscaled Coupled Model Intercomparison Project Phase 5 climate projections to simulate the change of watershed storage and hydropower generation under different future climate scenarios. The results among different hydropower regions, storage capacities, emission scenarios, and timescales are compared and discussed in this study.
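
    A toy version of such a lumped mass-balance model fits in a few lines; the sketch below tracks monthly reservoir storage under a fixed target release and converts the release to energy via rho*g*Q*H. Every parameter value is illustrative rather than taken from WRES.

```python
import numpy as np

def simulate_hydropower(inflow, capacity, target_release, efficiency, head_m):
    """Monthly reservoir mass balance with a fixed target release; a toy
    stand-in for lumped runoff-energy-storage bookkeeping. inflow and
    target_release are in m^3/s; capacity in m^3; returns GWh per month."""
    rho_g = 1000.0 * 9.81
    storage, energy = 0.5 * capacity, []
    seconds = 30 * 86400.0                       # nominal month length
    for q_in in inflow:
        storage += q_in * seconds                # add naturalized runoff
        release = min(target_release * seconds, storage)
        storage = min(storage - release, capacity)   # spill above capacity
        power_w = efficiency * rho_g * (release / seconds) * head_m
        energy.append(power_w * seconds / 3.6e9)     # J -> GWh
    return np.array(energy)

inflow = 80 + 40 * np.sin(np.linspace(0, 2 * np.pi, 12))  # seasonal cycle
gwh = simulate_hydropower(inflow, capacity=2e9, target_release=75.0,
                          efficiency=0.85, head_m=60.0)
print(f"annual generation ~ {gwh.sum():.0f} GWh")
```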

  8. Learning a generative probabilistic grammar of experience: a process-level model of language acquisition.

    PubMed

    Kolodny, Oren; Lotem, Arnon; Edelman, Shimon

    2015-03-01

    We introduce a set of biologically and computationally motivated design choices for modeling the learning of language, or of other types of sequential, hierarchically structured experience and behavior, and describe an implemented system that conforms to these choices and is capable of unsupervised learning from raw natural-language corpora. Given a stream of linguistic input, our model incrementally learns a grammar that captures its statistical patterns, which can then be used to parse or generate new data. The grammar constructed in this manner takes the form of a directed weighted graph, whose nodes are recursively (hierarchically) defined patterns over the elements of the input stream. We evaluated the model in seventeen experiments, grouped into five studies, which examined, respectively, (a) the generative ability of grammar learned from a corpus of natural language, (b) the characteristics of the learned representation, (c) sequence segmentation and chunking, (d) artificial grammar learning, and (e) certain types of structure dependence. The model's performance largely vindicates our design choices, suggesting that progress in modeling language acquisition can be made on a broad front, ranging from issues of generativity to the replication of human experimental findings, by bringing biological and computational considerations, as well as lessons from prior efforts, to bear on the modeling approach. Copyright © 2014 Cognitive Science Society, Inc.
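
    A drastically simplified stand-in for such a weighted graph grammar is sketched below: it learns only first-order (bigram) statistics rather than recursive patterns, but shows the parse-then-generate cycle on a directed weighted graph. The toy corpus is invented.

```python
import random
from collections import defaultdict

def learn_bigram_graph(corpus):
    """Learn a weighted directed graph over tokens from raw text; edge
    weights count observed transitions, with sentence-boundary markers."""
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            graph[a][b] += 1
    return graph

def generate(graph, rng, max_len=20):
    """Walk the graph proportionally to edge weights to generate new data."""
    out, node = [], "<s>"
    for _ in range(max_len):
        nxt = rng.choices(list(graph[node]), weights=list(graph[node].values()))[0]
        if nxt == "</s>":
            break
        out.append(nxt)
        node = nxt
    return " ".join(out)

rng = random.Random(0)
corpus = ["the dog chased the cat", "the cat saw the dog", "a dog barked"]
g = learn_bigram_graph(corpus)
print(generate(g, rng))
```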

  9. Assimilating the Future for Better Forecasts and Earlier Warnings

    NASA Astrophysics Data System (ADS)

    Du, H.; Wheatcroft, E.; Smith, L. A.

    2016-12-01

    Multi-model ensembles have become popular tools to account for some of the uncertainty due to model inadequacy in weather and climate simulation-based predictions. Current multi-model forecasts focus on combining single-model ensemble forecasts by means of statistical post-processing. Assuming each model is developed independently or with different primary target variables, each is likely to contain different dynamical strengths and weaknesses. Under statistical post-processing, such information is only carried by the simulations under a single model ensemble: no advantage is taken to influence simulations under the other models. A novel methodology, named Multi-model Cross Pollination in Time, is proposed as a multi-model ensemble scheme with the aim of operationally integrating the dynamical information about the future from each individual model. The proposed approach generates model states in time by applying data assimilation scheme(s) to yield truly "multi-model trajectories". It is demonstrated to outperform traditional statistical post-processing in the 40-dimensional Lorenz96 flow. Data assimilation approaches were originally designed to improve state estimation from the past up to the current time. The aim of this talk is to introduce a framework that uses data assimilation to improve model forecasts at future times (not to argue for any one particular data assimilation scheme). An illustration of applying data assimilation "in the future" to provide early warning of future high-impact events is also presented.
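
    For readers wishing to reproduce the testbed, the standard 40-variable Lorenz96 system and an RK4 integrator are sketched below; the forcing F = 8 and the step size are the conventional choices, not specifics from the talk.

```python
import numpy as np

def lorenz96_rhs(x, forcing=8.0):
    """Tendency of the Lorenz96 system:
    dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def integrate(x, dt=0.05, n_steps=200):
    """Fourth-order Runge-Kutta integration of Lorenz96."""
    for _ in range(n_steps):
        k1 = lorenz96_rhs(x)
        k2 = lorenz96_rhs(x + 0.5 * dt * k1)
        k3 = lorenz96_rhs(x + 0.5 * dt * k2)
        k4 = lorenz96_rhs(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

rng = np.random.default_rng(0)
state = 8.0 + 0.01 * rng.normal(size=40)   # small perturbation off the fixed point
print(integrate(state)[:5])
```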

  10. ANEMOS: Development of a next generation wind power forecasting system for the large-scale integration of onshore and offshore wind farms.

    NASA Astrophysics Data System (ADS)

    Kariniotakis, G.; Anemos Team

    2003-04-01

    Objectives: Accurate forecasting of the wind energy production up to two days ahead is recognized as a major contribution to reliable large-scale wind power integration. Especially in a liberalized electricity market, prediction tools enhance the position of wind energy compared to other forms of dispatchable generation. ANEMOS is a new 3.5-year R&D project supported by the European Commission that assembles research organizations and end-users with important experience in the domain. The project aims to develop advanced forecasting models that will substantially outperform current methods. Emphasis is given to situations like complex terrain and extreme weather conditions, as well as to offshore prediction, for which no specific tools currently exist. The prediction models will be implemented in a software platform and installed for online operation at onshore and offshore wind farms by the end-users participating in the project. Approach: The paper presents the methodology of the project. Initially, the prediction requirements are identified according to the profiles of the end-users. The project develops prediction models based on both a physical and an alternative statistical approach. Research on physical models gives emphasis to techniques for use in complex terrain and the development of prediction tools based on CFD techniques, advanced model output statistics or high-resolution meteorological information. Statistical models (i.e. based on artificial intelligence) are developed for downscaling, power curve representation, upscaling for prediction at regional or national level, etc. A benchmarking process is set up to evaluate the performance of the developed models and to compare them with existing ones using a number of case studies. The synergy between statistical and physical approaches is examined to identify promising areas for further improvement of forecasting accuracy. Appropriate physical and statistical prediction models are also developed for offshore wind farms, taking into account advances in marine meteorology (interaction between wind and waves, coastal effects). The benefits of using satellite radar images for modeling local weather patterns are investigated. A next-generation forecasting software platform, ANEMOS, will be developed to integrate the various models. The tool is enhanced by advanced Information and Communication Technology (ICT) functionality and can operate in stand-alone or remote mode, or be interfaced with standard Energy or Distribution Management Systems (EMS/DMS). Contribution: The project provides an advanced technology for wind resource forecasting applicable at large scale: at a single wind farm, regional or national level, and for both interconnected and island systems. A major milestone is the online operation of the developed software by the participating utilities for onshore and offshore wind farms and the demonstration of the economic benefits. The outcome of the ANEMOS project will consistently support increased wind integration on two levels: operationally, through better management of wind farms, and by contributing to growth in installed wind farm capacity, because accurate prediction of the resource reduces the risk for wind farm developers, who are then more willing to undertake new installations, especially in a liberalized electricity market environment.

  11. Reading biological processes from nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical mechanisms.

  12. Generalized Background Error covariance matrix model (GEN_BE v2.0)

    NASA Astrophysics Data System (ADS)

    Descombes, G.; Auligné, T.; Vandenberghe, F.; Barker, D. M.

    2014-07-01

    The specification of state background error statistics is a key component of data assimilation since it affects the impact observations will have on the analysis. In the variational data assimilation approach, applied in the geophysical sciences, the dimensions of the background error covariance matrix (B) are usually too large for it to be explicitly determined, and B needs to be modeled. Recent efforts to include new variables in the analysis, such as cloud parameters and chemical species, have required the development of the code to GENerate the Background Errors (GEN_BE) version 2.0 for the Weather Research and Forecasting (WRF) community model, providing a simpler, flexible, robust, and community-oriented framework that gathers methods used by meteorological operational centers and researchers. We present the advantages of this new design for the data assimilation community by performing benchmarks and showing some of the new features on data assimilation test cases. As data assimilation for clouds remains a challenge, we present a multivariate approach that includes hydrometeors in the control variables and new correlated errors. In addition, the GEN_BE v2.0 code is employed to diagnose error parameter statistics for chemical species, which shows that it is flexible enough to incorporate new control variables. While the background error statistics generation code was first developed for atmospheric research, the new version (GEN_BE v2.0) can easily be extended to other domains of science and serve as a testbed for diagnostics and new models of B. Initially developed for variational data assimilation, the model of the B matrix may be useful for hybrid variational-ensemble methods as well.
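    Because B is too large to form explicitly, its statistics are commonly estimated from samples of forecast differences (the NMC-style approach). The sketch below illustrates only that general idea on synthetic data; the state size, sample count, and variable names are hypothetical, and GEN_BE itself applies control-variable transforms and balance operators well beyond this.

```python
import numpy as np

# Toy NMC-like estimate of a background error covariance matrix B from
# an ensemble of forecast differences. All data here are synthetic.
rng = np.random.default_rng(0)

n_state = 50   # size of the (toy) model state vector
n_pairs = 200  # number of forecast-difference samples

# Differences between e.g. 24 h and 12 h forecasts valid at the same time.
diffs = rng.standard_normal((n_pairs, n_state))

# Sample covariance of the mean-removed differences approximates B.
diffs -= diffs.mean(axis=0)
B = diffs.T @ diffs / (n_pairs - 1)

print(B.shape)         # (50, 50)
print(np.diag(B)[:5])  # error variances of the first few state elements
```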

  13. Next-generation prognostic assessment for diffuse large B-cell lymphoma

    PubMed Central

    Staton, Ashley D; Koff, Jean L; Chen, Qiushi; Ayer, Turgay; Flowers, Christopher R

    2015-01-01

    Current standard of care therapy for diffuse large B-cell lymphoma (DLBCL) cures a majority of patients with additional benefit in salvage therapy and autologous stem cell transplant for patients who relapse. The next generation of prognostic models for DLBCL aims to more accurately stratify patients for novel therapies and risk-adapted treatment strategies. This review discusses the significance of host genetic and tumor genomic alterations seen in DLBCL, clinical and epidemiologic factors, and how each can be integrated into risk stratification algorithms. In the future, treatment prediction and prognostic model development and subsequent validation will require data from a large number of DLBCL patients to establish sufficient statistical power to correctly predict outcome. Novel modeling approaches can augment these efforts. PMID:26289217

  14. Next-generation prognostic assessment for diffuse large B-cell lymphoma.

    PubMed

    Staton, Ashley D; Koff, Jean L; Chen, Qiushi; Ayer, Turgay; Flowers, Christopher R

    2015-01-01

    Current standard of care therapy for diffuse large B-cell lymphoma (DLBCL) cures a majority of patients with additional benefit in salvage therapy and autologous stem cell transplant for patients who relapse. The next generation of prognostic models for DLBCL aims to more accurately stratify patients for novel therapies and risk-adapted treatment strategies. This review discusses the significance of host genetic and tumor genomic alterations seen in DLBCL, clinical and epidemiologic factors, and how each can be integrated into risk stratification algorithms. In the future, treatment prediction and prognostic model development and subsequent validation will require data from a large number of DLBCL patients to establish sufficient statistical power to correctly predict outcome. Novel modeling approaches can augment these efforts.

  15. Rényi Entropies from Random Quenches in Atomic Hubbard and Spin Models.

    PubMed

    Elben, A; Vermersch, B; Dalmonte, M; Cirac, J I; Zoller, P

    2018-02-02

    We present a scheme for measuring Rényi entropies in generic atomic Hubbard and spin models using single copies of a quantum state and for partitions in arbitrary spatial dimensions. Our approach is based on the generation of random unitaries from random quenches, implemented using engineered time-dependent disorder potentials, and standard projective measurements, as realized by quantum gas microscopes. By analyzing the properties of the generated unitaries and the role of statistical errors, with respect to the size of the partition, we show that the protocol can be realized in existing quantum simulators and used to measure, for instance, area law scaling of entanglement in two-dimensional spin models or the entanglement growth in many-body localized systems.

  16. Rényi Entropies from Random Quenches in Atomic Hubbard and Spin Models

    NASA Astrophysics Data System (ADS)

    Elben, A.; Vermersch, B.; Dalmonte, M.; Cirac, J. I.; Zoller, P.

    2018-02-01

    We present a scheme for measuring Rényi entropies in generic atomic Hubbard and spin models using single copies of a quantum state and for partitions in arbitrary spatial dimensions. Our approach is based on the generation of random unitaries from random quenches, implemented using engineered time-dependent disorder potentials, and standard projective measurements, as realized by quantum gas microscopes. By analyzing the properties of the generated unitaries and the role of statistical errors, with respect to the size of the partition, we show that the protocol can be realized in existing quantum simulators and used to measure, for instance, area law scaling of entanglement in two-dimensional spin models or the entanglement growth in many-body localized systems.

  17. Forecasting volatility with neural regression: a contribution to model adequacy.

    PubMed

    Refenes, A N; Holt, W T

    2001-01-01

    Neural nets' usefulness for forecasting is limited by problems of overfitting and the lack of rigorous procedures for model identification, selection and adequacy testing. This paper describes a methodology for neural model misspecification testing. We introduce a generalization of the Durbin-Watson statistic for neural regression and discuss the general issues of misspecification testing using residual analysis. We derive a generalized influence matrix for neural estimators which enables us to evaluate the distribution of the statistic. We deploy Monte Carlo simulation to compare the power of the test for neural and linear regressors. While residual testing is not a sufficient condition for model adequacy, it is nevertheless a necessary condition to demonstrate that the model is a good approximation to the data generating process, particularly as neural-network estimation procedures are susceptible to partial convergence. The work is also an important step toward developing rigorous procedures for neural model identification, selection and adequacy testing which have started to appear in the literature. We demonstrate its applicability in the nontrivial problem of forecasting implied volatility innovations using high-frequency stock index options. Each step of the model building process is validated using statistical tests to verify variable significance and model adequacy with the results confirming the presence of nonlinear relationships in implied volatility innovations.
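    As a concrete reference point, the classical statistic being generalized is simple to compute on any residual series. The toy sketch below shows only that baseline; the paper's contribution, a generalized influence matrix giving the statistic's distribution under neural estimators, is not attempted here.

```python
import numpy as np

def durbin_watson(residuals: np.ndarray) -> float:
    """Classical Durbin-Watson statistic: values near 2 suggest no
    first-order autocorrelation in the residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Toy usage with residuals from any fitted model (neural or linear):
rng = np.random.default_rng(1)
resid = rng.standard_normal(500)       # white-noise-like residuals
print(round(durbin_watson(resid), 3))  # ~2.0 for uncorrelated residuals
```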

  18. Relationship between accident severity and full-scale crash test. Volume I, Technical research effort

    DOT National Transportation Integrated Search

    1984-08-01

    Available accident files are used to generate a 412-accident database of guardrail impacts. This base is analyzed to develop a statistical model for predicting accident severity index (ASI) as a function of vehicle type or weight, impact speed, and ...

  19. Monte Carlo Approach for Reliability Estimations in Generalizability Studies.

    ERIC Educational Resources Information Center

    Dimitrov, Dimiter M.

    A Monte Carlo approach is proposed, using the Statistical Analysis System (SAS) programming language, for estimating reliability coefficients in generalizability theory studies. Test scores are generated by a probabilistic model that considers the probability for a person with a given ability score to answer an item with a given difficulty…
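    In the spirit of the described approach, the hedged sketch below generates binary test scores from a Rasch-type probabilistic model (person ability minus item difficulty through a logistic link) and estimates a reliability coefficient over Monte Carlo replications. The original work uses SAS and generalizability-theory coefficients; Cronbach's alpha stands in here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_persons, n_items, n_reps = 200, 20, 100

def cronbach_alpha(scores: np.ndarray) -> float:
    """Internal-consistency reliability of a persons x items score matrix."""
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    k = scores.shape[1]
    return k / (k - 1) * (1 - item_var / total_var)

alphas = []
for _ in range(n_reps):
    ability = rng.normal(0, 1, size=(n_persons, 1))
    difficulty = rng.normal(0, 1, size=(1, n_items))
    p = 1 / (1 + np.exp(-(ability - difficulty)))  # Rasch success probability
    responses = (rng.random((n_persons, n_items)) < p).astype(int)
    alphas.append(cronbach_alpha(responses))

print(round(float(np.mean(alphas)), 3))  # Monte Carlo reliability estimate
```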

  20. Application of the Auto-Tuned Land Assimilation System (ATLAS) to ASCAT and SMOS soil moisture retrieval products

    USDA-ARS?s Scientific Manuscript database

    Land data assimilations are typically based on highly uncertain assumptions regarding the statistical structure of observation and modeling errors. Left uncorrected, poor assumptions can degrade the quality of analysis products generated by land data assimilation systems. Recently, Crow and van de...

  1. A hybrid orographic plus statistical model for downscaling daily precipitation in Northern California

    USGS Publications Warehouse

    Pandey, G.R.; Cayan, D.R.; Dettinger, M.D.; Georgakakos, K.P.

    2000-01-01

    A hybrid (physical-statistical) scheme is developed to resolve the finescale distribution of daily precipitation over complex terrain. The scheme generates precipitation by combining information from the upper-air conditions and from sparsely distributed station measurements; thus, it proceeds in two steps. First, an initial estimate of the precipitation is made using a simplified orographic precipitation model. It is a steady-state, multilayer, and two-dimensional model following the concepts of Rhea. The model is driven by the 2.5° × 2.5° gridded National Oceanic and Atmospheric Administration-National Centers for Environmental Prediction upper-air profiles, and its parameters are tuned using the observed precipitation structure of the region. Precipitation is generated assuming a forced lifting of the air parcels as they cross the mountain barrier following a straight trajectory. Second, the precipitation is adjusted using errors between derived precipitation and observations from nearby sites. The study area covers the northern half of California, including coastal mountains, central valley, and the Sierra Nevada. The model is run for a 5-km rendition of terrain for days of January-March over the period 1988-95. A jackknife analysis demonstrates the validity of the approach. The spatial and temporal distributions of the simulated precipitation field agree well with the observed precipitation. Further, a mapping of model performance indices (correlation coefficients, model bias, root-mean-square error, and threat scores) from an array of stations from the region indicates that the model performs satisfactorily in resolving daily precipitation at 5-km resolution.

  2. Multi-level emulation of complex climate model responses to boundary forcing data

    NASA Astrophysics Data System (ADS)

    Tran, Giang T.; Oliver, Kevin I. C.; Holden, Philip B.; Edwards, Neil R.; Sóbester, András; Challenor, Peter

    2018-04-01

    Climate model components involve both high-dimensional input and output fields. It is desirable to efficiently generate spatio-temporal outputs of these models for applications in integrated assessment modelling, or to assess the statistical relationship between such sets of inputs and outputs, for example in uncertainty analysis. However, the need for efficiency often compromises the fidelity of output through the use of low-complexity models. Here, we develop a technique which combines statistical emulation with a dimensionality reduction technique to emulate a wide range of outputs from an atmospheric general circulation model, PLASIM, as functions of the boundary forcing prescribed by the ocean component of a lower-complexity climate model, GENIE-1. Although accurate and detailed spatial information on atmospheric variables such as precipitation and wind speed is well beyond the capability of GENIE-1's energy-moisture balance model of the atmosphere, this study demonstrates that the output of this model is useful in predicting PLASIM's spatio-temporal fields through multi-level emulation. Meaningful information from the fast model, GENIE-1, was extracted by utilising the correlation between variables of the same type in the two models and between variables of different types in PLASIM. We present here the construction and validation of several PLASIM variable emulators and discuss their potential use in developing a hybrid model with statistical components.
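    A common way to emulate high-dimensional output fields, and a plausible reading of the dimensionality-reduction step, is to regress principal-component scores of the output on the inputs. The sketch below is hypothetical: synthetic data and ordinary least squares stand in for the paper's multi-level Gaussian-process machinery, but the structure is the same.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Emulation with dimensionality reduction: project output fields onto a
# few principal components and regress the scores on the inputs.
rng = np.random.default_rng(3)
n_runs, n_inputs, n_grid = 60, 4, 1000

X = rng.random((n_runs, n_inputs))                      # forcing inputs
fields = X @ rng.standard_normal((n_inputs, n_grid))    # toy output fields
fields += 0.1 * rng.standard_normal((n_runs, n_grid))   # plus noise

pca = PCA(n_components=5).fit(fields)
scores = pca.transform(fields)             # low-dimensional representation
emulator = LinearRegression().fit(X, scores)

# Emulate the full field for a new input setting:
x_new = rng.random((1, n_inputs))
field_pred = pca.inverse_transform(emulator.predict(x_new))
print(field_pred.shape)  # (1, 1000)
```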

  3. The application of the statistical classifying models for signal evaluation of the gas sensors analyzing mold contamination of the building materials

    NASA Astrophysics Data System (ADS)

    Majerek, Dariusz; Guz, Łukasz; Suchorab, Zbigniew; Łagód, Grzegorz; Sobczuk, Henryk

    2017-07-01

    Mold that develops on moistened building barriers is a major cause of Sick Building Syndrome (SBS). Fungal contamination is normally evaluated using standard biological methods, which are time-consuming and require much manual labor. Fungi emit volatile organic compounds (VOCs) that can be detected in indoor air using several detection techniques, e.g., chromatography. VOCs can also be detected using gas sensor arrays. Each sensor in the array generates a voltage signal that must be analyzed using properly selected statistical methods of interpretation. This work focuses on applying statistical classification models to evaluate signals from a gas sensor matrix analyzing air sampled from the headspace of various types of building materials at different levels of contamination, as well as from clean reference materials.
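    As an illustration of the classification step, the hedged sketch below trains an off-the-shelf classifier on an array of sensor voltages with labeled contamination levels. The data, label scheme, and choice of a random forest are stand-ins, not the specific classifying models the study evaluated.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for e-nose measurements of building materials.
rng = np.random.default_rng(4)
n_samples, n_sensors = 120, 8

X = rng.standard_normal((n_samples, n_sensors))  # sensor voltage features
y = rng.integers(0, 3, size=n_samples)           # 0=clean, 1=low, 2=high mold

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# Cross-validated accuracy; near chance here because the toy labels
# are random, but real sensor signatures would be informative.
print(cross_val_score(clf, X, y, cv=5).mean())
```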

  4. A Vignette (User's Guide) for “An R Package for Statistical ...

    EPA Pesticide Factsheets

    StatCharrms is a graphical user front-end for ease of use in analyzing data generated from OCSPP 890.2200, Medaka Extended One Generation Reproduction Test (MEOGRT), and OCSPP 890.2300, Larval Amphibian Gonad Development Assay (LAGDA). The analyses StatCharrms is capable of performing are: Rao-Scott adjusted Cochran-Armitage test for trend By Slices (RSCABS), a Standard Cochran-Armitage test for trend By Slices (SCABS), mixed-effects Cox proportional model, Jonckheere-Terpstra step-down trend test, Dunn test, one-way ANOVA, weighted ANOVA, mixed-effects ANOVA, repeated-measures ANOVA, and Dunnett test. This document provides a User's Manual (termed a Vignette by the Comprehensive R Archive Network (CRAN)) for the previously created R-code tool called StatCharrms (Statistical analysis of Chemistry, Histopathology, and Reproduction endpoints using Repeated measures and Multi-generation Studies). The StatCharrms R code has been publicly available directly from EPA staff since the approval of OCSPP 890.2200 and 890.2300, and is now publicly available on CRAN.

  5. Calibrating genomic and allelic coverage bias in single-cell sequencing.

    PubMed

    Zhang, Cheng-Zhong; Adalsteinsson, Viktor A; Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L; Meyerson, Matthew; Love, J Christopher

    2015-04-16

    Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias, predominantly generated at the amplicon level (1-10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (~0.1×) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multiple displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples.

  6. Calibrating genomic and allelic coverage bias in single-cell sequencing

    PubMed Central

    Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L.; Meyerson, Matthew; Love, J. Christopher

    2016-01-01

    Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias, predominantly generated at the amplicon level (1-10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (~0.1×) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multiple displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples. PMID:25879913

  7. Rainfall Downscaling Conditional on Upper-air Atmospheric Predictors: Improved Assessment of Rainfall Statistics in a Changing Climate

    NASA Astrophysics Data System (ADS)

    Langousis, Andreas; Mamalakis, Antonis; Deidda, Roberto; Marrocu, Marino

    2015-04-01

    To improve the skill of Global Climate Models (GCMs) and Regional Climate Models (RCMs) in reproducing the statistics of rainfall at a basin level and at hydrologically relevant temporal scales (e.g. daily), two types of statistical approaches have been suggested. One is the statistical correction of climate model rainfall outputs using historical series of precipitation. The other is the use of stochastic models of rainfall to conditionally simulate precipitation series, based on large-scale atmospheric predictors produced by climate models (e.g. geopotential height, relative vorticity, divergence, mean sea level pressure). The latter approach, usually referred to as statistical rainfall downscaling, aims at reproducing the statistical character of rainfall, while accounting for the effects of large-scale atmospheric circulation (and, therefore, climate forcing) on rainfall statistics. While promising, statistical rainfall downscaling has not attracted much attention in recent years, since the suggested approaches involved complex (i.e. subjective or computationally intense) identification procedures of the local weather, in addition to demonstrating limited success in reproducing several statistical features of rainfall, such as seasonal variations, the distributions of dry and wet spell lengths, the distribution of the mean rainfall intensity inside wet periods, and the distribution of rainfall extremes. In an effort to remedy those shortcomings, Langousis and Kaleris (2014) developed a statistical framework for simulation of daily rainfall intensities conditional on upper-air variables, which accurately reproduces the statistical character of rainfall at multiple time scales. Here, we study the relative performance of: a) quantile-quantile (Q-Q) correction of climate model rainfall products, and b) the statistical downscaling scheme of Langousis and Kaleris (2014), in reproducing the statistical structure of rainfall, as well as rainfall extremes, at a regional level. This is done for an intermediate-sized catchment in Italy, i.e. the Flumendosa catchment, using climate model rainfall and atmospheric data from the ENSEMBLES project (http://ensembleseu.metoffice.com). In doing so, we split the historical rainfall record of mean areal precipitation (MAP) into 15-year calibration and 45-year validation periods, and compare the historical rainfall statistics to those obtained from: a) Q-Q corrected climate model rainfall products, and b) synthetic rainfall series generated by the suggested downscaling scheme. To our knowledge, this is the first time that climate model rainfall and statistically downscaled precipitation are compared to catchment-averaged MAP at a daily resolution. The obtained results are promising, since the proposed downscaling scheme is more accurate and robust in reproducing a number of historical rainfall statistics, independent of the climate model used and the length of the calibration period. This is particularly the case for the yearly rainfall maxima, where direct statistical correction of climate model rainfall outputs shows increased sensitivity to the length of the calibration period and the climate model used. The robustness of the suggested downscaling scheme in modeling rainfall extremes at a daily resolution is a notable feature that can effectively be used to assess hydrologic risk at a regional level under changing climatic conditions.
Acknowledgments The research project is implemented within the framework of the Action «Supporting Postdoctoral Researchers» of the Operational Program "Education and Lifelong Learning" (Action's Beneficiary: General Secretariat for Research and Technology), and is co-financed by the European Social Fund (ESF) and the Greek State. CRS4 highly acknowledges the contribution of the Sardinian regional authorities.
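    The Q-Q correction baseline discussed above maps each model value to the observed value at the same empirical quantile of the historical model climate. A minimal sketch follows, on synthetic gamma-distributed rainfall with plain empirical quantiles; operational implementations treat dry days, seasons, and extrapolation of extremes separately.

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_new):
    """Empirical quantile-quantile correction: map each new model value
    to the observed value at its empirical quantile in the historical
    model distribution."""
    model_hist = np.sort(np.asarray(model_hist, dtype=float))
    obs_hist = np.asarray(obs_hist, dtype=float)
    # Empirical CDF position of each new value in the model climate:
    q = np.searchsorted(model_hist, model_new) / len(model_hist)
    return np.quantile(obs_hist, np.clip(q, 0.0, 1.0))

rng = np.random.default_rng(5)
obs = rng.gamma(2.0, 3.0, 5000)   # "observed" daily rainfall
mod = rng.gamma(2.0, 2.0, 5000)   # biased "model" rainfall
print(np.round(quantile_map(mod, obs, mod[:10]), 2))  # corrected values
```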

  8. Indexed semi-Markov process for wind speed modeling.

    NASA Astrophysics Data System (ADS)

    Petroni, F.; D'Amico, G.; Prattico, F.

    2012-04-01

    The increasing interest in renewable energy leads scientific research to seek better ways of recovering as much of the available energy as possible. In particular, the maximum energy recoverable from wind equals 59.3% of the available energy (Betz's law), attained at a specific pitch angle and when the ratio between output and input wind speed equals 1/3. The pitch angle is the angle between the airfoil of the wind turbine blade and the wind direction. Older turbines, and many currently marketed ones, have a fixed airfoil geometry, so they operate at an efficiency below 59.3%. New-generation wind turbines instead vary the pitch angle by rotating the blades, which enables them to recover the maximum energy at different wind speeds, operating at the Betz limit over a range of speed ratios. A capable pitch-angle control system allows the turbine to recover energy more effectively in transient regimes. A good stochastic model of wind speed is therefore needed, both to optimize turbine design and to help the control system predict the wind speed so the blades can be positioned quickly and correctly. Synthetic wind speed data are also a powerful instrument for verifying wind turbine structures and for estimating the energy recoverable at a specific site. To generate synthetic data, Markov chains of first or higher order are often used [1,2,3]. In particular, [1] presents a comparison between a first-order and a second-order Markov chain. A similar study, restricted to the first-order Markov chain, is conducted in [2], which presents the transition probability matrix and compares the energy spectral density and autocorrelation of real and synthetic wind speed data. An attempt to jointly model wind speed and direction is presented in [3], using a first-order Markov chain with different numbers of states together with a Weibull distribution. All these models use Markov chains to generate synthetic wind speed time series, but the search for a better model is still open. To address this issue, we apply new models that generalize Markov models; more precisely, we apply semi-Markov models to generate synthetic wind speed time series. In previous work we proposed several semi-Markov models and showed their ability to reproduce the autocorrelation structure of wind speed data; the autocorrelation was higher than with the Markov model, but still too small compared to the empirical one. To overcome the problem of low autocorrelation, in this paper we propose an indexed semi-Markov model. More precisely, we assume that wind speed is described by a discrete-time homogeneous semi-Markov process and introduce a memory index that takes into account periods of different wind activity. With this model, the statistical characteristics of wind speed are faithfully reproduced. Wind is a very unstable phenomenon characterized by a sequence of lulls and sustained speeds, and a good wind generator must be able to reproduce such sequences. To check the validity of the model, persistence statistics of the synthetic wind series were calculated and averaged.
    The model is used to generate synthetic wind speed time series by means of Monte Carlo simulations, and the time-lagged autocorrelation is used to compare the statistical properties of the proposed model with those of real data and with a time series generated by a simple Markov chain. [1] A. Shamshad, M.A. Bawadi, W.M.W. Wan Hussin, T.A. Majid, S.A.M. Sanusi, First and second order Markov chain models for synthetic generation of wind speed time series, Energy 30 (2005) 693-708. [2] H. Nfaoui, H. Essiarab, A.A.M. Sayigh, A stochastic Markov chain model for simulating wind speed time series at Tangiers, Morocco, Renewable Energy 29 (2004) 1407-1418. [3] F. Youcef Ettoumi, H. Sauvageot, A.-E.-H. Adane, Statistical bivariate modeling of wind using first-order Markov chain and Weibull distribution, Renewable Energy 28 (2003) 1787-1802.
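    For reference, a first-order Markov chain wind speed generator of the kind used as a baseline in [1,2] can be sketched as below. The data, the 2 m/s binning, and the uniform fallback for unseen states are illustrative choices; the paper's indexed semi-Markov model additionally draws sojourn times and a memory index.

```python
import numpy as np

rng = np.random.default_rng(6)
speeds = np.abs(rng.normal(6.0, 3.0, 10_000))  # stand-in for measured data

bins = np.arange(0.0, 26.0, 2.0)               # 2 m/s wide speed states
n = len(bins) - 1
states = np.clip(np.digitize(speeds, bins) - 1, 0, n - 1)

P = np.zeros((n, n))
for a, b in zip(states[:-1], states[1:]):      # count observed transitions
    P[a, b] += 1
for i in range(n):                             # row-normalize to probabilities
    s = P[i].sum()
    P[i] = P[i] / s if s > 0 else np.full(n, 1.0 / n)

# Generate a synthetic state path, then map states back to speeds by
# drawing uniformly within each 2 m/s bin.
path = [int(states[0])]
for _ in range(999):
    path.append(int(rng.choice(n, p=P[path[-1]])))
synthetic = bins[np.array(path)] + 2.0 * rng.random(len(path))
print(synthetic[:5])
```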

  9. Statistically qualified neuro-analytic failure detection method and system

    DOEpatents

    Vilim, Richard B.; Garcia, Humberto E.; Chen, Frederick W.

    2002-03-02

    An apparatus and method for monitoring a process involve development and application of a statistically qualified neuro-analytic (SQNA) model to accurately and reliably identify process change. The development of the SQNA model is accomplished in two stages: deterministic model adaption and stochastic model modification of the deterministic model adaptation. Deterministic model adaption involves formulating an analytic model of the process representing known process characteristics, augmenting the analytic model with a neural network that captures unknown process characteristics, and training the resulting neuro-analytic model by adjusting the neural network weights according to a unique scaled equation error minimization technique. Stochastic model modification involves qualifying any remaining uncertainty in the trained neuro-analytic model by formulating a likelihood function, given an error propagation equation, for computing the probability that the neuro-analytic model generates measured process output. Preferably, the developed SQNA model is validated using known sequential probability ratio tests and applied to the process as an on-line monitoring system. Illustrative of the method and apparatus, the method is applied to a peristaltic pump system.

  10. Truth, models, model sets, AIC, and multimodel inference: a Bayesian perspective

    USGS Publications Warehouse

    Barker, Richard J.; Link, William A.

    2015-01-01

    Statistical inference begins with viewing data as realizations of stochastic processes. Mathematical models provide partial descriptions of these processes; inference is the process of using the data to obtain a more complete description of the stochastic processes. Wildlife and ecological scientists have become increasingly concerned with the conditional nature of model-based inference: what if the model is wrong? Over the last 2 decades, Akaike's Information Criterion (AIC) has been widely and increasingly used in wildlife statistics for 2 related purposes, first for model choice and second to quantify model uncertainty. We argue that for the second of these purposes, the Bayesian paradigm provides the natural framework for describing uncertainty associated with model choice and provides the most easily communicated basis for model weighting. Moreover, Bayesian arguments provide the sole justification for interpreting model weights (including AIC weights) as coherent (mathematically self-consistent) model probabilities. This interpretation requires treating the model as an exact description of the data-generating mechanism. We discuss the implications of this assumption, and conclude that more emphasis is needed on model checking to provide confidence in the quality of inference.
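    The AIC weights at issue are simple to compute; the abstract's argument concerns how to interpret them. A small sketch with illustrative AIC values:

```python
import numpy as np

def aic_weights(aic_values):
    """Akaike weights: w_i proportional to exp(-delta_i / 2), where
    delta_i = AIC_i - min(AIC). Reading these as model probabilities
    is the Bayesian move the abstract discusses: it treats some model
    in the set as an exact description of the data generator."""
    aic = np.asarray(aic_values, dtype=float)
    delta = aic - aic.min()
    w = np.exp(-0.5 * delta)
    return w / w.sum()

print(np.round(aic_weights([100.0, 102.3, 110.9]), 3))  # [0.757 0.24 0.003]
```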

  11. Pilot points method for conditioning multiple-point statistical facies simulation on flow data

    NASA Astrophysics Data System (ADS)

    Ma, Wei; Jafarpour, Behnam

    2018-05-01

    We propose a new pilot points method for conditioning discrete multiple-point statistical (MPS) facies simulation on dynamic flow data. While conditioning MPS simulation on static hard data is straightforward, their calibration against nonlinear flow data is nontrivial. The proposed method generates conditional models from a conceptual model of geologic connectivity, known as a training image (TI), by strategically placing and estimating pilot points. To place pilot points, a score map is generated based on three sources of information: (i) the uncertainty in facies distribution, (ii) the model response sensitivity information, and (iii) the observed flow data. Once the pilot points are placed, the facies values at these points are inferred from production data and then are used, along with available hard data at well locations, to simulate a new set of conditional facies realizations. While facies estimation at the pilot points can be performed using different inversion algorithms, in this study the ensemble smoother (ES) is adopted to update permeability maps from production data, which are then used to statistically infer facies types at the pilot point locations. The developed method combines the information in the flow data and the TI by using the former to infer facies values at selected locations away from the wells and the latter to ensure consistent facies structure and connectivity where away from measurement locations. Several numerical experiments are used to evaluate the performance of the developed method and to discuss its important properties.

  12. Extended-Range Prediction with Low-Dimensional, Stochastic-Dynamic Models: A Data-driven Approach

    DTIC Science & Technology

    2013-09-30

    ... characterize statistically extratropical storms and extremes, and link these to LFV modes (Mingfang Ting, Yochanan Kushnir, Andrew W. Robertson, Lei Wang) ... forecast models, as well as in the understanding they have generated (Adam Sobel, Daehyun Kim and Shuguang Wang). Extratropical variability and predictability: determine the extent to which extratropical monthly and seasonal low-frequency variability (LFV, i.e., PNA, NAO, as well as other regional modes) is predictable ...

  13. Implementation and Research on the Operational Use of the Mesoscale Prediction Model COAMPS in Poland

    DTIC Science & Technology

    2008-09-30

    ... participated in the EGU General Assembly, Vienna, Austria, 13-18 April 2008, giving a poster presentation. Bogumil Jakubiak, University of Warsaw, participated in the EGU General Assembly, Vienna, Austria, 13-18 April 2008, giving two poster presentations. Mikolaj Sierzega, University of Warwick, participated ... model forecast to generate background error statistics. This helps us to identify and understand the uncertainties in high-resolution NWP forecasts.

  14. An Extensible NetLogo Model for Visualizing Message Routing Protocols

    DTIC Science & Technology

    2017-08-01

    ... the hard sciences to the social sciences to computer-generated art. NetLogo represents the world as a set of ... describe the model is shown here; for the supporting methods, refer to the source code. ... if ticks - last-inject > time-to-inject [inject] if run# > #runs [stop] end ... Next, we present some basic statistics collected for the ...

  15. Development of a procedure to model high-resolution wind profiles from smoothed or low-frequency data

    NASA Technical Reports Server (NTRS)

    Camp, D. W.

    1977-01-01

    The derivation of simulated Jimsphere wind profiles from low-frequency rawinsonde data and a generated set of white noise data is presented. A computer program is developed to model high-resolution wind profiles based on the statistical properties of data from the Kennedy Space Center, Florida. Comparison of the measured Jimsphere data, rawinsonde data, and the simulated profiles shows excellent agreement.
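    The general recipe, interpolating the smooth low-frequency profile to a fine grid and superimposing vertically correlated noise, can be sketched as follows. All magnitudes, grids, and the moving-average filter are illustrative, not the report's calibrated procedure.

```python
import numpy as np

rng = np.random.default_rng(7)

alt_lo = np.linspace(0, 18_000, 73)           # coarse rawinsonde levels (m)
wind_lo = 10 + 8 * np.sin(alt_lo / 4000.0)    # toy smoothed wind speeds (m/s)

alt_hi = np.linspace(0, 18_000, 721)          # fine target grid (25 m)
wind_hi = np.interp(alt_hi, alt_lo, wind_lo)  # interpolated baseline

# Filter white noise with a moving average to give it vertical
# correlation, then scale it to a plausible perturbation magnitude.
noise = rng.standard_normal(len(alt_hi))
kernel = np.ones(15) / 15.0
noise = np.convolve(noise, kernel, mode="same")
wind_hi += 0.8 * noise / noise.std()

print(wind_hi[:5])  # simulated high-resolution wind profile (m/s)
```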

  16. Moisture Damage Modeling in Lime and Chemically Modified Asphalt at Nanolevel Using Ensemble Computational Intelligence

    PubMed Central

    2018-01-01

    This paper measures the adhesion/cohesion force among asphalt molecules at the nanoscale using Atomic Force Microscopy (AFM) and models the moisture damage by applying state-of-the-art Computational Intelligence (CI) techniques (e.g., artificial neural network (ANN), support vector regression (SVR), and an Adaptive Neuro-Fuzzy Inference System (ANFIS)). Various combinations of lime and chemicals as well as dry and wet environments are used to produce different asphalt samples. The parameters that were varied to generate different asphalt samples and measure the corresponding adhesion/cohesion forces are the percentage of antistripping agents (e.g., Lime and Unichem), AFM tip K values, and AFM tip types. The CI methods are trained to model the adhesion/cohesion forces given the variation in the values of the above parameters. To achieve enhanced performance, statistical combinations such as the average, weighted average, and regression of the outputs generated by the CI techniques are used. The experimental results show that, of the three individual CI methods, ANN models moisture damage to lime- and chemically modified asphalt better than the other two CI techniques for both wet and dry conditions. Moreover, the ensemble of CI techniques combined with statistical measures provides better accuracy than any of the individual CI techniques. PMID:29849551

  17. Generation Y and Blood Donation: The Impact of Altruistic Help in a Darwiportunistic Scenario

    PubMed Central

    Scholz, Christian

    2010-01-01

    Summary This article focuses on the members of Generation Y and their willingness to offer voluntary (unpaid) blood donations. Using statistics from various sources, a three-stage model is developed to explain blood donation behaviour, especially of this generation. It consists of i) developing altruism, ii) raising the willingness to donate blood, and iii) activating actual blood donation behaviour. Members of Generation Y live in a Darwinistic society. They also to some degree act opportunistically, but not in contradiction to altruism. For that reason, the article positions itself in the theoretical framework of Darwiportunism and derives practical suggestions as well as implications for research. PMID:21048826

  18. Error Analysis: How Precise is Fused Deposition Modeling in Fabrication of Bone Models in Comparison to the Parent Bones?

    PubMed

    Reddy, M V; Eachempati, Krishnakiran; Gurava Reddy, A V; Mugalur, Aakash

    2018-01-01

    Rapid prototyping (RP) is used widely in dental and faciomaxillary surgery, with anecdotal uses in orthopedics. The purview of RP in orthopedics is vast. However, there is no error analysis reported in the literature on bone models generated using office-based RP. This study evaluates the accuracy of fused deposition modeling (FDM) using standard tessellation language (STL) files and the errors generated during the fabrication of bone models. Nine dry bones were selected and computed tomography (CT) scanned. STL files were procured from the CT scans and three-dimensional (3D) models of the bones were printed using our in-house FDM-based 3D printer with Acrylonitrile Butadiene Styrene (ABS) filament. Measurements were made on the bones and the 3D models according to data collection procedures for forensic skeletal material. Statistical analysis of the collected data was performed using SPSS version 13.0 to establish interobserver correlation for measurements on the dry bones and the 3D bone models. Interobserver reliability was established using the intraclass correlation coefficient for both the dry bones and the 3D models. The mean absolute difference is 0.4, which is minimal. The 3D models are comparable to the dry bones. STL file-dependent FDM using ABS material produces near-anatomical 3D models. The high 3D accuracy holds promise in the clinical scenario for preoperative planning, mock surgery, and the choice of implants and prostheses, especially in complicated acetabular trauma and complex hip surgeries.

  19. Multiple-solution problems in a statistics classroom: an example

    NASA Astrophysics Data System (ADS)

    Chu, Chi Wing; Chan, Kevin L. T.; Chan, Wai-Sum; Kwong, Koon-Shing

    2017-11-01

    The mathematics education literature shows that encouraging students to develop multiple solutions for given problems has a positive effect on students' understanding and creativity. In this paper, we present an example of multiple-solution problems in statistics involving a set of non-traditional dice. In particular, we consider the exact probability mass distribution for the sum of face values. Four different ways of solving the problem are discussed. The solutions span various basic concepts in different mathematical disciplines (sample space in probability theory, the probability generating function in statistics, integer partition in basic combinatorics, and the individual risk model in actuarial science) and thus promote upper undergraduate students' awareness of knowledge connections between their courses. All solutions of the example are implemented using the R statistical software package.
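    The probability-generating-function solution can be carried out numerically as a polynomial convolution, since the PGF of a sum of independent dice is the product of the dice polynomials. A short sketch with hypothetical non-traditional dice (the paper's actual dice and its R implementation are not reproduced here):

```python
import numpy as np

# Each die is a polynomial in x whose coefficient at degree f is the
# probability of face f; the PMF of the sum is the product, i.e. a
# convolution of coefficient vectors. Faces below are made up.
die_a = {1: 0.25, 2: 0.25, 5: 0.25, 6: 0.25}  # die with faces 1, 2, 5, 6
die_b = {3: 0.5, 4: 0.5}                      # die with faces 3, 4

def to_poly(die):
    coeffs = np.zeros(max(die) + 1)
    for face, p in die.items():
        coeffs[face] = p
    return coeffs

pmf = np.convolve(to_poly(die_a), to_poly(die_b))  # PGF product
for total, p in enumerate(pmf):
    if p > 0:
        print(total, round(float(p), 4))  # exact PMF of the sum
```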

  20. Model Checking Techniques for Assessing Functional Form Specifications in Censored Linear Regression Models.

    PubMed

    León, Larry F; Cai, Tianxi

    2012-04-01

    In this paper we develop model checking techniques for assessing functional form specifications of covariates in censored linear regression models. These procedures are based on a censored data analog to taking cumulative sums of "robust" residuals over the space of the covariate under investigation. These cumulative sums are formed by integrating certain Kaplan-Meier estimators and may be viewed as "robust" censored data analogs to the processes considered by Lin, Wei & Ying (2002). The null distributions of these stochastic processes can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by computer simulation. Each observed process can then be graphically compared with a few realizations from the Gaussian process. We also develop formal test statistics for numerical comparison. Such comparisons enable one to assess objectively whether an apparent trend seen in a residual plot reflects model misspecification or natural variation. We illustrate the methods with a well known dataset. In addition, we examine the finite sample performance of the proposed test statistics in simulation experiments. In our simulation experiments, the proposed test statistics have good power of detecting misspecification while at the same time controlling the size of the test.

  1. Synthetic Earthquake Statistics From Physical Fault Models for the Lower Rhine Embayment

    NASA Astrophysics Data System (ADS)

    Brietzke, G. B.; Hainzl, S.; Zöller, G.

    2012-04-01

    As of today, seismic risk and hazard estimates mostly use purely empirical, stochastic models of earthquake fault systems, tuned specifically to the vulnerable areas of interest. Although such models allow for reasonable risk estimates, they fail to provide a link between the observed seismicity and the underlying physical processes. Solving a state-of-the-art, fully dynamic description of all relevant physical processes related to earthquake fault systems is likely not useful, since it comes with a large number of degrees of freedom, poor constraints on its model parameters, and a huge computational effort. Here, quasi-static and quasi-dynamic physical fault simulators provide a compromise between physical completeness and computational affordability, and aim at providing a link between basic physical concepts and the statistics of seismicity. Within the framework of quasi-static and quasi-dynamic earthquake simulators, we investigate a model of the Lower Rhine Embayment (LRE) that is based upon seismological and geological data. We present and discuss statistics of the spatio-temporal behavior of generated synthetic earthquake catalogs with respect to simplification (e.g., simple two-fault cases) as well as complication (e.g., hidden faults, geometric complexity, heterogeneities of constitutive parameters).

  2. Logical reasoning versus information processing in the dual-strategy model of reasoning.

    PubMed

    Markovits, Henry; Brisson, Janie; de Chantal, Pier-Luc

    2017-01-01

    One of the major debates concerning the nature of inferential reasoning is between counterexample-based strategies, such as mental model theory, and statistical strategies underlying probabilistic models. The dual-strategy model proposed by Verschueren, Schaeken, & d'Ydewalle (2005a, 2005b), which suggests that people might have access to both kinds of strategy, has been supported by several recent studies. These have shown that statistical reasoners make inferences by using information about premises to generate a likelihood estimate of conclusion probability. However, while results concerning counterexample reasoners are consistent with a counterexample detection model, these results could equally be interpreted as indicating a greater sensitivity to logical form. In order to distinguish these 2 interpretations, in Studies 1 and 2 we presented reasoners with Modus ponens (MP) inferences with statistical information about premise strength, and in Studies 3 and 4, naturalistic MP inferences with premises having many disabling conditions. Statistical reasoners accepted the MP inference more often than counterexample reasoners in Studies 1 and 2, while the opposite pattern was observed in Studies 3 and 4. Results show that these strategies must be defined in terms of information processing, with no clear relations to "logical" reasoning. These results have additional implications for the underlying debate about the nature of human reasoning.

  3. Prognostic factors in patients with advanced cancer: use of the patient-generated subjective global assessment in survival prediction.

    PubMed

    Martin, Lisa; Watanabe, Sharon; Fainsinger, Robin; Lau, Francis; Ghosh, Sunita; Quan, Hue; Atkins, Marlis; Fassbender, Konrad; Downing, G Michael; Baracos, Vickie

    2010-10-01

    To determine whether elements of a standard nutritional screening assessment are independently prognostic of survival in patients with advanced cancer. A prospective nested cohort of patients with metastatic cancer was accrued from different units of a Regional Palliative Care Program. Patients completed a nutritional screen on admission. Data included age, sex, cancer site, height, weight history, dietary intake, 13 nutrition impact symptoms, and patient- and physician-reported performance status (PS). Univariate and multivariate survival analyses were conducted. Concordance statistics (c-statistics) were used to test the predictive accuracy of models based on training and validation sets; a c-statistic of 0.5 indicates the model predicts the outcome as well as chance, while perfect prediction has a c-statistic of 1.0. A training set of patients in palliative home care (n = 1,164) was used to identify prognostic variables. Primary disease site, PS, short-term weight change (either gain or loss), dietary intake, and dysphagia predicted survival in multivariate analysis (P < .05). A model including only primary disease site and PS showed high c-statistics between predicted and observed survival in the training set (0.90) and validation set (0.88; n = 603). The addition of weight change, dietary intake, and dysphagia did not further improve the c-statistic of the model. The c-statistic was also not altered by substituting physician-rated palliative PS for patient-reported PS. We demonstrate a high probability of concordance between predicted and observed survival for patients in distinct palliative care settings (home care, tertiary inpatient, ambulatory outpatient) based on patient-reported information.
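    For concreteness, a bare-bones c-statistic for survival data in the Harrell style of pairwise concordance is sketched below, on toy numbers; production survival analyses weight ties and censoring more carefully than this.

```python
import numpy as np

def c_statistic(risk, time, event):
    """Pairwise concordance: among usable pairs (where the earlier time
    is an observed event), count pairs in which the higher-risk patient
    failed earlier. 0.5 = chance, 1.0 = perfect."""
    conc = usable = 0
    n = len(risk)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i] == 1:  # i observed to fail first
                usable += 1
                conc += risk[i] > risk[j]
    return conc / usable

# Toy example: imperfect risk scores (higher = worse predicted prognosis).
time = np.array([5.0, 8.0, 12.0, 20.0, 30.0])
event = np.array([1, 1, 0, 1, 0])                # 0 = censored
risk = np.array([0.9, 0.5, 0.7, 0.6, 0.1])
print(round(c_statistic(risk, time, event), 3))  # 0.75
```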

  4. Statistics-based model for prediction of chemical biosynthesis yield from Saccharomyces cerevisiae

    PubMed Central

    2011-01-01

    Background The robustness of Saccharomyces cerevisiae in facilitating industrial-scale production of ethanol extends its utilization as a platform to synthesize other metabolites. Metabolic engineering strategies, typically via pathway overexpression and deletion, continue to play a key role in optimizing the conversion efficiency of substrates into the desired products. However, chemical production titer or yield remains difficult to predict based on reaction stoichiometry and mass balance. We sampled a large space of data on chemical production from S. cerevisiae and developed a statistics-based model to calculate production yield using input variables that represent the number of enzymatic steps in the key biosynthetic pathway of interest, metabolic modifications, cultivation modes, nutrition, and oxygen availability. Results Based on production data for about 40 chemicals produced from S. cerevisiae, and the metabolic engineering methods, nutrient supplementation, and fermentation conditions described therein, we generated mathematical models with numerical and categorical variables to predict production yield. Statistically, the models showed that: 1. Chemical production from central metabolic precursors decreased exponentially with an increasing number of enzymatic steps for biosynthesis (>30% loss of yield per enzymatic step, P-value = 0); 2. Categorical variables of gene overexpression and knockout improved product yield 2- to 4-fold (P-value < 0.1); 3. Addition of a notable amount of intermediate precursors or nutrients improved product yield more than 5-fold (P-value < 0.05); 4. Performing the cultivation in a well-controlled bioreactor enhanced product yield 3-fold (P-value < 0.05); 5. The contribution of oxygen to product yield was not statistically significant. Yield calculations for various chemicals using the linear model were in fairly good agreement with the experimental values. The model generally underestimated ethanol production compared to other chemicals, which supports the notion that the metabolism of Saccharomyces cerevisiae has historically evolved for robust alcohol fermentation. Conclusions We generated simple mathematical models for first-order approximation of chemical production yield from S. cerevisiae. These linear models provide empirical insights into the effects of strain engineering and cultivation conditions on biosynthetic efficiency. These models may not only provide guidelines for metabolic engineers to synthesize desired products, but also be useful for comparing biosynthesis performance across different research papers. PMID:21689458

  5. Extracting morphologies from third harmonic generation images of structurally normal human brain tissue.

    PubMed

    Zhang, Zhiqing; Kuzmin, Nikolay V; Groot, Marie Louise; de Munck, Jan C

    2017-06-01

    The morphologies contained in 3D third harmonic generation (THG) images of human brain tissue can report on the pathological state of the tissue. However, the complexity of THG brain images makes the usage of modern image processing tools, especially those of image filtering, segmentation and validation, to extract this information challenging. We developed a salient edge-enhancing model of anisotropic diffusion for image filtering, based on higher-order statistics. We split the intrinsic 3-phase segmentation problem into two 2-phase segmentation problems, each of which we solved with a dedicated model, an active contour weighted by prior extreme. We applied the proposed algorithms to THG images of structurally normal ex-vivo human brain tissue, revealing key tissue components (brain cells, microvessels, and neuropil) and enabling statistical characterization of these components. Comprehensive comparison to manually delineated ground truth validated the proposed algorithms. Quantitative comparison to second harmonic generation/auto-fluorescence images, acquired simultaneously from the same tissue area, confirmed the correctness of the main THG features detected. The software and test datasets are available from the authors (z.zhang@vu.nl). Supplementary data are available at Bioinformatics online.
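    As a baseline for the filtering step, classical Perona-Malik anisotropic diffusion is sketched below on random data; the paper's salient-edge-enhancing variant based on higher-order statistics is not reproduced here.

```python
import numpy as np

def perona_malik(img, n_iter=20, kappa=0.1, dt=0.2):
    """Textbook Perona-Malik diffusion: smooth within regions while
    preserving edges. Uses periodic boundaries (np.roll) for brevity."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Finite differences toward the four neighbours.
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # Edge-stopping function: low conduction across strong edges.
        g = lambda d: np.exp(-(d / kappa) ** 2)
        u += dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u

noisy = np.random.default_rng(8).random((64, 64))
smoothed = perona_malik(noisy)
print(float(noisy.std()), float(smoothed.std()))  # variability shrinks
```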

  6. Performance of an Axisymmetric Rocket Based Combined Cycle Engine During Rocket Only Operation Using Linear Regression Analysis

    NASA Technical Reports Server (NTRS)

    Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.

    1998-01-01

    The all-rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design-of-experiments methodology was used to construct a test matrix, and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations, increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.

  7. Evaluating the uncertainty of predicting future climate time series at the hourly time scale

    NASA Astrophysics Data System (ADS)

    Caporali, E.; Fatichi, S.; Ivanov, V. Y.

    2011-12-01

    A stochastic downscaling methodology is developed to generate hourly, point-scale time series for several meteorological variables, such as precipitation, cloud cover, shortwave radiation, air temperature, relative humidity, wind speed, and atmospheric pressure. The methodology uses multi-model General Circulation Model (GCM) realizations and an hourly weather generator, AWE-GEN. Probabilistic descriptions of factors of change (a measure of climate change with respect to historic conditions) are computed for several climate statistics and different aggregation times using a Bayesian approach that weights the individual GCM contributions. The Monte Carlo method is applied to sample the factors of change from their respective distributions, thereby permitting the generation of time series in an ensemble fashion that reflects the uncertainty of future climate projections as well as that of the downscaling procedure. Applications of the methodology and probabilistic expressions of certainty in reproducing future climates for the periods 2000-2009, 2046-2065, and 2081-2100, using 1962-1992 as the baseline, are discussed for Firenze (Italy). The climate predictions for 2000-2009 are tested against observations, permitting an assessment of the reliability and uncertainties of the methodology in reproducing statistics of meteorological variables at different time scales.
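    To make the ensemble step concrete, the toy sketch below samples factors of change and applies them to a baseline statistic. The distribution parameters and the statistic are invented for illustration; the paper derives the factor distributions from a Bayesian weighting of GCMs and feeds them to the weather generator.

```python
import numpy as np

rng = np.random.default_rng(9)

baseline_mean = 75.0  # mm/month, a historical climate statistic (toy value)

# Toy factor-of-change distribution standing in for the GCM-weighted
# posterior: multiplicative factors around a modest drying signal.
factors = rng.normal(0.9, 0.08, size=1000)

# Ensemble of future values of the statistic, reflecting projection spread.
future_means = baseline_mean * factors
print(np.percentile(future_means, [5, 50, 95]).round(1))
```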

  8. Evolution in Cloud Population Statistics of the MJO: From AMIE Field Observations to Global-Cloud Permitting Models Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kollias, Pavlos

    This is a multi-institutional, collaborative project using a three-tier modeling approach to bridge field observations and global cloud-permitting models, with emphasis on the structural evolution of cloud populations through various large-scale environments. Our contribution was in data analysis for the generation of high-value cloud and precipitation products and the derivation of cloud statistics for model validation. We contributed in two areas of data analysis: the development of a synergistic cloud and precipitation classification that identifies different cloud types (e.g., shallow cumulus, cirrus) and precipitation types (shallow, deep, convective, stratiform) using profiling ARM observations, and the development of a quantitative precipitation rate retrieval algorithm using profiling ARM observations. Similar efforts have been developed in the past for precipitation (weather radars), but not for the millimeter-wavelength (cloud) radar deployed at the ARM sites.

  9. Why Income Comparison is Rational

    NASA Technical Reports Server (NTRS)

    Wolpert, David H.

    2010-01-01

    A major factor affecting a person's happiness is the gap between their income and their neighbors', independent of their own income. This effect is strongest when the neighbor has moderately higher income. In addition, a person's lifetime happiness often follows a "U" shape. Previous models have explained subsets of these phenomena, typically assuming the person has limited ability to assess their own (hedonic) utility. Here I present a model that explains all the phenomena, without such assumptions. In this model greater income of your neighbor is statistical data that, if carefully analyzed, would recommend that you explore for a new income-generating strategy. This explains unhappiness that your neighbor has greater income, as an emotional "prod" that induces you to explore, in accord with careful statistical analysis. It explains the "U" shape of happiness similarly. Another benefit of this model is that it makes many falsifiable predictions.

  10. Joint Seasonal ARMA Approach for Modeling of Load Forecast Errors in Planning Studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hafen, Ryan P.; Samaan, Nader A.; Makarov, Yuri V.

    2014-04-14

    To make informed and robust decisions in the probabilistic power system operation and planning process, it is critical to conduct multiple simulations of the generated combinations of wind and load parameters and their forecast errors to handle the variability and uncertainty of these time series. In order for the simulation results to be trustworthy, the simulated series must preserve the salient statistical characteristics of the real series. In this paper, we analyze day-ahead load forecast error data from multiple balancing authority (BA) locations and characterize statistical properties such as mean, standard deviation, autocorrelation, correlation between series, time-of-day bias, and time-of-day autocorrelation. We then construct and validate a seasonal autoregressive moving average (ARMA) model of these characteristics, and use the model to jointly simulate day-ahead load forecast error series for all BAs.
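
    A minimal sketch of the kind of series the model is meant to reproduce, assuming a time-of-day bias plus an AR(1) stochastic component as a stand-in for the fitted seasonal ARMA (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

hours_per_day, n_days = 24, 365
t = np.arange(hours_per_day * n_days)
hod = t % hours_per_day

# Deterministic time-of-day bias (larger errors near the evening peak)
bias = 0.5 * np.sin(2 * np.pi * (hod - 18) / 24)

phi, sigma = 0.9, 0.3          # AR(1) coefficient and innovation std
ar = np.zeros(t.size)
eps = rng.normal(0.0, sigma, t.size)
for i in range(1, t.size):
    ar[i] = phi * ar[i - 1] + eps[i]

error = bias + ar              # simulated forecast error series

# Check that the series preserves the targeted characteristics
print("mean:", error.mean().round(3), "std:", error.std().round(3))
print("lag-1 autocorrelation:",
      np.corrcoef(error[:-1], error[1:])[0, 1].round(3))
```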

  11. An Overview of R in Health Decision Sciences.

    PubMed

    Jalal, Hawre; Pechlivanoglou, Petros; Krijkamp, Eline; Alarid-Escudero, Fernando; Enns, Eva; Hunink, M G Myriam

    2017-10-01

    As the complexity of health decision science applications increases, high-level programming languages are increasingly adopted for statistical analyses and numerical computations. These programming languages facilitate sophisticated modeling, model documentation, and analysis reproducibility. Among the high-level programming languages, the statistical programming framework R is gaining increased recognition. R is freely available, cross-platform compatible, and open source. It is supported by a large community of users who have generated an extensive collection of well-documented packages and functions. These functions facilitate applications of health decision science methodology as well as the visualization and communication of results. Although R's popularity is increasing among health decision scientists, methodological extensions of R in the field of decision analysis remain isolated. The purpose of this article is to provide an overview of existing R functionality that is applicable to the various stages of decision analysis, including model design, input parameter estimation, and analysis of model outputs.

  12. Scaling in geology: landforms and earthquakes.

    PubMed Central

    Turcotte, D L

    1995-01-01

    Landforms and earthquakes appear to be extremely complex; yet, there is order in the complexity. Both satisfy fractal statistics in a variety of ways. A basic question is whether the fractal behavior is due to scale invariance or is the signature of a broadly applicable class of physical processes. Both landscape evolution and regional seismicity appear to be examples of self-organized critical phenomena. A variety of statistical models have been proposed to model landforms, including diffusion-limited aggregation, self-avoiding percolation, and cellular automata. Many authors have studied the behavior of multiple slider-block models, both in terms of the rupture of a fault to generate an earthquake and in terms of the interactions between faults associated with regional seismicity. The slider-block models exhibit a remarkably rich spectrum of behavior; two slider blocks can exhibit low-order chaotic behavior. Large numbers of slider blocks clearly exhibit self-organized critical behavior. PMID:11607562

  13. Statistical tests of simple earthquake cycle models

    USGS Publications Warehouse

    Devries, Phoebe M. R.; Evans, Eileen

    2016-01-01

    A central goal of observing and modeling the earthquake cycle is to forecast when a particular fault may generate an earthquake: a fault late in its earthquake cycle may be more likely to generate an earthquake than a fault early in its earthquake cycle. Models that can explain geodetic observations throughout the entire earthquake cycle may be required to gain a more complete understanding of relevant physics and phenomenology. Previous efforts to develop unified earthquake models for strike-slip faults have largely focused on explaining both preseismic and postseismic geodetic observations available across a few faults in California, Turkey, and Tibet. An alternative approach leverages the global distribution of geodetic and geologic slip rate estimates on strike-slip faults worldwide. Here we use the Kolmogorov-Smirnov test for similarity of distributions to infer, in a statistically rigorous manner, viscoelastic earthquake cycle models that are inconsistent with 15 sets of observations across major strike-slip faults. We reject a large subset of two-layer models incorporating Burgers rheologies at a significance level of α = 0.05 (those with long-term Maxwell viscosities η_M ≲ 4.0 × 10¹⁹ Pa s and η_M ≳ 4.6 × 10²⁰ Pa s) but cannot reject models on the basis of transient Kelvin viscosity η_K. Finally, we examine the implications of these results for the predicted earthquake cycle timing of the 15 faults considered and compare these predictions to the geologic and historical record.
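
    A minimal sketch of the testing step, using scipy's two-sample Kolmogorov-Smirnov test on synthetic stand-ins for the observed and model-predicted distributions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic stand-ins: observed ratios for 15 faults vs. a model ensemble
observed_ratios = rng.normal(1.0, 0.15, size=15)
model_predicted = rng.normal(1.2, 0.15, size=10000)

stat, p_value = stats.ks_2samp(observed_ratios, model_predicted)
print(f"KS statistic = {stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("reject this candidate model at the 0.05 level")
else:
    print("cannot reject this candidate model")
```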

  14. Protein Models Docking Benchmark 2

    PubMed Central

    Anishchenko, Ivan; Kundrotas, Petras J.; Tuzikov, Alexander V.; Vakser, Ilya A.

    2015-01-01

    Structural characterization of protein-protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template-free or template-based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high-resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have pre-defined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model-to-native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model-like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu. PMID:25712716

  15. Tracing the source of numerical climate model uncertainties in precipitation simulations using a feature-oriented statistical model

    NASA Astrophysics Data System (ADS)

    Xu, Y.; Jones, A. D.; Rhoades, A.

    2017-12-01

    Precipitation is a key component of the hydrologic cycle, and changing precipitation regimes contribute to more intense and frequent drought and flood events around the world. Numerical climate modeling is a powerful tool to study climatology and to predict future changes. Despite continuous improvement in numerical models, long-term precipitation prediction remains a challenge, especially at regional scales. To improve numerical simulations of precipitation, it is important to find out where the uncertainty in precipitation simulations comes from. There are two types of uncertainty in numerical model predictions. One is related to uncertainty in the input data, such as the model's boundary and initial conditions. These uncertainties would propagate to the final model outcomes even if the numerical model exactly replicated the true world. But a numerical model cannot exactly replicate the true world. The other type of model uncertainty is therefore related to errors in the model physics, such as the parameterization of sub-grid scale processes: given precise input conditions, how much error is generated by the imprecise model itself? Here, we build two statistical models based on a neural network algorithm to predict long-term variation of precipitation over California: one uses "true world" information derived from observations, and the other uses "modeled world" information from model inputs and outputs from the North America Coordinated Regional Downscaling Project (NA-CORDEX). We derive multiple climate feature metrics as predictors for the statistical model to represent the impact of global climate on local hydrology, and include topography as a predictor to represent the local control. We first compare the predictors between the true world and the modeled world to determine the errors contained in the input data. By perturbing the predictors in the statistical model, we estimate how much uncertainty in the model's final outcomes is accounted for by each predictor. By comparing the statistical models derived from true world and modeled world information, we assess the errors lying in the physics of the numerical models. This work provides unique insight into the performance of numerical climate models, and can be used to guide improvement of precipitation prediction.
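
    A minimal sketch of the perturbation analysis, assuming a small scikit-learn network and synthetic predictors in place of the study's climate feature metrics:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)

n, p = 500, 6                 # samples, predictors (feature metrics)
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.3 * rng.normal(size=n)

nn = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
nn.fit(X, y)

base = nn.predict(X)
for j in range(p):
    X_pert = X.copy()
    X_pert[:, j] = rng.permutation(X_pert[:, j])  # destroy predictor j
    delta = nn.predict(X_pert) - base
    print(f"predictor {j}: mean |change in output| = "
          f"{np.abs(delta).mean():.3f}")
```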

  16. Re-Evaluation of Event Correlations in Virtual California Using Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Glasscoe, M. T.; Heflin, M. B.; Granat, R. A.; Yikilmaz, M. B.; Heien, E.; Rundle, J.; Donnellan, A.

    2010-12-01

    Fusing the results of simulation tools with statistical analysis methods has contributed to our better understanding of the earthquake process. In a previous study, we used a statistical method to investigate emergent phenomena in data produced by the Virtual California earthquake simulator. The analysis indicated that there were some interesting fault interactions and possible triggering and quiescence relationships between events. We have converted the original code from Matlab to python/C++ and are now evaluating data from the most recent version of Virtual California in order to analyze and compare any new behavior exhibited by the model. The Virtual California earthquake simulator can be used to study fault and stress interaction scenarios for realistic California earthquakes. The simulation generates a synthetic earthquake catalog of events with a minimum size of ~M 5.8 that can be evaluated using statistical analysis methods. Virtual California utilizes realistic fault geometries and a simple Amontons-Coulomb stick-slip friction law in order to drive the earthquake process by means of a back-slip model, where loading of each segment occurs due to the accumulation of a slip deficit at the prescribed slip rate of the segment. Like any complex system, Virtual California may generate emergent phenomena unexpected even by its designers. In order to investigate this, we have developed a statistical method that analyzes the interaction between Virtual California fault elements and thereby determines whether events on any given fault elements show correlated behavior. Our method examines events on one fault element and then determines whether there is an associated event within a specified time window on a second fault element. Note that an event in our analysis is defined as any time an element slips, rather than any particular "earthquake" along the entire fault length. Results are tabulated and then differenced with an expected correlation, calculated by assuming a uniform distribution of events in time. We generate a correlation score matrix, which indicates how weakly or strongly correlated each fault element is to every other over the course of the VC simulation. We calculate correlation scores by summing the difference between the actual and expected correlations over all time window lengths and normalizing by the time window size. The correlation score matrix can focus attention on the most interesting areas for more in-depth analysis of event correlation vs. time. The previous study included 59 faults (639 elements) in the model, which included all the faults save the creeping section of the San Andreas. The analysis spanned 40,000 years of Virtual California-generated earthquake data. The newly revised VC model includes 70 faults, 8720 fault elements, and spans 110,000 years. Due to computational considerations, we will evaluate the elements comprising the southern California region, which our previous study indicated showed interesting fault interaction and event triggering/quiescence relationships.
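
    A minimal sketch of the correlation scoring described above, with random event catalogs standing in for Virtual California output (element counts, windows and durations are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

T = 10000.0                    # catalog duration in years (illustrative)
n_elem = 4
events = [np.sort(rng.uniform(0, T, rng.integers(50, 200)))
          for _ in range(n_elem)]
windows = [1.0, 5.0, 10.0, 50.0]

score = np.zeros((n_elem, n_elem))
for i in range(n_elem):
    for j in range(n_elem):
        if i == j:
            continue
        for w in windows:
            # actual: events on j falling in (t, t + w] after events on i
            actual = sum(
                np.searchsorted(events[j], t + w, side="right")
                - np.searchsorted(events[j], t, side="right")
                for t in events[i]
            )
            # expected under uniformly distributed events in time
            expected = len(events[i]) * len(events[j]) * w / T
            score[i, j] += (actual - expected) / w   # normalize by window
print(np.round(score, 2))
```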

  17. Weighting Statistical Inputs for Data Used to Support Effective Decision Making During Severe Emergency Weather and Environmental Events

    NASA Technical Reports Server (NTRS)

    Gardner, Adrian

    2010-01-01

    National Aeronautics and Space Administration (NASA) weather and atmospheric environmental organizations are insatiable consumers of geophysical, hydrometeorological and solar weather statistics. The expanding array of internetworked sensors producing targeted physical measurements has generated an almost factorial explosion of near real-time inputs to topical statistical datasets. Normalizing and value-based parsing of such statistical datasets in support of time-constrained weather and environmental alerts and warnings is essential, even with dedicated high-performance computational capabilities. What are the optimal indicators for advanced decision making? How do we recognize the line between sufficient statistical sampling and excessive, mission-destructive sampling? How do we assure that the normalization and parsing process, when interpolated through numerical models, yields accurate and actionable alerts and warnings? This presentation will address the integrated means and methods to achieve desired outputs for NASA and consumers of its data.

  18. Simulating Flying Insects Using Dynamics and Data-Driven Noise Modeling to Generate Diverse Collective Behaviors

    PubMed Central

    Ren, Jiaping; Wang, Xinjie; Manocha, Dinesh

    2016-01-01

    We present a biologically plausible dynamics model to simulate swarms of flying insects. Our formulation, which is based on biological conclusions and experimental observations, is designed to simulate large insect swarms of varying densities. We use a force-based model that captures different interactions between the insects and the environment and computes collision-free trajectories for each individual insect. Furthermore, we model the noise as a constructive force at the collective level and present a technique to generate noise-induced insect movements in a large swarm that are similar to those observed in real-world trajectories. We use a data-driven formulation that is based on pre-recorded insect trajectories. We also present a novel evaluation metric and a statistical validation approach that takes into account various characteristics of insect motions. In practice, the combination of a curl-noise function with our dynamics model is used to generate realistic swarm simulations and emergent behaviors. We highlight its performance for simulating large flying swarms of midges, fruit flies, locusts and moths and demonstrate many collective behaviors, including aggregation, migration, phase transition, and escape responses. PMID:27187068
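
    A minimal 2-D illustration of a divergence-free curl-noise velocity field of the kind combined with the dynamics model (the paper's actual noise formulation and parameters are not reproduced here):

```python
import numpy as np

def potential(x, y):
    # Smooth stand-in potential; production curl noise would typically
    # use a gradient noise such as Perlin noise.
    return np.sin(1.3 * x) * np.cos(1.7 * y)

def curl_velocity(x, y, h=1e-4):
    # v = (d psi / dy, -d psi / dx): perpendicular gradient, so the
    # resulting field is divergence-free
    dpdx = (potential(x + h, y) - potential(x - h, y)) / (2 * h)
    dpdy = (potential(x, y + h) - potential(x, y - h)) / (2 * h)
    return np.array([dpdy, -dpdx])

# Advect a few "insects" through the swirling field
rng = np.random.default_rng(0)
pos = rng.uniform(-1, 1, size=(5, 2))
dt = 0.05
for _ in range(100):
    pos += dt * np.array([curl_velocity(x, y) for x, y in pos])
print(pos)
```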

  19. Exact Asymptotics of the Freezing Transition of a Logarithmically Correlated Random Energy Model

    NASA Astrophysics Data System (ADS)

    Webb, Christian

    2011-12-01

    We consider a logarithmically correlated random energy model, namely a model for directed polymers on a Cayley tree, which was introduced by Derrida and Spohn. We prove asymptotic properties of a generating function of the partition function of the model by studying a discrete-time analogue of the KPP equation, thus translating Bramson's work on the KPP equation to the discrete-time case. We also discuss connections to extreme value statistics of a branching random walk and a rescaled multiplicative cascade measure beyond the critical point.

  20. Stochastic generators of multi-site daily temperature: comparison of performances in various applications

    NASA Astrophysics Data System (ADS)

    Evin, Guillaume; Favre, Anne-Catherine; Hingray, Benoit

    2018-02-01

    We present a multi-site stochastic model for the generation of average daily temperature, which includes a flexible parametric distribution and a multivariate autoregressive process. Different versions of this model are applied to a set of 26 stations located in Switzerland. The importance of specific statistical characteristics of the model (seasonality, marginal distributions of standardized temperature, spatial and temporal dependence) is discussed. In particular, the proposed marginal distribution is shown to improve the reproduction of extreme temperatures (minima and maxima). We also demonstrate that the frequency and duration of cold spells and heat waves are dramatically underestimated when the autocorrelation of temperature is not taken into account in the model. An adequate representation of these characteristics can be crucial depending on the field of application, and we discuss potential implications in different contexts (agriculture, forestry, hydrology, human health).
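
    A minimal sketch of the stochastic core, assuming a lag-1 multivariate autoregression with compound-symmetric spatial correlation as a stand-in for the fitted model (parameter values are illustrative, not the Swiss estimates):

```python
import numpy as np

rng = np.random.default_rng(11)

n_sites, n_days = 5, 3650
phi = 0.8                       # temporal autocorrelation (illustrative)
rho = 0.6                       # inter-site correlation (illustrative)
C = rho * np.ones((n_sites, n_sites)) + (1 - rho) * np.eye(n_sites)
# Scale innovations so the stationary marginal variance is 1
L = np.linalg.cholesky((1 - phi**2) * C)

z = np.zeros((n_days, n_sites))   # standardized daily temperature
for t in range(1, n_days):
    z[t] = phi * z[t - 1] + L @ rng.standard_normal(n_sites)

# Seasonality and the flexible marginal distribution would then be
# reintroduced, e.g. temp = seasonal_mean + seasonal_std * F_inv(Phi(z)).
print("lag-1 autocorr:", np.corrcoef(z[:-1, 0], z[1:, 0])[0, 1].round(2))
print("inter-site corr:", np.corrcoef(z[:, 0], z[:, 1])[0, 1].round(2))
```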

  1. National Assessment of Energy Storage for Grid Balancing and Arbitrage: Phase 1, WECC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kintner-Meyer, Michael CW; Balducci, Patrick J.; Colella, Whitney G.

    2012-06-01

    To examine the role that energy storage could play in mitigating the impacts of the stochastic variability of wind generation on regional grid operation, the Pacific Northwest National Laboratory (PNNL) examined a hypothetical 2020 grid scenario in which additional wind generation capacity is built to meet renewable portfolio standard targets in the Western Interconnection. PNNL developed a stochastic model for estimating the balancing requirements using historical wind statistics and forecasting error; a detailed engineering model of the dispatch of energy storage and fast-ramping generation devices for estimating size requirements of energy storage and generation systems to meet new balancing requirements; and financial models for estimating the life-cycle cost of storage and generation systems in addressing the future balancing requirements for sub-regions in the Western Interconnection. Evaluated technologies include combustion turbines, sodium-sulfur (Na-S) batteries, lithium-ion batteries, pumped-hydro energy storage, compressed air energy storage, flywheels, redox flow batteries, and demand response. Distinct power and energy capacity requirements were estimated for each technology option, and battery size was optimized to minimize costs. Modeling results indicate that in a future power grid with high penetration of renewables, the most cost-competitive technologies for meeting balancing requirements include Na-S batteries and flywheels.

  2. A deterministic model for the sublayer streaks in turbulent boundary layers for application to flow control.

    PubMed

    Carpenter, Peter W; Kudar, Karen L; Ali, Reza; Sen, Pradeep K; Davies, Christopher

    2007-10-15

    We present a relatively simple, deterministic, theoretical model for the sublayer streaks in a turbulent boundary layer based on an analogy with Klebanoff modes. Our approach is to generate the streamwise vortices found in the buffer layer by means of a vorticity source in the form of a fictitious body force. It is found that the strongest streaks correspond to a spanwise wavelength that lies within the range of the experimentally observed values for the statistical mean streak spacing. We also present results showing the effect of streamwise pressure gradient, Reynolds number and wall compliance on the sublayer streaks. The theoretical predictions for the effects of wall compliance on the streak characteristics agree well with experimental data. Our proposed theoretical model for the quasi-periodic bursting cycle is also described, which places the streak modelling in context. The proposed bursting process is as follows: (i) streamwise vortices generate sublayer streaks and other vortical elements generate propagating plane waves, (ii) when the streaks reach a sufficient amplitude, they interact nonlinearly with the plane waves to produce oblique waves that exhibit transient growth, and (iii) the oblique waves interact nonlinearly with the plane wave to generate streamwise vortices; these in turn generate the sublayer streaks and so the cycle is renewed.

  3. Development of an errorable car-following driver model

    NASA Astrophysics Data System (ADS)

    Yang, H.-H.; Peng, H.

    2010-06-01

    An errorable car-following driver model is presented in this paper. An errorable driver model is one that emulates human driver functions and can generate both nominal (error-free) and devious (with error) behaviours. This model was developed for the evaluation and design of active safety systems. The car-following data used for developing and validating the model were obtained from a large-scale naturalistic driving database. The stochastic car-following behaviour was first analysed and modelled as a random process. Three error-inducing behaviours were then introduced. First, human perceptual limitation was studied and implemented. Distraction due to non-driving tasks was then identified based on statistical analysis of the driving data. Finally, the time delay of human drivers was estimated through a recursive least-squares identification process. By including these three error-inducing behaviours, rear-end collisions with the lead vehicle could occur. The simulated crash rate was found to be similar to, but somewhat higher than, that reported in traffic statistics.
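
    A minimal sketch of the delay-identification step, assuming a linear car-following law and scanning candidate delays with scalar recursive least squares (the data here are simulated, not naturalistic records):

```python
import numpy as np

rng = np.random.default_rng(2)

dt, n = 0.1, 2000
true_k, true_delay = 0.5, 8     # gain, delay in samples (0.8 s), assumed
dv = rng.normal(size=n).cumsum() * 0.01   # synthetic relative speed
a = np.zeros(n)
a[true_delay:] = true_k * dv[:-true_delay]
a += rng.normal(0, 0.005, n)              # measurement noise

def rls_sse(delay, lam=0.99):
    """Scalar recursive least squares with forgetting factor lam."""
    theta, P, sse = 0.0, 1e3, 0.0
    for t in range(delay, n):
        x = dv[t - delay]
        err = a[t] - theta * x
        gain = P * x / (lam + x * P * x)
        theta += gain * err
        P = (P - gain * x * P) / lam
        sse += err ** 2
    return sse

errors = {d: rls_sse(d) for d in range(16)}
best = min(errors, key=errors.get)
print(f"estimated driver time delay: {best * dt:.1f} s")
```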

  4. Interactive classification and content-based retrieval of tissue images

    NASA Astrophysics Data System (ADS)

    Aksoy, Selim; Marchisio, Giovanni B.; Tusk, Carsten; Koperski, Krzysztof

    2002-11-01

    We describe a system for interactive classification and retrieval of microscopic tissue images. Our system models tissues at the pixel, region and image levels. Pixel-level features are generated using unsupervised clustering of color and texture values. Region-level features include shape information and statistics of pixel-level feature values. Image-level features include statistics and spatial relationships of regions. To reduce the gap between low-level features and high-level expert knowledge, we define the concept of prototype regions. The system learns the prototype regions in an image collection using model-based clustering and density estimation. Different tissue types are modeled using spatial relationships of these regions. Spatial relationships are represented by fuzzy membership functions. The system automatically selects significant relationships from training data and builds models which can also be updated using user relevance feedback. A Bayesian framework is used to classify tissues based on these models. Preliminary experiments show that the spatial relationship models we developed provide a flexible and powerful framework for classification and retrieval of tissue images.

  5. Statistical iterative material image reconstruction for spectral CT using a semi-empirical forward model

    NASA Astrophysics Data System (ADS)

    Mechlem, Korbinian; Ehn, Sebastian; Sellerer, Thorsten; Pfeiffer, Franz; Noël, Peter B.

    2017-03-01

    In spectral computed tomography (spectral CT), the additional information about the energy dependence of attenuation coefficients can be exploited to generate material selective images. These images have found applications in various areas such as artifact reduction, quantitative imaging or clinical diagnosis. However, significant noise amplification on material decomposed images remains a fundamental problem of spectral CT. Most spectral CT algorithms separate the process of material decomposition and image reconstruction. Separating these steps is suboptimal because the full statistical information contained in the spectral tomographic measurements cannot be exploited. Statistical iterative reconstruction (SIR) techniques provide an alternative, mathematically elegant approach to obtaining material selective images with improved tradeoffs between noise and resolution. Furthermore, image reconstruction and material decomposition can be performed jointly. This is accomplished by a forward model which directly connects the (expected) spectral projection measurements and the material selective images. To obtain this forward model, detailed knowledge of the different photon energy spectra and the detector response was assumed in previous work. However, accurately determining the spectrum is often difficult in practice. In this work, a new algorithm for statistical iterative material decomposition is presented. It uses a semi-empirical forward model which relies on simple calibration measurements. Furthermore, an efficient optimization algorithm based on separable surrogate functions is employed. This partially negates one of the major shortcomings of SIR, namely high computational cost and long reconstruction times. Numerical simulations and real experiments show strongly improved image quality and reduced statistical bias compared to projection-based material decomposition.

  6. Technical Note: Approximate Bayesian parameterization of a complex tropical forest model

    NASA Astrophysics Data System (ADS)

    Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.

    2013-08-01

    Inverse parameter estimation of process-based models is a long-standing problem in ecology and evolution. A key problem of inverse parameter estimation is to define a metric that quantifies how well model predictions fit the data. Such a metric can be expressed by general cost or objective functions, but statistical inversion approaches are based on a particular metric, the probability of observing the data given the model, known as the likelihood. Deriving likelihoods for dynamic models requires making assumptions about the probability of observations deviating from mean model predictions. For technical reasons, these assumptions are usually derived without explicit consideration of the processes in the simulation. Only in recent years have new methods become available that allow generating likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional MCMC, performs well in retrieving known parameter values from virtual field data generated by the forest model. We analyze the results of the parameter estimation, examine the sensitivity towards the choice and aggregation of model outputs and observed data (summary statistics), and show results from using this method to fit the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss differences between this approach and Approximate Bayesian Computation (ABC), another commonly used method to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can successfully be applied to process-based models of high complexity. The methodology is particularly suited to heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models in ecology and evolution.
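
    A toy sketch of a parametric (synthetic) likelihood approximation inside a Metropolis sampler, with a trivial stochastic simulator standing in for FORMIND (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)

def simulate(theta, n_rep=100):
    # Trivial stochastic simulator: returns replicate summary statistics
    return rng.poisson(theta, size=(n_rep, 50)).mean(axis=1)

obs_summary = 4.2               # observed summary statistic (assumed)

def log_like(theta):
    # Fit a Gaussian to simulated summaries -> parametric likelihood
    sims = simulate(theta)
    mu, sd = sims.mean(), sims.std(ddof=1)
    return -0.5 * ((obs_summary - mu) / sd) ** 2 - np.log(sd)

theta, ll = 1.0, log_like(1.0)
chain = []
for _ in range(2000):
    prop = abs(theta + rng.normal(0, 0.2))    # positivity via reflection
    ll_prop = log_like(prop)
    if np.log(rng.uniform()) < ll_prop - ll:  # flat-prior Metropolis step
        theta, ll = prop, ll_prop
    chain.append(theta)
print("posterior mean ~", round(float(np.mean(chain[500:])), 2))
```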

  7. The statistical overlap theory of chromatography using power law (fractal) statistics.

    PubMed

    Schure, Mark R; Davis, Joe M

    2011-12-30

    The chromatographic dimensionality was recently proposed as a measure of retention time spacing based on a power law (fractal) distribution. Using this model, a statistical overlap theory (SOT) for chromatographic peaks is developed that estimates the number of peak maxima as a function of the chromatographic dimension, saturation and scale. Power law models exhibit a threshold region whereby below a critical saturation value no loss of peak maxima due to peak fusion occurs as saturation increases. At moderate saturation, behavior is similar to the random (Poisson) peak model. At still higher saturation, the power law model shows loss of peaks nearly independent of the scale and dimension of the model. The physicochemical meaning of the power law scale parameter is discussed and shown to be equal to the Boltzmann-weighted free energy of transfer over the scale limits. A small scale range (small β) is shown to generate more uniform chromatograms. Large scale range chromatograms (large β) are shown to give occasional large excursions of retention times; this is a property of power laws, where "wild" behavior occasionally occurs. Both cases are shown to be useful depending on the chromatographic saturation. A scale-invariant model of the SOT shows very simple relationships between the fraction of peak maxima and the saturation, peak width and number of theoretical plates. These equations provide much insight into separations that follow power law statistics.

  8. Personalizing oncology treatments by predicting drug efficacy, side-effects, and improved therapy: mathematics, statistics, and their integration.

    PubMed

    Agur, Zvia; Elishmereni, Moran; Kheifetz, Yuri

    2014-01-01

    Despite its great promise, personalized oncology still faces many hurdles, and it is increasingly clear that targeted drugs and molecular biomarkers alone yield only modest clinical benefit. One reason is the complex relationship between biomarkers and the patient's response to drugs, which obscures the true weight of the biomarkers in the patient's overall response. This complexity can be disentangled by computational models that integrate the effects of personal biomarkers into a simulator of drug-patient dynamic interactions for predicting clinical outcomes. Several computational tools have been developed for personalized oncology, notably evidence-based tools for simulating pharmacokinetics, Bayesian-estimated tools for predicting survival, etc. We describe representative statistical and mathematical tools, and discuss their merits, shortcomings and preliminary clinical validation attesting to their potential. Yet the individualization power of mathematical models alone, or statistical models alone, is limited. More accurate and versatile personalization tools can be constructed by a new application of the statistical/mathematical nonlinear mixed-effects modeling (NLMEM) approach, which until recently has been used only in drug development. Using these advanced tools, clinical data from patient populations can be integrated with mechanistic models of disease and physiology to generate personal mathematical models. Upon more substantial validation in the clinic, this approach will hopefully be applied in personalized clinical trials, P-trials, hence aiding the establishment of personalized medicine within the mainstream of clinical oncology.

  9. Limited-information goodness-of-fit testing of diagnostic classification item response models.

    PubMed

    Hansen, Mark; Cai, Li; Monroe, Scott; Li, Zhen

    2016-11-01

    Despite the growing popularity of diagnostic classification models (e.g., Rupp et al., 2010, Diagnostic measurement: theory, methods, and applications, Guilford Press, New York, NY) in educational and psychological measurement, methods for testing their absolute goodness of fit to real data remain relatively underdeveloped. For tests of reasonable length and realistic sample size, full-information test statistics such as Pearson's X² and the likelihood ratio statistic G² suffer from sparseness in the underlying contingency table from which they are computed. Recently, limited-information fit statistics such as Maydeu-Olivares and Joe's (2006, Psychometrika, 71, 713) M₂ have been found to be quite useful in testing the overall goodness of fit of item response theory models. In this study, we applied Maydeu-Olivares and Joe's M₂ statistic to diagnostic classification models. Through a series of simulation studies, we found that M₂ is well calibrated across a wide range of diagnostic model structures and was sensitive to certain misspecifications of the item model (e.g., fitting disjunctive models to data generated according to a conjunctive model), errors in the Q-matrix (adding or omitting paths, omitting a latent variable), and violations of local item independence due to unmodelled testlet effects. On the other hand, M₂ was largely insensitive to misspecifications in the distribution of higher-order latent dimensions and to the specification of an extraneous attribute. To complement the analyses of overall model goodness of fit using M₂, we investigated the utility of the Chen and Thissen (1997, J. Educ. Behav. Stat., 22, 265) local dependence statistic X²_LD for characterizing sources of misfit, an important aspect of model appraisal often overlooked in favour of overall statements. The X²_LD statistic was found to be slightly conservative (with Type I error rates consistently below the nominal level) but still useful in pinpointing the sources of misfit. Patterns of local dependence arising due to specific model misspecifications are illustrated. Finally, we used the M₂ and X²_LD statistics to evaluate a diagnostic model fit to data from the Trends in International Mathematics and Science Study, drawing upon analyses previously conducted by Lee et al. (2011, IJT, 11, 144).

  10. Bayes Forest: a data-intensive generator of morphological tree clones

    PubMed Central

    Järvenpää, Marko; Åkerblom, Markku; Raumonen, Pasi; Kaasalainen, Mikko

    2017-01-01

    Detailed and realistic tree form generators have numerous applications in ecology and forestry. For example, the varying morphology of trees contributes differently to the formation of landscapes, natural habitats of species, and eco-physiological characteristics of the biosphere. Here, we present an algorithm for generating morphological tree "clones" based on detailed reconstruction of laser scanning data, a statistical measure of similarity, and a plant growth model with simple stochastic rules. The algorithm is designed to produce tree forms, i.e., morphological clones, similar (but not identical) with respect to tree-level structure while varying in fine-scale structural detail. Although we opted for certain choices in our algorithm, individual parts may vary depending on the application, making it a general adaptable pipeline. Namely, we showed that a specific multipurpose procedural stochastic growth model can be algorithmically adjusted to produce the morphological clones replicated from a target experimentally measured tree. For this, we developed a statistical measure of similarity (structural distance) between any given pair of trees, which allows for comprehensive comparison of tree morphologies by means of empirical distributions describing the geometrical and topological features of a tree. Finally, we developed a programmable interface to manipulate data required by the algorithm. Our algorithm can be used in a variety of applications for exploration of the morphological potential of growth models (both theoretical and experimental), arising in all sectors of plant science research. PMID:29020742

  11. Zero-state Markov switching count-data models: an empirical assessment.

    PubMed

    Malyshkina, Nataliya V; Mannering, Fred L

    2010-01-01

    In this study, a two-state Markov switching count-data model is proposed as an alternative to zero-inflated models to account for the preponderance of zeros sometimes observed in transportation count data, such as the number of accidents occurring on a roadway segment over some period of time. For this accident-frequency case, zero-inflated models assume the existence of two states: one is a zero-accident count state, which has accident probabilities so low that they cannot be statistically distinguished from zero, and the other is a normal-count state, in which counts can be non-negative integers generated by some counting process, for example, a Poisson or negative binomial. While zero-inflated models have come under some criticism with regard to accident-frequency applications, one fact is undeniable: in many applications they provide a statistically superior fit to the data. The Markov switching approach we propose seeks to overcome some of the criticism associated with the zero-accident state of the zero-inflated model by allowing individual roadway segments to switch between zero and normal-count states over time. An important advantage of this Markov switching approach is that it allows for the direct statistical estimation of the specific roadway-segment state (i.e., zero-accident or normal-count state), whereas traditional zero-inflated models do not. To demonstrate the applicability of this approach, a two-state Markov switching negative binomial model (estimated with Bayesian inference) and standard zero-inflated negative binomial models are estimated using five-year accident frequencies on Indiana interstate highway segments. It is shown that the Markov switching model is a viable alternative and results in a superior statistical fit relative to the zero-inflated models.
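
    A minimal sketch of the data-generating process behind the proposed model, with illustrative transition probabilities and negative binomial parameters (not the estimated Indiana values):

```python
import numpy as np

rng = np.random.default_rng(4)

n_periods = 500
p_stay_zero, p_stay_normal = 0.9, 0.8   # Markov transition probabilities
r, p = 2, 0.5                           # negative binomial parameters

state = 0                               # start in the zero-accident state
counts = np.empty(n_periods, dtype=int)
for t in range(n_periods):
    if state == 0:
        counts[t] = 0
        state = 0 if rng.uniform() < p_stay_zero else 1
    else:
        counts[t] = rng.negative_binomial(r, p)
        state = 1 if rng.uniform() < p_stay_normal else 0

print("fraction of zero counts:", (counts == 0).mean().round(2))
```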

  12. Multi-Topic Tracking Model for dynamic social network

    NASA Astrophysics Data System (ADS)

    Li, Yuhua; Liu, Changzheng; Zhao, Ming; Li, Ruixuan; Xiao, Hailing; Wang, Kai; Zhang, Jun

    2016-07-01

    The topic tracking problem has attracted much attention in recent decades. However, existing approaches rarely consider network structures and textual topics together. In this paper, we propose a novel statistical model based on a dynamic Bayesian network, namely the Multi-Topic Tracking Model for Dynamic Social Network (MTTD). It takes the influence phenomenon, the selection phenomenon, the document generative process and the evolution of textual topics into account. Specifically, in our MTTD model, a Gibbs Random Field is defined to model the influence of the historical status of users in the network and the interdependency between them, in order to account for the influence phenomenon. To address the selection phenomenon, a stochastic block model is used to model the link generation process based on the users' interest in topics. Probabilistic Latent Semantic Analysis (PLSA) is used to describe the document generative process according to the users' interests. Finally, the dependence on the historical topic status is also considered to ensure the continuity of the topic itself in the topic evolution model. An Expectation Maximization (EM) algorithm is utilized to estimate the parameters of the proposed MTTD model. Empirical experiments on real datasets show that the MTTD model performs better than Popular Event Tracking (PET) and Dynamic Topic Model (DTM) in generalization performance, topic interpretability, topic content evolution and topic popularity evolution.

  13. Inference for multivariate regression model based on multiply imputed synthetic data generated via posterior predictive sampling

    NASA Astrophysics Data System (ADS)

    Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.

    2017-06-01

    The recent popularity of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods for generating and analyzing such data, but these almost always rely on asymptotic distributions and are consequently not adequate for small-sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based on exact distributions, this procedure may even be used for small-sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiter's combination rule to multiply imputed synthetic datasets, and an application to the 2000 Current Population Survey is discussed.

  14. Intermittency and dynamical Lee-Yang zeros of open quantum systems.

    PubMed

    Hickey, James M; Flindt, Christian; Garrahan, Juan P

    2014-12-01

    We use high-order cumulants to investigate the Lee-Yang zeros of generating functions of dynamical observables in open quantum systems. At long times the generating functions take on a large-deviation form with singularities of the associated cumulant generating functions-or dynamical free energies-signifying phase transitions in the ensemble of dynamical trajectories. We consider a driven three-level system as well as the dissipative Ising model. Both systems exhibit dynamical intermittency in the statistics of quantum jumps. From the short-time behavior of the dynamical Lee-Yang zeros, we identify critical values of the counting field which we attribute to the observed intermittency and dynamical phase coexistence. Furthermore, for the dissipative Ising model we construct a trajectory phase diagram and estimate the value of the transverse field where the stationary state changes from being ferromagnetic (inactive) to paramagnetic (active).

  15. The effect of noise-induced variance on parameter recovery from reaction times.

    PubMed

    Vadillo, Miguel A; Garaizar, Pablo

    2016-03-31

    Technical noise can compromise the precision and accuracy of the reaction times collected in psychological experiments, especially in the case of Internet-based studies. Although this noise seems to have only a small impact on traditional statistical analyses, its effects on model fits to reaction-time distributions remain unexplored. Across four simulations we study the impact of technical noise on parameter recovery from data generated from an ex-Gaussian distribution and from a Ratcliff Diffusion Model. Our results suggest that the impact of noise-induced variance tends to be limited to specific parameters and conditions. Although we encourage researchers to adopt all measures to reduce the impact of noise on reaction-time experiments, we conclude that the typical amount of noise-induced variance found in these experiments does not pose substantial problems for statistical analyses based on model fitting.
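
    A minimal sketch of one simulation cell, assuming uniformly distributed technical noise added to ex-Gaussian reaction times (noise magnitude and parameters are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

mu, sigma, tau = 400.0, 50.0, 100.0   # ms, ex-Gaussian parameters
n = 2000
rts = rng.normal(mu, sigma, n) + rng.exponential(tau, n)
noisy = rts + rng.uniform(0.0, 30.0, n)   # added technical timing noise

# scipy parameterizes the ex-Gaussian (exponnorm) by K = tau / sigma
for label, data in [("clean", rts), ("noisy", noisy)]:
    K, loc, scale = stats.exponnorm.fit(data)
    print(f"{label}: mu~{loc:.0f}  sigma~{scale:.0f}  tau~{K * scale:.0f}")
```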

  16. Black swans and dragon kings: A unified model

    NASA Astrophysics Data System (ADS)

    Eliazar, Iddo

    2017-09-01

    The term “black swan” is a metaphor for outlier events whose statistics are characterized by Pareto's Law and by Zipf's Law; namely, statistics governed by power-law tails. The term “dragon king” is a metaphor for a singular outlier event which, in comparison with all other outlier events, is in a league of its own. As an illustrative example consider the wealth of a family that is sampled at random from a medieval society: the nobility constitutes the black-swan category, and the royal family constitutes the dragon-king category. In this paper we present and analyze a dynamical model that generates, universally and jointly, black swans and dragon kings. According to this model, growing from the microscopic scale to the macroscopic scale, black swans and dragon kings emerge together and invariantly with respect to initial conditions.

  17. Nonlinear estimation of parameters in biphasic Arrhenius plots.

    PubMed

    Puterman, M L; Hrboticky, N; Innis, S M

    1988-05-01

    This paper presents a formal procedure for the statistical analysis of data on the thermotropic behavior of membrane-bound enzymes generated using the Arrhenius equation and compares the analysis to several alternatives. Data are modeled by a bent hyperbola. Nonlinear regression is used to obtain estimates and standard errors of the intersection of the line segments, defined as the transition temperature, and of the slopes, defined as the energies of activation of the enzyme reaction. The methodology allows formal tests of the adequacy of a biphasic model against either a single straight line or a curvilinear model. Examples using data on the thermotropic behavior of pig brain synaptosomal acetylcholinesterase are given. The data support the biphasic temperature dependence of this enzyme. The methodology represents a formal procedure for statistical validation of any biphasic data and allows for calculation of all line parameters with estimates of precision.
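
    A minimal sketch of the bent-hyperbola fit on synthetic Arrhenius data (the functional form is the standard smoothly bent hyperbola; all data and starting values are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def bent_hyperbola(x, a, b1, b2, x0, gamma):
    # Smoothly joins two lines; as gamma -> 0 the limbs have slopes
    # b1 - b2 (left of x0) and b1 + b2 (right of x0)
    return a + b1 * (x - x0) + b2 * np.sqrt((x - x0) ** 2 + gamma)

rng = np.random.default_rng(8)
x = np.linspace(3.2e-3, 3.6e-3, 40)       # 1/T in 1/K
y = bent_hyperbola(x, -2.0, -7000.0, -3000.0, 3.4e-3, 1e-10)
y = y + rng.normal(0, 0.02, x.size)       # ln(activity) with noise

p0 = [-2.0, -6000.0, -2000.0, 3.4e-3, 1e-9]   # starting values
popt, pcov = curve_fit(bent_hyperbola, x, y, p0=p0)
perr = np.sqrt(np.diag(pcov))             # standard errors

print("transition at 1/T =", popt[3], "+/-", perr[3])
print("limb slopes (prop. to -Ea/R):", popt[1] - popt[2], popt[1] + popt[2])
```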

  18. Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory.

    PubMed

    Lord, Dominique; Washington, Simon P; Ivan, John N

    2005-01-01

    There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions that come with each, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states: perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of "excess" zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows Bernoulli trials with unequal probabilities of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to the "excess" zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed, and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales and not an underlying dual-state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
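
    A minimal sketch of the simulation argument, assuming heterogeneous small per-trial crash probabilities and low exposure (all values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)

n_segments, n_trials = 1000, 50     # low exposure per segment (assumed)
# Heterogeneous, small per-trial crash probabilities (Poisson trials)
p = np.minimum(rng.gamma(shape=0.1, scale=0.04, size=n_segments), 1.0)
counts = rng.binomial(n_trials, p)  # crash counts per segment

lam = counts.mean()
print("observed zero fraction  :", (counts == 0).mean().round(3))
print("Poisson(lambda) predicts:", stats.poisson.pmf(0, lam).round(3))
# The gap appears without any true dual-state process.
```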

  19. Rage against the Machine: Evaluation Metrics in the 21st Century

    ERIC Educational Resources Information Center

    Yang, Charles

    2017-01-01

    I review the classic literature in generative grammar and Marr's three-level program for cognitive science to defend the Evaluation Metric as a psychological theory of language learning. Focusing on well-established facts of language variation, change, and use, I argue that optimal statistical principles embodied in Bayesian inference models are…

  20. Creating Matched Samples Using Exact Matching. Statistical Report 2016-3

    ERIC Educational Resources Information Center

    Godfrey, Kelly E.

    2016-01-01

    By creating and analyzing matched samples, researchers can simplify their analyses to include fewer covariate variables, relying less on model assumptions, and thus generating results that may be easier to report and interpret. When two groups essentially "look" the same, it is easier to explore their differences and make comparisons…

  1. Validation of non-stationary precipitation series for site-specific impact assessment: Comparison of two statistical downscaling techniques

    USDA-ARS?s Scientific Manuscript database

    The generation of realistic future precipitation scenarios is crucial for assessing their impacts on a range of environmental and socio-economic impact sectors. A scale mismatch exists, however, between the coarse spatial resolution at which global climate models (GCMs) output future climate scenari...

  2. Evaluate Hydrologic Response on Spatiotemporal Characteristics of Rainfall Using High Resolution Radar Rainfall Data and WRF-Hydro Model

    NASA Astrophysics Data System (ADS)

    Gao, S.; Fang, N. Z.

    2017-12-01

    A previously developed Dynamic Moving Storm (DMS) generator is a multivariate rainfall model simulating the complex nature of the precipitation field: spatial variability, temporal variability, and storm movement. Previous effort by the authors investigated the sensitivity of hydrologic responses to DMS parameters using synthetic storms. In this study, the DMS generator has been upgraded to generate more realistic precipitation fields. The dependence of hydrologic responses on rainfall features was investigated by dissecting the precipitation field into rain cells and modifying their spatiotemporal specifications individually. To retrieve DMS parameters from radar rainfall data, rain cell segmentation and tracking algorithms were developed and applied to high-resolution radar rainfall data (1) to spatially determine the rain cells within individual radar images and (2) to temporally analyze their dynamic behavior. Statistics of DMS parameters were established by processing a long record of rainfall data (10 years) to keep the modification of real storms within the limits of regional climatology. Empirical distributions of the DMS parameters were calculated to reveal any preferential patterns and seasonality. Subsequently, the WRF-Hydro model, forced by the remodeled and modified precipitation, was used for hydrologic simulation. The study area was the Upper Trinity River Basin (UTRB) watershed, Texas, and two kinds of high-resolution radar data, the Next-Generation Radar (NEXRAD) level III Digital Hybrid Reflectivity (DHR) product and the Multi-Radar Multi-Sensor (MRMS) precipitation rate product, were utilized to establish parameter statistics and to recreate/remodel historical events, respectively. The results demonstrated that rainfall duration is a significant linkage between DMS parameters and their hydrologic impacts: any combination of spatiotemporal characteristics that keeps rain cells longer over the catchment will produce higher peak discharge.

  3. Reuse, Recycle, Reweigh: Combating Influenza through Efficient Sequential Bayesian Computation for Massive Data.

    PubMed

    Tom, Jennifer A; Sinsheimer, Janet S; Suchard, Marc A

    Massive datasets in the gigabyte and terabyte range combined with the availability of increasingly sophisticated statistical tools yield analyses at the boundary of what is computationally feasible. Compromising in the face of this computational burden by partitioning the dataset into more tractable sizes results in stratified analyses, removed from the context that justified the initial data collection. In a Bayesian framework, these stratified analyses generate intermediate realizations, often compared using point estimates that fail to account for the variability within and correlation between the distributions these realizations approximate. However, although the initial concession to stratify generally precludes the more sensible analysis using a single joint hierarchical model, we can circumvent this outcome and capitalize on the intermediate realizations by extending the dynamic iterative reweighting MCMC algorithm. In doing so, we reuse the available realizations by reweighting them with importance weights, recycling them into a now tractable joint hierarchical model. We apply this technique to intermediate realizations generated from stratified analyses of 687 influenza A genomes spanning 13 years allowing us to revisit hypotheses regarding the evolutionary history of influenza within a hierarchical statistical framework.

  4. Reuse, Recycle, Reweigh: Combating Influenza through Efficient Sequential Bayesian Computation for Massive Data

    PubMed Central

    Tom, Jennifer A.; Sinsheimer, Janet S.; Suchard, Marc A.

    2015-01-01

    Massive datasets in the gigabyte and terabyte range combined with the availability of increasingly sophisticated statistical tools yield analyses at the boundary of what is computationally feasible. Compromising in the face of this computational burden by partitioning the dataset into more tractable sizes results in stratified analyses, removed from the context that justified the initial data collection. In a Bayesian framework, these stratified analyses generate intermediate realizations, often compared using point estimates that fail to account for the variability within and correlation between the distributions these realizations approximate. However, although the initial concession to stratify generally precludes the more sensible analysis using a single joint hierarchical model, we can circumvent this outcome and capitalize on the intermediate realizations by extending the dynamic iterative reweighting MCMC algorithm. In doing so, we reuse the available realizations by reweighting them with importance weights, recycling them into a now tractable joint hierarchical model. We apply this technique to intermediate realizations generated from stratified analyses of 687 influenza A genomes spanning 13 years allowing us to revisit hypotheses regarding the evolutionary history of influenza within a hierarchical statistical framework. PMID:26681992

  5. Coexistence of Cu, Fe, Pb, and Zn oxides and chlorides as a determinant of chlorinated aromatics generation in municipal solid waste incinerator fly ash.

    PubMed

    Fujimori, Takashi; Tanino, Yuta; Takaoka, Masaki

    2014-01-01

    We investigated chemical determinants of the generation of chlorinated aromatic compounds (aromatic-Cls), such as polychlorinated biphenyls (PCBs) and chlorobenzenes (CBzs), in fly ash from municipal solid waste incineration. The influences of the following on aromatic-Cls formation in model fly ash (MFA) were examined systematically, quantitatively, and statistically: (i) inorganic chlorides (KCl, NaCl, CaCl2), (ii) base materials (SiO2, Al2O3, CaCO3), (iii) metal oxides (CuO, Fe2O3, PbO, ZnO), (iv) metal chlorides (CuCl2, FeCl3, PbCl2, ZnCl2), and (v) "coexisting multi-models." On the basis of aromatic-Cls concentrations, the ∑CBzs/∑PCBs ratio, and the similarity between distribution patterns, the MFAs were categorized into six groups. The results and analysis indicated that the formation of aromatic-Cls depended strongly on the "coexistence condition", namely multi-models composed not only of metal chlorides but also of metal oxides. Precise replication of metal chloride-to-oxide ratios, such as the ratios of Cu-, Fe-, Pb-, and Zn-chlorides and oxides, may be an essential factor in changing the thermochemical formation patterns of aromatic-Cls. Although CuCl2 acted as a promoter of aromatic-Cls generation, statistical analyses implied that FeCl3 also largely influenced the generation of aromatic-Cls under mixture conditions. Various additional components of fly ash were also comprehensively analyzed.

  6. Automation of Ocean Product Metrics

    DTIC Science & Technology

    2008-09-30

    Presented in: Ocean Sciences 2008 Conf., 5 Mar 2008. Shriver, J., J. D. Dykes, and J. Fabre: Automation of Operational Ocean Product Metrics. Presented in: 2008 EGU General Assembly, 14 April 2008. ... processing (multiple data cuts per day) and multiple-nested models. Routines for generating automated evaluations of model forecast statistics will be developed, and pre-existing tools will be collected to create a generalized tool set, which will include user-interface tools for the metrics data.

  7. Local operators in kinetic wealth distribution

    NASA Astrophysics Data System (ADS)

    Andrecut, M.

    2016-05-01

    The statistical mechanics approach to wealth distribution is based on the conservative kinetic multi-agent model for money exchange, where the local interaction rule between the agents is analogous to the elastic particle scattering process. Here, we discuss the role of a class of conservative local operators, and we show that, depending on the values of their parameters, they can be used to generate all the relevant distributions. We also show numerically that in order to generate the power-law tail, a heterogeneous risk-aversion model is required. By changing the parameters of these operators, one can also fine-tune the resulting distributions in order to provide support for the emergence of a more egalitarian wealth distribution.
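
    The mechanism can be conveyed with a standard conservative kinetic exchange simulation with heterogeneous saving (risk-aversion) parameters; this is a generic formulation from the kinetic wealth-exchange literature, not necessarily the paper's exact local operators:

        # Conservative pairwise money exchange with heterogeneous saving propensities.
        import numpy as np

        rng = np.random.default_rng(2)
        n, steps = 1000, 200_000
        w = np.ones(n)                     # total wealth is conserved by construction
        lam = rng.random(n)                # heterogeneous saving / risk-aversion parameters

        for _ in range(steps):
            i, j = rng.integers(n, size=2)
            if i == j:
                continue
            eps = rng.random()
            pot = (1 - lam[i]) * w[i] + (1 - lam[j]) * w[j]   # wealth put at stake
            w[i] = lam[i] * w[i] + eps * pot
            w[j] = lam[j] * w[j] + (1 - eps) * pot            # conservative update

        print(w.mean(), np.sort(w)[-10:])  # a heavy upper tail emerges with heterogeneous lam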

  8. Quantum statistics of Raman scattering model with Stokes mode generation

    NASA Technical Reports Server (NTRS)

    Tanatar, Bilal; Shumovsky, Alexander S.

    1994-01-01

    The model describing three coupled quantum oscillators with decay of the Rayleigh mode into the Stokes and vibration (phonon) modes is examined. Due to the Manley-Rowe relations, the problem of exact eigenvalues and eigenstates is reduced to the calculation of new orthogonal polynomials defined by both difference and differential equations. The quantum statistical properties are examined for the case when, initially, the Stokes mode is in the vacuum state, the Rayleigh mode is in a number state, and the vibration mode is in a number state or a squeezed state. Collapses and revivals are obtained for different initial conditions, as well as the change in time from sub-Poissonian to super-Poissonian statistics and vice versa.

  9. Automatic Generation of Algorithms for the Statistical Analysis of Planetary Nebulae Images

    NASA Technical Reports Server (NTRS)

    Fischer, Bernd

    2004-01-01

    Analyzing data sets collected in experiments or by observations is a core scientific activity. Typically, experimental and observational data are fraught with uncertainty, and the analysis is based on a statistical model of the conjectured underlying processes. The large data volumes collected by modern instruments make computer support indispensable for this. Consequently, scientists spend significant amounts of their time on the development and refinement of data analysis programs. AutoBayes [GF+02, FS03] is a fully automatic synthesis system for generating statistical data analysis programs. Externally, it looks like a compiler: it takes an abstract problem specification and translates it into executable code. Its input is a concise description of a data analysis problem in the form of a statistical model as shown in Figure 1; its output is optimized and fully documented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Internally, however, it is quite different: AutoBayes derives a customized algorithm implementing the given model using a schema-based process, and then further refines and optimizes the algorithm into code. A schema is a parameterized code template with associated semantic constraints which define and restrict the template's applicability. The schema parameters are instantiated in a problem-specific way during synthesis as AutoBayes checks the constraints against the original model or, recursively, against emerging sub-problems. AutoBayes' schema library contains problem decomposition operators (which are justified by theorems in a formal logic in the domain of Bayesian networks) as well as machine learning algorithms (e.g., EM, k-means) and numeric optimization methods (e.g., Nelder-Mead simplex, conjugate gradient). AutoBayes augments this schema-based approach by symbolic computation to derive closed-form solutions whenever possible. This is a major advantage over other statistical data analysis systems which use numerical approximations even in cases where closed-form solutions exist. AutoBayes is implemented in Prolog and comprises approximately 75,000 lines of code. In this paper, we take one typical scientific data analysis problem, analyzing planetary nebulae images taken by the Hubble Space Telescope, and show how AutoBayes can be used to automate the implementation of the necessary analysis programs. We initially follow the analysis described by Knuth and Hajian [KH02] and use AutoBayes to derive code for the published models. We show the details of the code derivation process, including the symbolic computations and automatic integration of library procedures, and compare the results of the automatically generated and manually implemented code. We then go beyond the original analysis and use AutoBayes to derive code for a simple image segmentation procedure based on a mixture model which can be used to automate a manual preprocessing step. Finally, we combine the original approach with the simple segmentation, which yields a more detailed analysis. This also demonstrates that AutoBayes makes it easy to combine different aspects of data analysis.
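
    For flavor, here is a hand-written sketch of the kind of algorithm AutoBayes synthesizes from a model specification: EM with closed-form updates for a two-component 1-D Gaussian mixture, the model family underlying the segmentation step. This is an illustration written for this summary, not AutoBayes output:

        # EM for a two-component 1-D Gaussian mixture (illustrative sketch).
        import numpy as np
        from scipy.stats import norm

        rng = np.random.default_rng(3)
        x = np.concatenate([rng.normal(0, 1, 400), rng.normal(4, 0.7, 200)])

        pi, mu, sd = 0.5, np.array([0.0, 1.0]), np.array([1.0, 1.0])
        for _ in range(100):
            # E-step: responsibility of component 1 for each point.
            p0 = (1 - pi) * norm.pdf(x, mu[0], sd[0])
            p1 = pi * norm.pdf(x, mu[1], sd[1])
            r = p1 / (p0 + p1)
            # M-step: closed-form updates (the kind AutoBayes derives symbolically).
            pi = r.mean()
            mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
            sd = np.sqrt(np.array([np.average((x - mu[0]) ** 2, weights=1 - r),
                                   np.average((x - mu[1]) ** 2, weights=r)]))
        print(pi, mu, sd)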

  10. Characterizing and Addressing the Need for Statistical Adjustment of Global Climate Model Data

    NASA Astrophysics Data System (ADS)

    White, K. D.; Baker, B.; Mueller, C.; Villarini, G.; Foley, P.; Friedman, D.

    2017-12-01

    As part of its mission to research and measure the effects of the changing climate, the U.S. Army Corps of Engineers (USACE) regularly uses the World Climate Research Programme's Coupled Model Intercomparison Project Phase 5 (CMIP5) multi-model dataset. However, these data are generated at a global level and are not fine-tuned for specific watersheds, which often causes CMIP5 output to deviate from locally observed climate patterns. Several downscaling methods have been developed to increase the resolution of the CMIP5 data and decrease systemic differences, in order to support decision-makers as they evaluate results at the watershed scale. Preliminary comparisons of observed and projected flow frequency curves over the US revealed a simple framework by which water resources decision-makers can plan and design water resources management measures under changing conditions using standard tools. Using this framework as a basis, USACE has begun to explore the use of statistical adjustment to alter global climate model data to better match locally observed patterns while preserving the general structure and behavior of the model data. When paired with careful measurement and hypothesis testing, statistical adjustment can be particularly effective at navigating the compromise between the locally observed patterns and the global climate model structures for decision makers.
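
    The abstract does not name USACE's specific adjustment method; empirical quantile mapping is one common choice and illustrates the general idea of matching model output to a locally observed distribution. The data below are synthetic stand-ins:

        # Empirical quantile mapping sketch (one common statistical-adjustment method).
        import numpy as np

        rng = np.random.default_rng(4)
        obs = rng.gamma(2.0, 3.0, 5000)          # locally observed values
        mod = rng.gamma(2.0, 3.6, 5000) + 1.0    # biased model output (stand-in for CMIP5)

        def quantile_map(x, model_ref, obs_ref):
            """Map model values onto the observed distribution, quantile by quantile."""
            q = np.searchsorted(np.sort(model_ref), x) / len(model_ref)
            q = np.clip(q, 0.0, 1.0 - 1e-9)
            return np.quantile(obs_ref, q)

        adjusted = quantile_map(mod, mod, obs)
        print(mod.mean(), obs.mean(), adjusted.mean())   # adjusted mean ~ observed mean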

  11. A statistical simulation model for field testing of non-target organisms in environmental risk assessment of genetically modified plants.

    PubMed

    Goedhart, Paul W; van der Voet, Hilko; Baldacchino, Ferdinando; Arpaia, Salvatore

    2014-04-01

    Genetic modification of plants may result in unintended effects causing potentially adverse effects on the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effects on non-target organisms are compared. Statistical analysis of such trials comes in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example the power to detect environmental change of a given magnitude, before the start of an experiment. Such a prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions, possibly with excess zeros. In addition, the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype-by-environment interaction by adding random variety effects, and includes repeated measures in time following a constant, linear or quadratic pattern, possibly with some form of autocorrelation. The model also allows a set of reference varieties to be added to the GM plant and its comparator, to assess the natural variation which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail, and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided.

  12. A statistical simulation model for field testing of non-target organisms in environmental risk assessment of genetically modified plants

    PubMed Central

    Goedhart, Paul W; van der Voet, Hilko; Baldacchino, Ferdinando; Arpaia, Salvatore

    2014-01-01

    Genetic modification of plants may result in unintended effects causing potentially adverse effects on the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effects on non-target organisms are compared. Statistical analysis of such trials comes in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example the power to detect environmental change of a given magnitude, before the start of an experiment. Such a prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions, possibly with excess zeros. In addition, the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype-by-environment interaction by adding random variety effects, and includes repeated measures in time following a constant, linear or quadratic pattern, possibly with some form of autocorrelation. The model also allows a set of reference varieties to be added to the GM plant and its comparator, to assess the natural variation which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail, and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided. PMID:24834325
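
    A minimal sketch of the kind of excess-zero count data such a simulation model generates, here zero-inflated negative binomial counts in a randomized block layout with random block effects; all parameter values are illustrative, not the paper's defaults:

        # Zero-inflated negative binomial counts in a randomized block layout.
        import numpy as np

        rng = np.random.default_rng(5)
        n_blocks, n_plots = 4, 8
        mu, k, p_zero = 12.0, 1.5, 0.2      # mean, NB dispersion, excess-zero probability

        block_eff = rng.normal(0.0, 0.3, n_blocks)        # random block effects
        counts = np.empty((n_blocks, n_plots), dtype=int)
        for b in range(n_blocks):
            lam = mu * np.exp(block_eff[b])
            nb = rng.negative_binomial(k, k / (k + lam), size=n_plots)  # mean = lam
            zeros = rng.random(n_plots) < p_zero          # structural (excess) zeros
            counts[b] = np.where(zeros, 0, nb)
        print(counts)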

  13. An accurate behavioral model for single-photon avalanche diode statistical performance simulation

    NASA Astrophysics Data System (ADS)

    Xu, Yue; Zhao, Tingchen; Li, Ding

    2018-01-01

    An accurate behavioral model is presented to simulate important statistical performance of single-photon avalanche diodes (SPADs), such as dark count and after-pulsing noise. The derived simulation model takes into account all important generation mechanisms of the two kinds of noise. For the first time, thermal agitation, trap-assisted tunneling and band-to-band tunneling mechanisms are simultaneously incorporated in the simulation model to evaluate the dark count behavior of SPADs fabricated in deep sub-micron CMOS technology. Meanwhile, a complete carrier trapping and de-trapping process is considered in the after-pulsing model, and a simple analytical expression is derived to estimate the after-pulsing probability. In particular, the key model parameters of avalanche triggering probability and the electric field dependence of excess bias voltage are extracted from Geiger-mode TCAD simulation, so the behavioral simulation model does not include any empirical parameters. The developed SPAD model is implemented in the Verilog-A behavioral hardware description language and successfully operated on the commercial Cadence Spectre simulator, showing good universality and compatibility. The model simulation results are in good accordance with the test data, validating the high simulation accuracy.

  14. Suction forces generated by passive bile bag drainage on a model of post-subdural hematoma evacuation.

    PubMed

    Tenny, Steven O; Thorell, William E

    2018-05-05

    Passive drainage systems are commonly used after subdural hematoma evacuation, but there is a dearth of published data regarding the suction forces created. We set out to quantify the suction forces generated by a passive drainage system. We created a model of passive drainage after subdural hematoma evacuation and measured the maximum suction force generated by a bile bag drain, both with empty drain tubing and with fluid-filled drain tubing causing a siphoning effect. We took measurements at varying heights of the bile bag to determine whether bile bag height changed the suction forces generated. An empty bile bag with no fluid in the drainage tube connected to a rigid, fluid-filled model creates a minimal suction force of 0.9 mmHg (95% CI 0.64-1.16 mmHg). When fluid fills the drain tubing, a siphoning effect is created and can generate suction forces ranging from 18.7 to 30.6 mmHg, depending on the relative position of the bile bag and how full it is. The suction forces generated are statistically different depending on whether the bile bag is 50 cm below, level with, or 50 cm above the experimental model. Passive bile bag drainage does not generate significant suction on a fluid-filled rigid model if the drain tubing is empty. If fluid fills the drain tubing, siphoning occurs and can increase the suction force of a passive bile bag drainage system to levels comparable to a partially filled Jackson-Pratt bulb drain.

  15. Estimating methane emissions from landfills based on rainfall, ambient temperature, and waste composition: The CLEEN model.

    PubMed

    Karanjekar, Richa V; Bhatt, Arpita; Altouqui, Said; Jangikhatoonabad, Neda; Durai, Vennila; Sattler, Melanie L; Hossain, M D Sahadat; Chen, Victoria

    2015-12-01

    Accurately estimating landfill methane emissions is important for quantifying a landfill's greenhouse gas emissions and power generation potential. Current models, including LandGEM and IPCC, often greatly simplify the treatment of factors like rainfall and ambient temperature, which can substantially impact gas production. The newly developed Capturing Landfill Emissions for Energy Needs (CLEEN) model aims to improve landfill methane generation estimates while requiring only inputs that are fairly easy to obtain: waste composition, annual rainfall, and ambient temperature. To develop the model, methane generation was measured from 27 laboratory-scale landfill reactors with varying waste compositions (ranging from 0% to 100%), average rainfall rates of 2, 6, and 12 mm/day, and temperatures of 20, 30, and 37°C, according to a statistical experimental design. The refuse components considered were the major biodegradable wastes (food, paper, yard/wood, and textile) as well as inert inorganic waste. Based on the data collected, a multiple linear regression equation (R² = 0.75) was developed to predict first-order methane generation rate constant values k as functions of waste composition, annual rainfall, and temperature. Because laboratory methane generation rates exceed field rates, a second scale-up regression equation for k was developed using actual gas-recovery data from 11 landfills in high-income countries with conventional operation. The CLEEN model was developed by incorporating both regression equations into a first-order decay model for estimating methane generation rates from landfills. CLEEN model values were compared to actual field data from 6 US landfills, and to estimates from LandGEM and IPCC. For 4 of the 6 cases, CLEEN model estimates were the closest to the actual values. Copyright © 2015 Elsevier Ltd. All rights reserved.
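
    The CLEEN chain can be sketched as a regression-predicted first-order rate constant k fed into a first-order decay methane model. The regression coefficients and parameter values below are placeholders, not the published equation:

        # First-order decay methane model driven by a regression-predicted k (sketch).
        import numpy as np

        def k_from_regression(food, paper, yard, textile, rain_mm_day, temp_c):
            # Hypothetical linear form: k = b0 + b1*food + ... + b6*temp (1/yr).
            b = np.array([0.005, 0.08, 0.03, 0.02, 0.01, 0.002, 0.001])
            x = np.array([1.0, food, paper, yard, textile, rain_mm_day, temp_c])
            return float(b @ x)

        def methane_rate(mass_mg, L0, k, years):
            """First-order decay: Q(t) = k * L0 * M * exp(-k * t)."""
            return k * L0 * mass_mg * np.exp(-k * years)

        k = k_from_regression(0.3, 0.25, 0.1, 0.05, rain_mm_day=6, temp_c=30)
        t = np.arange(0, 50)
        q = methane_rate(1000.0, 100.0, k, t)   # e.g. m3 CH4/yr for 1000 Mg, L0 = 100 m3/Mg
        print(k, q[:5])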

  16. Assessing Continuous Operator Workload With a Hybrid Scaffolded Neuroergonomic Modeling Approach.

    PubMed

    Borghetti, Brett J; Giametta, Joseph J; Rusnock, Christina F

    2017-02-01

    We aimed to predict operator workload from neurological data using statistical learning methods to fit neurological-to-state-assessment models. Adaptive systems require real-time mental workload assessment to perform dynamic task allocations or operator augmentation as workload issues arise. Neuroergonomic measures have great potential for informing adaptive systems, and we combine these measures with models of task demand as well as information about critical events and performance to clarify the inherent ambiguity of interpretation. We use machine learning algorithms on electroencephalogram (EEG) input to infer operator workload based upon Improved Performance Research Integration Tool workload model estimates. Cross-participant models predict workload of other participants, statistically distinguishing between 62% of the workload changes. Machine learning models trained from Monte Carlo resampled workload profiles can be used in place of deterministic workload profiles for cross-participant modeling without incurring a significant decrease in machine learning model performance, suggesting that stochastic models can be used when limited training data are available. We employed a novel temporary scaffold of simulation-generated workload profile truth data during the model-fitting process. A continuous workload profile serves as the target to train our statistical machine learning models. Once trained, the workload profile scaffolding is removed and the trained model is used directly on neurophysiological data in future operator state assessments. These modeling techniques demonstrate how to use neuroergonomic methods to develop operator state assessments, which can be employed in adaptive systems.

  17. Statistical tools for transgene copy number estimation based on real-time PCR.

    PubMed

    Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

    2007-11-01

    Compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, real-time PCR based transgene copy number estimation tends to be ambiguous and subjective, stemming from a lack of the proper statistical analysis and data quality control needed to render a reliable copy number estimate with a prediction value. Despite recent progress in the statistical analysis of real-time PCR, few publications have integrated these advancements into real-time PCR based transgene copy number determination. Three experimental designs, together with four statistical models that integrate data quality control, are presented. In the first method, external calibration curves are established for the transgene based on serially diluted templates; the Ct numbers from a control transgenic event and a putative transgenic event are then compared to derive the transgene copy number or zygosity estimate. Simple linear regression and two-group t-test procedures were combined to model the data from this design. In the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of the transgene was compared with that of the internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with the reference gene without a standard curve, based directly on the fluorescence data. Two different multiple regression models were proposed to analyze the data, based on two different approaches to integrating amplification efficiency. Our results highlight the importance of proper statistical treatment and integrated quality control in real-time PCR-based transgene copy number determination. These statistical methods make real-time PCR-based transgene copy number estimation more reliable and precise, and proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four statistical methods are compared for their advantages and disadvantages. Moreover, they can also be applied to other real-time PCR-based quantification assays, including transfection efficiency analysis and pathogen quantification.
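
    A minimal sketch of the first design: fit a standard curve of Ct against log10(template amount) from serial dilutions, then compare the Ct of a putative event with that of a single-copy control. All data values below are invented for illustration:

        # Standard-curve copy number estimation from Ct values (illustrative data).
        import numpy as np
        from scipy import stats

        log_template = np.array([1, 2, 3, 4, 5], dtype=float)   # log10 of serial dilutions
        ct = np.array([33.1, 29.8, 26.4, 23.1, 19.7])           # measured Ct values

        fit = stats.linregress(log_template, ct)                # Ct = slope*log10(N) + b
        efficiency = 10 ** (-1.0 / fit.slope) - 1               # amplification efficiency

        ct_control, ct_event = 25.0, 24.0                       # single-copy control vs event
        copy_ratio = 10 ** ((ct_control - ct_event) / -fit.slope)
        print(f"slope={fit.slope:.2f}, eff={efficiency:.2f}, copies~{copy_ratio:.1f}")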

  18. Experiments in monthly mean simulation of the atmosphere with a coarse-mesh general circulation model

    NASA Technical Reports Server (NTRS)

    Lutz, R. J.; Spar, J.

    1978-01-01

    The Hansen atmospheric model was used to compute five monthly forecasts (October 1976 through February 1977). The forecasts are compared with observations on the basis of an energetics analysis, meridional and vertical profiles, error statistics, and prognostic and observed mean maps. The monthly mean model simulations suffer from several defects. There is, in general, no skill in the simulation of the monthly mean sea-level pressure field, and only marginal skill is indicated for the 850 mb temperatures and 500 mb heights. The coarse-mesh model appears to generate a less satisfactory monthly mean simulation than the finer-mesh GISS model.

  19. Statistical Comparison of Spike Responses to Natural Stimuli in Monkey Area V1 With Simulated Responses of a Detailed Laminar Network Model for a Patch of V1

    PubMed Central

    Schuch, Klaus; Logothetis, Nikos K.; Maass, Wolfgang

    2011-01-01

    A major goal of computational neuroscience is the creation of computer models for cortical areas whose response to sensory stimuli resembles that of cortical areas in vivo in important aspects. It is seldom considered whether the simulated spiking activity is realistic, in a statistical sense, in response to natural stimuli. Because certain statistical properties of spike responses have been suggested to facilitate computations in the cortex, acquiring a realistic firing regime in cortical network models might be a prerequisite for analyzing their computational functions. We present a characterization and comparison of the statistical response properties of the primary visual cortex (V1) in vivo and in silico in response to natural stimuli. We recorded from multiple electrodes in area V1 of 4 macaque monkeys and developed a large state-of-the-art network model for a 5 × 5-mm patch of V1, composed of 35,000 neurons and 3.9 million synapses, that integrates previously published anatomical and physiological details. By quantitative comparison of the model response to the "statistical fingerprint" of responses in vivo, we find that our model for a patch of V1 responds to the same movie in a way that matches the statistical structure of the recorded data surprisingly well. The deviation between the firing regime of the model and the in vivo data is on the same level as the deviations among monkeys and sessions. This suggests that, despite strong simplifications and abstractions of cortical network models, they are nevertheless capable of generating realistic spiking activity. To reach a realistic firing state, it was necessary not only to include both N-methyl-d-aspartate and GABAB synaptic conductances in our model, but also to markedly increase the strength of excitatory synapses onto inhibitory neurons (>2-fold) in comparison to literature values, hinting at the importance of carefully adjusting the effect of inhibition for achieving realistic dynamics in current network models. PMID:21106898

  20. Signal detection of adverse events with imperfect confirmation rates in vaccine safety studies using self-controlled case series design.

    PubMed

    Xu, Stanley; Newcomer, Sophia; Nelson, Jennifer; Qian, Lei; McClure, David; Pan, Yi; Zeng, Chan; Glanz, Jason

    2014-05-01

    The Vaccine Safety Datalink project captures electronic health record data, including vaccinations and medically attended adverse events, on 8.8 million enrollees annually from participating managed care organizations in the United States. While the automated vaccination data are generally of high quality, a presumptive adverse event based on diagnosis codes in automated health care data may not be true (misclassification). Consequently, analyses using automated health care data can generate false positive results, where an association between the vaccine and outcome is incorrectly identified, as well as false negative findings, where a true association or signal is missed. We developed novel conditional Poisson regression models and fixed effects models that accommodate misclassification of the adverse event outcome for the self-controlled case series design. We conducted simulation studies to evaluate their performance in signal detection in vaccine safety hypothesis-generating (screening) studies. We also reanalyzed four previously identified signals from a recent vaccine safety study using the newly proposed models. Our simulation studies demonstrated that (i) outcome misclassification resulted in both false positive and false negative signals in screening studies and (ii) the newly proposed models reduced the rates of both false positive and false negative signals. In reanalyses of the four previously identified signals using the novel statistical models, the incidence rate ratio estimates and statistical significances were similar to those obtained using conventional models and including only medical record review confirmed cases. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Monte Carlo Analysis of Reservoir Models Using Seismic Data and Geostatistical Models

    NASA Astrophysics Data System (ADS)

    Zunino, A.; Mosegaard, K.; Lange, K.; Melnikova, Y.; Hansen, T. M.

    2013-12-01

    We present a study of the analysis of petroleum reservoir models consistent with seismic data and geostatistical constraints, performed on a synthetic reservoir model. Our aim is to invert directly for the structure and rock bulk properties of the target reservoir zone. To infer rock facies, porosity and oil saturation, seismology alone is not sufficient; a rock physics model, which links the unknown properties to the elastic parameters, must be taken into account. We therefore combine a rock physics model with a simple convolutional approach for seismic waves to invert the "measured" seismograms. To solve this inverse problem, we employ a Markov chain Monte Carlo (MCMC) method, because it offers the possibility to handle non-linearity and complex, multi-step forward models, and provides realistic estimates of uncertainties. However, for large data sets the MCMC method may be impractical because of its very high computational demand. One strategy to face this challenge is to feed the algorithm with realistic models, hence relying on proper prior information. To this end, we utilize an algorithm drawn from geostatistics to generate geologically plausible models which represent samples of the prior distribution. The geostatistical algorithm learns the multiple-point statistics from prototype models (in the form of training images), then generates thousands of different models which are accepted or rejected by a Metropolis sampler. To further reduce the computation time, we parallelize the software and run it on multi-core machines. The solution of the inverse problem is then represented by a collection of reservoir models, in terms of facies, porosity and oil saturation, which constitute samples of the posterior distribution. We are finally able to produce probability maps of the properties of interest by performing statistical analysis on the collection of solutions.
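
    The accept/reject step can be sketched as an independence Metropolis sampler whose proposals come from the (geostatistical) prior, in which case the acceptance ratio reduces to a likelihood ratio. The proposal and forward functions below are placeholders standing in for the multiple-point-statistics generator and the convolutional seismic model:

        # Independence Metropolis sampler with prior-generated proposals (sketch).
        import numpy as np

        rng = np.random.default_rng(7)
        d_obs = rng.normal(size=50)                   # "measured" seismograms (stand-in)

        def propose_from_prior():
            return rng.normal(size=50)                # placeholder geostatistical draw

        def forward_seismic(m):
            return 0.8 * m                            # placeholder forward physics

        def log_likelihood(m, sigma=0.5):
            r = d_obs - forward_seismic(m)
            return -0.5 * np.sum((r / sigma) ** 2)

        current = propose_from_prior()
        samples = []
        for _ in range(2000):
            cand = propose_from_prior()               # proposal density = prior density
            if np.log(rng.random()) < log_likelihood(cand) - log_likelihood(current):
                current = cand                        # accept; else keep current model
            samples.append(current)
        print(len(samples))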

  2. An analytical and numerical study of Galton-Watson branching processes relevant to population dynamics

    NASA Astrophysics Data System (ADS)

    Jang, Sa-Han

    Galton-Watson branching processes of relevance to human population dynamics are the subject of this thesis. We begin with an historical survey of the invention of this model in the middle of the 19th century, for the purpose of modelling the extinction of unusual surnames in France and Britain. We then review the principal developments and refinements of this model, and their applications to a wide variety of problems in biology and physics. Next, we discuss in detail the case where the probability generating function for a Galton-Watson branching process is a geometric series, which can be summed in closed form to yield a fractional linear generating function that can be iterated indefinitely in closed form. We then describe the matrix method of Keyfitz and Tyree, and use it to determine how large a matrix must be chosen to accurately model a Galton-Watson branching process over a very large number of generations, of the order of hundreds or even thousands. Finally, we show that any attempt to explain the recent evidence for the existence, thousands of generations ago, of a 'mitochondrial Eve' and a 'Y-chromosomal Adam' in terms of the standard Galton-Watson branching process, or indeed any statistical model that assumes equality of the probabilities of passing one's genes to one's descendants in later generations, is unlikely to be successful. We explain that such models take no account of the advantages that the descendants of the most successful individuals in earlier generations enjoy over their contemporaries, which must play a key role in human evolution.
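
    A minimal Monte Carlo of the geometric-offspring case discussed above (the case whose generating function is fractional linear); the offspring parameter and trial counts are illustrative:

        # Galton-Watson process with geometric offspring numbers on {0, 1, 2, ...}.
        import numpy as np

        rng = np.random.default_rng(8)

        def extinction_fraction(p=0.45, generations=100, trials=5000):
            """Offspring ~ Geometric with success prob p; mean offspring (1 - p) / p."""
            extinct = 0
            for _ in range(trials):
                z = 1
                for _ in range(generations):
                    if z == 0:
                        break
                    # rng.geometric is supported on {1, 2, ...}; shift to {0, 1, ...}.
                    z = int(rng.geometric(p, size=z).sum()) - z
                    if z > 10_000:       # clearly survived (supercritical); stop early
                        break
                extinct += (z == 0)
            return extinct / trials

        # For p = 0.45, the pgf fixed point gives extinction prob p/(1-p) ~ 0.818.
        print(extinction_fraction())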

  3. Multiple Point Statistics algorithm based on direct sampling and multi-resolution images

    NASA Astrophysics Data System (ADS)

    Julien, S.; Renard, P.; Chugunova, T.

    2017-12-01

    Multiple Point Statistics (MPS) has been popular for more than a decade in the Earth Sciences, because these methods allow generating random fields that reproduce highly complex spatial features given in a conceptual model, the training image, while classical geostatistical techniques based on two-point statistics (covariance or variogram) fail to generate realistic models. Among MPS methods, direct sampling consists in borrowing patterns from the training image to populate a simulation grid. The grid is filled sequentially by visiting each of its nodes in a random order; the patterns, whose number of nodes is fixed, become narrower during the simulation process as the simulation grid becomes more densely informed. Hence, large-scale structures are caught at the beginning of the simulation and small-scale ones at the end. However, MPS may mix spatial characteristics distinguishable at different scales in the training image and thus lose the spatial arrangement of the different structures. To overcome this limitation, we propose to perform MPS simulation using a decomposition of the training image into a set of images at multiple resolutions. Applying a Gaussian kernel to the training image (convolution) results in a lower-resolution image; iterating this process builds a pyramid of images depicting fewer details at each level, as is done in image processing, for example, to lighten the storage of a photograph. Direct sampling is then employed to simulate the lowest resolution level, and then each level in turn, up to the finest resolution, conditioned on the level one rank coarser. This scheme helps reproduce the spatial structures at every scale of the training image and thus generate more realistic models. We illustrate the method with aerial photographs (satellite images) and natural textures. Indeed, these kinds of images often display typical structures at different scales and are well suited to MPS simulation techniques.
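
    A minimal sketch of the pyramid construction step: repeated Gaussian smoothing and decimation of the training image. The kernel width and number of levels are illustrative:

        # Gaussian pyramid of a training image: smooth, then halve the resolution.
        import numpy as np
        from scipy.ndimage import gaussian_filter

        def gaussian_pyramid(image, levels=4, sigma=1.0):
            pyramid = [image]
            for _ in range(levels - 1):
                smoothed = gaussian_filter(pyramid[-1], sigma)   # Gaussian convolution
                pyramid.append(smoothed[::2, ::2])               # decimate by 2 in each axis
            return pyramid   # simulate coarsest level first, then condition finer levels

        rng = np.random.default_rng(9)
        ti = rng.random((256, 256))          # stand-in for a training image
        for level in gaussian_pyramid(ti):
            print(level.shape)               # (256, 256), (128, 128), (64, 64), (32, 32)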

  4. Huffman and linear scanning methods with statistical language models.

    PubMed

    Roark, Brian; Fried-Oken, Melanie; Gibbons, Chris

    2015-03-01

    Current scanning access methods for text generation in AAC devices are limited to relatively few options, most notably row/column variations within a matrix. We present Huffman scanning, a new method for applying statistical language models to binary-switch, static-grid typing AAC interfaces, and compare it to other scanning options under a variety of conditions. We present results for 16 adults without disabilities and one 36-year-old man with locked-in syndrome who presents with complex communication needs and uses AAC scanning devices for writing. Huffman scanning with a statistical language model yielded significant typing speedups for the 16 participants without disabilities versus any of the other methods tested, including two row/column scanning methods. A similar pattern of results was found with the individual with locked-in syndrome. Interestingly, faster typing speeds were obtained with Huffman scanning using a more leisurely scan rate than relatively fast individually calibrated scan rates. Overall, the results reported here demonstrate great promise for the usability of Huffman scanning as a faster alternative to row/column scanning.
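
    The core of Huffman scanning is an ordinary Huffman code built over the language model's next-symbol probabilities, so likelier symbols receive shorter switch-press sequences. The toy probability distribution below is invented:

        # Huffman code construction over next-symbol probabilities from a language model.
        import heapq
        import itertools

        def huffman(probs):
            """Return {symbol: binary codeword} for a {symbol: probability} dict."""
            counter = itertools.count()      # tie-breaker so the heap never compares dicts
            heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
            heapq.heapify(heap)
            while len(heap) > 1:
                p1, _, c1 = heapq.heappop(heap)
                p2, _, c2 = heapq.heappop(heap)
                merged = {s: "0" + w for s, w in c1.items()}
                merged.update({s: "1" + w for s, w in c2.items()})
                heapq.heappush(heap, (p1 + p2, next(counter), merged))
            return heap[0][2]

        lm = {"e": 0.35, "t": 0.25, "a": 0.2, "o": 0.15, "_": 0.05}  # toy LM distribution
        print(huffman(lm))   # likelier letters get shorter scan (switch-press) sequences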

  5. From entropy-maximization to equality-maximization: Gauss, Laplace, Pareto, and Subbotin

    NASA Astrophysics Data System (ADS)

    Eliazar, Iddo

    2014-12-01

    The entropy-maximization paradigm of statistical physics is well known to generate the omnipresent Gauss law. In this paper we establish an analogous socioeconomic model which maximizes social equality, rather than physical disorder, in the context of the distributions of income and wealth in human societies. We show that, on a logarithmic scale, the Laplace law is the socioeconomic equality-maximizing counterpart of the physical entropy-maximizing Gauss law, and that this law manifests an optimized balance between two opposing forces: (i) the rich and powerful, striving to amass ever more wealth and thus increase social inequality; and (ii) the masses, struggling to form more egalitarian societies and thus increase social equality. Our results lead from log-Gauss statistics to log-Laplace statistics, yield Paretian power-law tails of income and wealth distributions, and show how the emergence of a middle class depends on the underlying levels of socioeconomic inequality and variability. Also, in the context of asset prices with Laplace-distributed returns, our results imply that financial markets generate an optimized balance between risk and predictability.

  6. A Statistical Weather-Driven Streamflow Model: Enabling future flow predictions in data-scarce headwater streams

    NASA Astrophysics Data System (ADS)

    Rosner, A.; Letcher, B. H.; Vogel, R. M.

    2014-12-01

    Predicting streamflow in headwaters and over a broad spatial scale poses unique challenges due to limited data availability. Flow gages for headwater streams are less common than for larger rivers, and gages with record lengths of ten years or more are even scarcer. Thus, there is a great need for estimating streamflows in ungaged or sparsely gaged headwaters. Further, there is often insufficient basin information to develop rainfall-runoff models that could be used to predict future flows under various climate scenarios. Headwaters in the northeastern U.S. are of particular concern to aquatic biologists, as these streams serve as essential habitat for native coldwater fish. In order to understand fish response to past or future environmental drivers, estimates of seasonal streamflow are needed. While flow data are limited, there is a wealth of data on historic weather conditions; observed data have been modeled to interpolate a spatially continuous historic weather dataset (Maurer et al. 2002). We present a statistical model developed by pairing streamflow observations with precipitation and temperature information for the same and preceding time steps, and demonstrate its use to predict flow metrics at the seasonal time step. While not a physical model, this statistical model represents the weather drivers. Since the model can predict flows not directly tied to reference gages, we can generate flow estimates for historic as well as potential future conditions.
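
    A minimal sketch of such a weather-driven model: seasonal flow regressed on same-season and previous-season precipitation and temperature. The data below are synthetic, and the predictor set is illustrative:

        # Lagged weather-driven regression for seasonal streamflow (synthetic data).
        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(10)
        n = 120                                    # seasonal time steps
        precip = rng.gamma(4.0, 25.0, n)
        temp = rng.normal(12.0, 6.0, n)
        flow = 0.6 * precip + 0.3 * np.roll(precip, 1) - 2.0 * temp + rng.normal(0, 10, n)

        # Design matrix: same-season precip, previous-season precip, same-season temp.
        X = np.column_stack([precip[1:], np.roll(precip, 1)[1:], temp[1:]])
        y = flow[1:]
        model = LinearRegression().fit(X, y)
        print(model.coef_, model.score(X, y))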

  7. Pupil Influence on the Visual Outcomes of a New-Generation Multifocal Toric Intraocular Lens With a Surface-Embedded Near Segment.

    PubMed

    Wang, Mengmeng; Corpuz, Christine Carole C; Huseynova, Tukezban; Tomita, Minoru

    2016-02-01

    To evaluate the influences of preoperative pupil parameters on the visual outcomes of a new-generation multifocal toric intraocular lens (IOL) model with a surface-embedded near segment. In this prospective study, patients with cataract had phacoemulsification and implantation of Lentis Mplus toric LU-313 30TY IOLs (Oculentis GmbH, Berlin, Germany). The visual and optical outcomes were measured and compared preoperatively and postoperatively. The correlations between preoperative pupil parameters (diameter and decentration) and 3-month postoperative visual outcomes were evaluated using the Spearman rank-order correlation coefficient (Rs) for the nonparametric data. A total of 27 eyes (16 patients) were enrolled in the study. Statistically significant improvements in visual and refractive performance were found after implantation of the Lentis Mplus toric LU-313 30TY IOL (P < .05). Statistically significant correlations were present between preoperative pupil diameters and postoperative visual acuities (Rs > 0; P < .05): patients with larger pupils consistently had better postoperative visual acuities. Meanwhile, there was no statistically significant correlation between pupil decentration and visual acuities (P > .05). The Lentis Mplus toric LU-313 30TY IOL provided excellent visual and optical performance during the 3-month follow-up. The preoperative pupil size is an important parameter when this toric multifocal IOL model is contemplated for surgery. Copyright 2016, SLACK Incorporated.

  8. Statistical and dynamical modeling of heavy-ion fusion-fission reactions

    NASA Astrophysics Data System (ADS)

    Eslamizadeh, H.; Razazzadeh, H.

    2018-02-01

    A modified statistical model and a four-dimensional dynamical model based on Langevin equations have been used to simulate the fission process of the excited compound nuclei 207At and 216Ra produced in the fusion reactions 19F + 188Os and 19F + 197Au. The evaporation residue cross section, the fission cross section, the pre-scission neutron, proton and alpha multiplicities, and the anisotropy of the fission fragment angular distribution have been calculated for the excited compound nuclei 207At and 216Ra. In the modified statistical model, the effects of the spin K about the symmetry axis and of temperature have been considered in the calculations of the fission widths and the potential energy surfaces. It was shown that the modified statistical model can reproduce the above-mentioned experimental data using appropriate values of the temperature coefficient of the effective potential, λ = 0.0180 ± 0.0055 and 0.0080 ± 0.0030 MeV-2, and of the scaling factor of the fission barrier height, rs = 1.0015 ± 0.0025 and 1.0040 ± 0.0020, for the compound nuclei 207At and 216Ra, respectively. Three collective shape coordinates plus the projection of the total spin of the compound nucleus on the symmetry axis, K, were considered in the four-dimensional dynamical model. In the dynamical calculations, dissipation was generated through the chaos-weighted wall and window friction formula. Comparison of the theoretical results with the experimental data showed that the two models make it possible to reproduce the above-mentioned experimental data satisfactorily for the excited compound nuclei 207At and 216Ra.

  9. ACIRF user's guide: Theory and examples

    NASA Astrophysics Data System (ADS)

    Dana, Roger A.

    1989-12-01

    Design and evaluation of radio frequency systems that must operate through ionospheric disturbances resulting from high-altitude nuclear detonations require an accurate channel model. This model must include the effects of high-gain antennas that may be used to receive the signals. Such a model can then be used to construct realizations of the received signal for use in digital simulations of trans-ionospheric links or in hardware channel simulators. The FORTRAN channel model ACIRF (Antenna Channel Impulse Response Function) generates random realizations of the impulse response function at the outputs of multiple antennas. This user's guide describes the FORTRAN program ACIRF (version 2.0), which generates realizations of channel impulse response functions at the outputs of multiple antennas with arbitrary beamwidths, pointing angles, and relative positions. This channel model is valid under strong scattering conditions, when Rayleigh fading statistics apply. Both frozen-in and turbulent models for the temporal fluctuations are included in this version of ACIRF. The theory of the channel model is described and several examples are given.

  10. Quarks, Symmetries and Strings - a Symposium in Honor of Bunji Sakita's 60th Birthday

    NASA Astrophysics Data System (ADS)

    Kaku, M.; Jevicki, A.; Kikkawa, K.

    1991-04-01

    The Table of Contents for the full book PDF is as follows: * Preface * Evening Banquet Speech * I. Quarks and Phenomenology * From the SU(6) Model to Uniqueness in the Standard Model * A Model for Higgs Mechanism in the Standard Model * Quark Mass Generation in QCD * Neutrino Masses in the Standard Model * Solar Neutrino Puzzle, Horizontal Symmetry of Electroweak Interactions and Fermion Mass Hierarchies * State of Chiral Symmetry Breaking at High Temperatures * Approximate |ΔI| = 1/2 Rule from a Perspective of Light-Cone Frame Physics * Positronium (and Some Other Systems) in a Strong Magnetic Field * Bosonic Technicolor and the Flavor Problem * II. Strings * Supersymmetry in String Theory * Collective Field Theory and Schwinger-Dyson Equations in Matrix Models * Non-Perturbative String Theory * The Structure of Non-Perturbative Quantum Gravity in One and Two Dimensions * Noncritical Virasoro Algebra of d < 1 Matrix Model and Quantized String Field * Chaos in Matrix Models? * On the Non-Commutative Symmetry of Quantum Gravity in Two Dimensions * Matrix Model Formulation of String Field Theory in One Dimension * Geometry of the N = 2 String Theory * Modular Invariance from Gauge Invariance in the Non-Polynomial String Field Theory * Stringy Symmetry and Off-Shell Ward Identities * q-Virasoro Algebra and q-Strings * Self-Tuning Fields and Resonant Correlations in 2d-Gravity * III. Field Theory Methods * Linear Momentum and Angular Momentum in Quaternionic Quantum Mechanics * Some Comments on Real Clifford Algebras * On the Quantum Group p-adics Connection * Gravitational Instantons Revisited * A Generalized BBGKY Hierarchy from the Classical Path-Integral * A Quantum Generated Symmetry: Group-Level Duality in Conformal and Topological Field Theory * Gauge Symmetries in Extended Objects * Hidden BRST Symmetry and Collective Coordinates * Towards Stochastically Quantizing Topological Actions * IV. Statistical Methods * A Brief Summary of the s-Channel Theory of Superconductivity * Neural Networks and Models for the Brain * Relativistic One-Body Equations for Planar Particles with Arbitrary Spin * Chiral Property of Quarks and Hadron Spectrum in Lattice QCD * Scalar Lattice QCD * Semi-Superconductivity of a Charged Anyon Gas * Two-Fermion Theory of Strongly Correlated Electrons and Charge-Spin Separation * Statistical Mechanics and Error-Correcting Codes * Quantum Statistics

  11. Use of LANDSAT 8 images for depth and water quality assessment of El Guájaro reservoir, Colombia

    NASA Astrophysics Data System (ADS)

    González-Márquez, Luis Carlos; Torres-Bejarano, Franklin M.; Torregroza-Espinosa, Ana Carolina; Hansen-Rodríguez, Ivette Renée; Rodríguez-Gallegos, Hugo B.

    2018-03-01

    The aim of this study was to evaluate the viability of using Landsat 8 spectral images to estimate water quality parameters and depth in El Guájaro Reservoir. In February and March 2015, two samplings were carried out in the reservoir, coinciding with the Landsat 8 image acquisitions. Turbidity, dissolved oxygen, electrical conductivity, pH and depth were evaluated. Through multiple regression analysis between the measured water quality parameters and the reflectance of the pixels corresponding to the sampling stations, statistical models with determination coefficients between 0.6249 and 0.9300 were generated. The results indicate that from a small number of measured parameters we can generate reliable models to estimate the spatial variation of turbidity, dissolved oxygen, pH and depth, as well as the temporal variation of electrical conductivity, so models generated from Landsat 8 can be used as a tool to facilitate the environmental, economic and social management of the reservoir.

  12. Hyperspectral Mapping of the Invasive Species Pepperweed and the Development of a Habitat Suitability Model

    NASA Technical Reports Server (NTRS)

    Nguyen, Andrew; Gole, Alexander; Randall, Jarom; Dlott, Glade; Zhang, Sylvia; Alfaro, Brian; Schmidt, Cindy; Skiles, J. W.

    2011-01-01

    Mapping and predicting the spatial distribution of invasive plant species is central to habitat management, but difficult to implement at landscape and regional scales. Remote sensing techniques can reduce the impact field campaigns have on these ecologically sensitive areas and can provide a regional and multi-temporal view of invasive species spread. Invasive perennial pepperweed (Lepidium latifolium) is now widespread in fragmented estuaries of the South San Francisco Bay and has been shown to degrade native vegetation in estuaries and adjacent habitats, thereby reducing forage and shelter for wildlife. The purpose of this study is to map the present distribution of pepperweed in estuarine areas of the South San Francisco Bay Salt Pond Restoration Project (Alviso, CA) and create a habitat suitability model to predict future spread. Pepperweed reflectance data were collected in situ with a GER 1500 spectroradiometer, along with 88 corresponding pepperweed presence and absence points used for building the statistical models. The spectral angle mapper (SAM) classification algorithm was used to distinguish the reflectance spectrum of pepperweed and map its distribution using an image from EO-1 Hyperion. To map pepperweed, we also performed a supervised classification on an ASTER image, with a resulting classification accuracy of 71.8%. We generated a weighted overlay analysis model within a geographic information system (GIS) framework to predict the areas in the study site most susceptible to pepperweed colonization. Variables for the model included propensity for disturbance, status of pond restoration, proximity to water channels, and terrain curvature. A Generalized Additive Model (GAM) was also used to generate a probability map and investigate the statistical probability that each variable contributed to predicting pepperweed spread. Results from the GAM revealed that distance to channels, distance to ponds and curvature were statistically significant (p < 0.01) in determining the locations of suitable pepperweed habitats.
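
    The SAM step reduces to computing the angle between each pixel spectrum and a reference (endmember) spectrum, with small angles indicating a likely match. The five-band spectra and the threshold below are invented for illustration:

        # Spectral angle mapper (SAM): angle between pixel and reference spectra.
        import numpy as np

        def spectral_angle(pixel, reference):
            cos = np.dot(pixel, reference) / (np.linalg.norm(pixel) * np.linalg.norm(reference))
            return np.arccos(np.clip(cos, -1.0, 1.0))      # angle in radians

        pepperweed_ref = np.array([0.05, 0.08, 0.12, 0.45, 0.50])  # invented 5-band spectrum
        pixel = np.array([0.06, 0.07, 0.13, 0.42, 0.48])
        threshold = 0.10                                           # illustrative cutoff (rad)
        angle = spectral_angle(pixel, pepperweed_ref)
        print(angle, angle < threshold)                            # classify as pepperweed?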

  13. Capturing spatial and temporal patterns of widespread, extreme flooding across Europe

    NASA Astrophysics Data System (ADS)

    Busby, Kathryn; Raven, Emma; Liu, Ye

    2013-04-01

    Statistical characterisation of physical hazards is an integral part of the probabilistic catastrophe models used by the reinsurance industry to estimate losses from large-scale events. Extreme flood events are not restricted by country boundaries, which poses an issue for reinsurance companies as their exposures often extend beyond them. We discuss challenges and solutions that allow us to appropriately capture the spatial and temporal dependence of extreme hydrological events on a continental scale, which in turn enables us to generate an industry-standard stochastic event set for estimating financial losses from widespread flooding. In presenting our event-set methodology, we focus on explaining how extreme value theory (EVT) and dependence modelling are used to account for short, inconsistent hydrological data from different countries, and how to make appropriate statistical decisions that best characterise the nature of flooding across Europe. The consistency of input data is of vital importance when identifying historical flood patterns. Collating data from numerous sources inherently causes inconsistencies, and we demonstrate our robust approach to assessing the data and refining it to compile a single consistent dataset. This dataset is then extrapolated using a parameterised EVT distribution to estimate extremes. Our method then captures the dependence of flood events across countries using an advanced multivariate extreme value model. Throughout, important statistical decisions are explored, including: (1) distribution choice; (2) the threshold to apply for extracting extreme data points; (3) a regional analysis; (4) the definition of a flood event, which is often linked with the reinsurance industry's hours clause; and (5) the handling of missing values. Finally, having modelled the historical patterns of flooding across Europe, we sample from this model to generate our stochastic event set, comprising thousands of events over thousands of years. We then briefly illustrate how this is applied within a probabilistic model to estimate the catastrophic loss curves used by the reinsurance industry.
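
    A minimal peaks-over-threshold sketch with a generalized Pareto fit, touching decisions (1) and (2) from the list above; the synthetic record, threshold choice, and return-level target are illustrative:

        # Peaks-over-threshold fit with the generalized Pareto distribution (GPD).
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(11)
        flows = rng.gamma(2.0, 50.0, 40 * 365)            # synthetic daily flow record

        threshold = np.quantile(flows, 0.99)              # decision (2): threshold choice
        exceedances = flows[flows > threshold] - threshold

        shape, loc, scale = stats.genpareto.fit(exceedances, floc=0.0)

        # ~100-year return level, given a 1% daily exceedance rate above the threshold.
        q = 1 - 1.0 / (100 * 365 * 0.01)
        ret_level_100 = threshold + stats.genpareto.ppf(q, shape, loc=0.0, scale=scale)
        print(shape, scale, ret_level_100)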

  14. Accessing and constructing driving data to develop fuel consumption forecast model

    NASA Astrophysics Data System (ADS)

    Yamashita, Rei-Jo; Yao, Hsiu-Hsen; Hung, Shih-Wei; Hackman, Acquah

    2018-02-01

    In this study, we develop forecasting models to estimate fuel consumption based on driving behavior, where the vehicles and routes are known. First, driving data are collected via telematics and OBD-II. Then, a driving fuel consumption formula is used to calculate estimated fuel consumption, and driving behavior indicators (DBIs) are generated for analysis. Based on statistical analysis methods, the driving fuel consumption forecasting model is constructed. Field experiments were conducted in this study to generate hundreds of driving behavior indicators. Following a data mining approach, Pearson correlation analysis is used to filter the DBIs most highly related to fuel consumption; only highly correlated DBIs are used in the model. These DBIs are divided into four classes: speed, acceleration, left/right/U-turn, and other. We then use K-means cluster analysis to group the driver class and the route class. Finally, more than 12 aggregate models (AMs) are generated from the highly correlated DBIs, using neural network models and regression analysis. The Mean Absolute Percentage Error (MAPE) is used to evaluate the developed AMs; the best MAPE value among these AMs is below 5%.

  15. Generation of Fine Scale Wind and Wave Climatologies

    NASA Astrophysics Data System (ADS)

    Vandenberghe, F. C.; Filipot, J.; Mouche, A.

    2013-12-01

    A tool to generate 'on demand' large databases of atmospheric parameters at high resolution has been developed for defense applications. The approach takes advantage of the zooming and relocation capabilities of the embedded domains found in regional models like the community Weather Research and Forecasting (WRF) model. The WRF model is applied to dynamically downscale NNRP, CFSR and ERA40 global analyses and to generate long records, up to 30 years, of hourly gridded data over 200 km² domains at 3 km grid increment. To ensure accuracy, observational data from the NCAR ADP historical database are used in combination with Four-Dimensional Data Assimilation (FDDA) techniques to constantly nudge the model analysis toward observations. The atmospheric model is coupled to secondary applications such as NOAA's Wave Watch III model and the Navy's APM electromagnetic propagation model, allowing the creation of high-resolution climatologies of surface winds, waves and electromagnetic propagation parameters. The system was applied at several coastal locations of the Mediterranean Sea where SAR wind and wave observations were available during the entire year of 2008. Statistical comparisons between the model output and SAR observations are presented. Issues related to the global input data and the model drift, as well as the impact of the wind biases on wave simulations, will be discussed.

  16. TEAMS Model Analyzer

    NASA Technical Reports Server (NTRS)

    Tijidjian, Raffi P.

    2010-01-01

    The TEAMS model analyzer is a supporting tool developed to work with models created with TEAMS (Testability, Engineering, and Maintenance System), which was developed by QSI. In an effort to reduce the time spent on the manual process that each TEAMS modeler must perform in preparing reports for model reviews, a new tool has been developed as an aid for models developed in TEAMS. The software allows for the viewing, reporting, and checking of TEAMS models that are checked into the TEAMS model database. The software allows the user to selectively view the model in a hierarchical tree outline that displays the components, failure modes, and ports. The reporting features allow the user to quickly gather statistics about the model and generate an input/output report covering all of the components. Rules can be automatically validated against the model, with a report generated containing the resulting inconsistencies. In addition to reducing manual effort, this software also provides an automated process framework for the Verification and Validation (V&V) effort that will follow the development of these models. Such an automated tool would have a significant impact on the V&V process.

  17. SPAGETTA: a Multi-Purpose Gridded Stochastic Weather Generator

    NASA Astrophysics Data System (ADS)

    Dubrovsky, M.; Huth, R.; Rotach, M. W.; Dabhi, H.

    2017-12-01

    SPAGETTA is a new multisite/gridded multivariate parametric stochastic weather generator (WG). Site-specific precipitation occurrence and amount are modelled by a Markov chain and a Gamma distribution, the non-precipitation variables are modelled by an autoregressive (AR) model conditioned on precipitation occurrence, and the spatial coherence of all variables is modelled following Wilks' (2009) approach. SPAGETTA may be run in two modes. Mode 1: it runs as a classical WG, which is calibrated using weather series from multiple sites and may then produce arbitrarily long synthetic series mimicking the spatial and temporal structure of the calibration data. To generate weather series representing the future climate, the WG parameters are modified according to a climate change scenario, typically derived from GCM or RCM simulations. Mode 2: the user provides only basic information (not necessarily realistic) on the temporal and spatial auto-correlation structure of the weather variables and their mean annual cycle; the generator itself derives the parameters of the underlying AR model, which produces the multi-site weather series. Optionally, the user may add a spatially varying trend, which is superimposed on the synthetic series. The contribution consists of the following parts: (a) Model of the WG. (b) Validation of the WG in terms of spatial temperature and precipitation characteristics, including characteristics of spatial hot/cold/dry/wet spells. (c) Results of a climate change impact experiment, in which the WG parameters representing spatial and temporal variability are modified using climate change scenarios and the effect on the above spatial validation indices is analysed. In this experiment, the WG is calibrated using the E-OBS gridded daily weather data for several European regions, and the climate change scenarios are derived from selected RCM simulations (CORDEX database). (d) The second mode of operation is demonstrated by results obtained while developing a methodology for assessing the collective significance of trends in multi-site weather series. The performance of the proposed test statistics is assessed using a large number of realisations of synthetic series produced by the WG, assuming a given statistical structure and trend of the weather series.
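
    A minimal single-site version of the generator's core components (a two-state Markov chain for precipitation occurrence, Gamma-distributed wet-day amounts, and an AR(1) process for a non-precipitation variable conditioned on occurrence) can be sketched as follows; all parameter values are illustrative assumptions, and the multisite spatial-coherence layer of Wilks' approach is omitted.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n_days = 365
    p01, p11 = 0.25, 0.65          # assumed wet-follows-dry / wet-follows-wet probabilities
    shape, scale = 0.8, 6.0        # assumed Gamma parameters for wet-day amounts (mm)
    rho = 0.7                      # assumed lag-1 autocorrelation of temperature anomaly

    wet = np.zeros(n_days, dtype=bool)
    precip = np.zeros(n_days)
    temp_anom = np.zeros(n_days)
    for t in range(1, n_days):
        # Markov chain: probability of a wet day depends on yesterday's state.
        wet[t] = rng.random() < (p11 if wet[t - 1] else p01)
        if wet[t]:
            precip[t] = rng.gamma(shape, scale)
        # AR(1) anomaly, with the mean shifted down on wet days (conditioning on occurrence).
        mean_shift = -1.0 if wet[t] else 0.0
        temp_anom[t] = mean_shift + rho * (temp_anom[t - 1] - mean_shift) \
            + rng.normal(0, np.sqrt(1 - rho**2))

    print(f"wet fraction: {wet.mean():.2f}, total precip: {precip.sum():.0f} mm")
    ```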

  18. Storm surge and tide interaction: a complete paradigm

    NASA Astrophysics Data System (ADS)

    Horsburgh, Kevin; Williams, Jane; Proctor, Robert

    2014-05-01

    Globally, 200 million people live on coastal floodplains, and about $1 trillion worth of assets lie within 1 metre of mean sea level. Any change in the statistics of flood frequency or severity would impact economic and social systems. It is therefore crucial to understand the physical drivers of extreme storm surges and to have confidence in the datasets used for extreme sea level statistics. Much previous research has focussed on the process of tide-surge interaction, and it is now widely accepted that the physical basis of tide-surge interaction is a phase shift of the tidal signal representing the effect of the surge on the tide. The second aspect of interaction is that shallow-water momentum considerations imply that differing tidal states should modulate surge generation: wind stress should have greater surge-generating potential on lower tides. This has been shown previously by analytical models but not as yet confirmed by fully non-linear models of the continental shelf. We present results from an operational model of the European shelf demonstrating that tidal range does have an effect on the surges generated. The cycle-integrated effects of wind stress (i.e. the skew surge) are generally greater when tidal range is low. Our results are at odds with the complete record of UK tide gauge data, in which no such correlation is observed. This suggests that whilst the modulating effect of the tide on the skew surge (the timing-independent difference between the peak prediction and the peak observation) is significant, the difference between individual storms is dominant. This implies that forecasting systems must predict the salient detail of the most intense storms. A further implication is that operational models need to simulate tides with acceptable accuracy at all coastal locations. We extend our model analysis to show that the same modulation of storm surges (by tidal conditions) applies to tropical cyclones. We conduct simulations using a mature operational storm surge model in the Bay of Bengal with tropical cyclones from the IBTrACS database; we demonstrate that - just as in the extra-tropical case - higher storm surges on the Bangladesh coastline are generated during smaller tides.
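
    Because the skew surge is central to the argument, a minimal illustration of how it differs from the instantaneous residual may help: it is the difference between the maximum observed water level and the maximum predicted tide within a tidal cycle, regardless of their timing. The series below are synthetic.

    ```python
    import numpy as np

    t = np.linspace(0, 12.42, 500)                 # one semidiurnal tidal cycle (hours)
    tide = 2.0 * np.cos(2 * np.pi * t / 12.42)     # predicted tide (m)
    # Synthetic storm surge peaking away from high water, plus the observed level.
    surge = 1.2 * np.exp(-((t - 4.0) / 2.0) ** 2)
    observed = tide + surge

    skew_surge = observed.max() - tide.max()       # timing-independent by construction
    residual_at_hw = (observed - tide)[np.argmax(tide)]
    print(f"skew surge: {skew_surge:.2f} m, "
          f"residual at predicted high water: {residual_at_hw:.2f} m")
    ```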

  19. A method for modeling co-occurrence propensity of clinical codes with application to ICD-10-PCS auto-coding.

    PubMed

    Subotin, Michael; Davis, Anthony R

    2016-09-01

    Natural language processing methods for medical auto-coding, or automatic generation of medical billing codes from electronic health records, generally assign each code independently of the others. They may thus assign codes for closely related procedures or diagnoses to the same document, even when those codes do not tend to occur together in practice, simply because the right choice can be difficult to infer from the clinical narrative. We propose a method that injects awareness of the propensities for code co-occurrence into this process. First, a model is trained to estimate the conditional probability that one code is assigned by a human coder, given that another code is known to have been assigned to the same document. Then, at runtime, an iterative algorithm applies this model to the output of an existing statistical auto-coder to modify the confidence scores of the codes. We tested this method in combination with a primary auto-coder for International Statistical Classification of Diseases-10 procedure codes, achieving a 12% relative improvement in F-score over the primary auto-coder baseline. The proposed method can be used, with appropriate features, in combination with any auto-coder that generates codes with different levels of confidence. The promising results obtained for International Statistical Classification of Diseases-10 procedure codes suggest that the proposed method may have wider applications in auto-coding. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
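
    The runtime rescoring step can be pictured as follows: starting from the primary auto-coder's per-code confidences, each code's score is repeatedly adjusted according to how much conditional support it receives from the other currently likely codes. The pairwise conditional probabilities, the multiplicative update rule, and the fixed iteration count below are illustrative assumptions; the paper's trained model and algorithm differ in detail.

    ```python
    import numpy as np

    codes = ["0DB68ZZ", "0DJ08ZZ", "5A1935Z"]        # hypothetical ICD-10-PCS codes
    base = np.array([0.80, 0.75, 0.30])              # primary auto-coder confidences
    # cooc[i, j]: assumed P(code i | code j); the real system learns this from data.
    cooc = np.array([[1.0, 0.1, 0.5],
                     [0.1, 1.0, 0.4],
                     [0.6, 0.5, 1.0]])

    scores = base.copy()
    for _ in range(10):                              # iterate until scores stabilise
        # Conditional support each code receives from the currently likely codes.
        support = (cooc * scores[None, :]).sum(axis=1) / scores.sum()
        scores = np.clip(base * support / support.mean(), 0.0, 1.0)

    print(dict(zip(codes, scores.round(3))))
    ```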

  20. [Design and implementation of online statistical analysis function in information system of air pollution and health impact monitoring].

    PubMed

    Lü, Yiran; Hao, Shuxin; Zhang, Guoqing; Liu, Jie; Liu, Yue; Xu, Dongqun

    2018-01-01

    To implement an online statistical analysis function in the information system for air pollution and health impact monitoring, so that data analysis results are available in real time. Descriptive statistics, time-series analysis, and multivariate regression were implemented on top of the database software using SQL and visualization tools. The system generates basic statistical tables and summary tables of air pollution exposure and health impact data online; generates trend charts for each data section online, with interactive connection to the database; and generates export sheets that can be read directly into R, SAS, and SPSS. The information system for air pollution and health impact monitoring thus implements statistical analysis online and provides real-time analysis results to its users.

  1. Improving the Canadian Precipitation Analysis Estimates through an Observing System Simulation Experiment

    NASA Astrophysics Data System (ADS)

    Abbasnezhadi, K.; Rasmussen, P. F.; Stadnyk, T.

    2014-12-01

    This study was undertaken to gain a better understanding of the spatiotemporal distribution of rainfall over the Churchill River basin. The research incorporates gridded precipitation data from the Canadian Precipitation Analysis (CaPA) system. CaPA was developed by Environment Canada and provides near real-time precipitation estimates on a 10 km by 10 km grid over North America at a temporal resolution of 6 hours. The spatial fields are generated by combining forecasts from the Global Environmental Multiscale (GEM) model with precipitation observations from the network of synoptic weather stations. CaPA's skill is highly influenced by the number of weather stations in the region of interest as well as by the quality of the observations. To evaluate the performance of CaPA as a function of the density of the weather station network, a dual-stage design algorithm that simulates CaPA using generated weather fields is proposed. More specifically, we adopt a controlled design algorithm generally known as an Observing System Simulation Experiment (OSSE). The advantage of this experiment is that one can define reference precipitation fields assumed to represent the true state of rainfall over the region of interest. In the first stage of the OSSE, a coupled stochastic model of gridded precipitation and temperature fields is calibrated and validated. The performance of the generator is validated by comparing model statistics with observed statistics and by using the generated samples as input to the WATFLOOD™ hydrologic model. In the second stage of the experiment, to account for the systematic error of station observations and GEM fields, representative errors are added to the reference field using by-products of CaPA's variographic analysis. These by-products describe the variance of station observation and background errors.

  2. Template-free modeling by LEE and LEER in CASP11.

    PubMed

    Joung, InSuk; Lee, Sun Young; Cheng, Qianyi; Kim, Jong Yun; Joo, Keehyoung; Lee, Sung Jong; Lee, Jooyoung

    2016-09-01

    For the template-free modeling of human targets in CASP11, we utilized two of our modeling protocols, LEE and LEER. The LEE protocol took CASP11-released server models as the input and used some of them as templates for 3D (three-dimensional) modeling. The template selection procedure was based on clustering of the server models, aided by a community detection method applied to a server-model network. Restraining energy terms generated from the selected templates, together with physical and statistical energy terms, were used to build 3D models. Side-chains of the 3D models were rebuilt using a target-specific consensus side-chain library along with the SCWRL4 rotamer library, which completed the LEE protocol. The first success factor of the LEE protocol was efficient server-model screening: the average backbone accuracy of the selected server models was similar to that of the top 30% of server models. The second factor was that a proper energy function, along with our optimization method, allowed us to generate better-quality models than the input template models. In 10 out of 24 cases, backbone structures better than the best of the input template structures were generated. LEE models were further refined by performing restrained molecular dynamics simulations to generate LEER models. CASP11 results indicate that LEE models were better than the average template models in terms of both backbone structures and side-chain orientations. LEER models showed improved physical realism and stereochemistry compared to LEE models, and they were comparable to LEE models in backbone accuracy. Proteins 2016; 84(Suppl 1):118-130. © 2015 Wiley Periodicals, Inc.

  3. Statistical auditing and randomness test of lotto k/N-type games

    NASA Astrophysics Data System (ADS)

    Coronel-Brizio, H. F.; Hernández-Montoya, A. R.; Rapallo, F.; Scalas, E.

    2008-11-01

    One of the most popular lottery games worldwide is the so-called “lotto k/N”. It considers N numbers 1,2,…,N from which k are drawn randomly, without replacement. A player selects k or more numbers and the first prize is shared amongst those players whose selected numbers match all of the k randomly drawn. Exact rules may vary in different countries. In this paper, mean values and covariances for the random variables representing the numbers drawn from this kind of game are presented, with the aim of using them to audit statistically the consistency of a given sample of historical results with theoretical values coming from a hypergeometric statistical model. The method can be adapted to test pseudorandom number generators.
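
    As an illustration of the auditing idea, each of the k numbers is drawn without replacement from 1,…,N, so the per-draw sum has a known mean and variance; a z statistic then compares the historical mean of per-draw sums with theory. The sketch below assumes a 6/49 game and uses simulated draws in place of historical results.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    N, k, M = 49, 6, 2000                      # assumed 6/49 game with M historical draws
    draws = np.array([rng.choice(np.arange(1, N + 1), size=k, replace=False)
                      for _ in range(M)])

    # Theory for sampling without replacement: sum of the k numbers in one draw.
    mean_sum = k * (N + 1) / 2
    var_sum = k * (N**2 - 1) / 12 * (N - k) / (N - 1)   # finite-population correction

    # Z statistic for the observed mean of per-draw sums against the theoretical value.
    obs = draws.sum(axis=1)
    z = (obs.mean() - mean_sum) / np.sqrt(var_sum / M)
    print(f"theoretical mean sum {mean_sum:.1f}, observed {obs.mean():.1f}, z = {z:.2f}")
    ```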

  4. Self-organized network of fractal-shaped components coupled through statistical interaction.

    PubMed

    Ugajin, R

    2001-09-01

    A dissipative dynamics is introduced to generate self-organized networks of interacting objects, which we call coupled-fractal networks. The growth model is constructed from a growth hypothesis in which the growth rate of each object is the product of the probability of receiving source materials from far away and the probability of receiving adhesives from other grown objects; each object grows into a random fractal if isolated, but connects with others if glued. The network is governed by the statistical interaction between fractal-shaped components, which can only be identified in a statistical manner over ensembles. This interaction is investigated using the degree of correlation between fractal-shaped components, enabling us to determine whether it is attractive or repulsive.

  5. A geomorphic approach to 100-year floodplain mapping for the Conterminous United States

    NASA Astrophysics Data System (ADS)

    Jafarzadegan, Keighobad; Merwade, Venkatesh; Saksena, Siddharth

    2018-06-01

    Floodplain mapping using hydrodynamic models is difficult in data-scarce regions. Additionally, using hydrodynamic models to map floodplains over large stream networks can be computationally challenging. Some of these limitations can be overcome by developing computationally efficient statistical methods that identify floodplains in large and ungauged watersheds using publicly available data. This paper proposes a geomorphic model to generate probabilistic 100-year floodplain maps for the Conterminous United States (CONUS). The proposed model first categorizes the watersheds in the CONUS into three classes based on the height of the water surface corresponding to the 100-year flood above the streambed. Next, the probability that any watershed in the CONUS belongs to one of these three classes is computed through supervised classification using watershed characteristics related to topography, hydrography, land use and climate. The result of this classification is then fed into a probabilistic threshold binary classifier (PTBC) to generate the probabilistic 100-year floodplain maps. The supervised classification algorithm is trained using the 100-year Flood Insurance Rate Maps (FIRMs) from the U.S. Federal Emergency Management Agency (FEMA). FEMA FIRMs are also used to validate the performance of the proposed model in areas not included in the training. Additionally, HEC-RAS-generated flood inundation extents are used to validate the model performance at fifteen sites that lack FEMA maps. Validation results show that the probabilistic 100-year floodplain maps generated by the proposed model match well with both the FEMA and HEC-RAS maps. On average, the error of predicted flood extents is around 14% across the CONUS. The high accuracy of the validation results shows the reliability of the geomorphic model as an alternative approach for fast and cost-effective delineation of 100-year floodplains for the CONUS.
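
    One way to picture the two-stage pipeline: a supervised classifier assigns each watershed a probability over the three water-surface-height classes, and a height threshold per class converts a height-above-drainage raster into a binary floodplain mask, with the class probabilities weighting the masks. The features, classifier choice, thresholds, and raster below are illustrative assumptions, not the paper's calibrated values.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(3)
    # Hypothetical watershed features: slope, drainage density, mean annual precip.
    X = rng.normal(size=(500, 3))
    y = rng.integers(0, 3, 500)                 # class = low/medium/high 100-yr height
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Probabilistic threshold binary classifier: each class maps to an assumed height
    # threshold (m); cells below it in a height-above-drainage raster are flooded.
    thresholds = np.array([2.0, 5.0, 10.0])
    hand = rng.uniform(0, 15, size=(50, 50))    # toy height-above-nearest-drainage raster

    p_class = clf.predict_proba(X[:1])[0]       # class probabilities for one watershed
    # Probability-weighted floodplain map: P(flooded) = sum_c P(class c) * [hand < T_c].
    flood_prob = sum(p * (hand < T) for p, T in zip(p_class, thresholds))
    print(f"mean floodplain probability: {flood_prob.mean():.2f}")
    ```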

  6. Integrating hidden Markov model and PRAAT: a toolbox for robust automatic speech transcription

    NASA Astrophysics Data System (ADS)

    Kabir, A.; Barker, J.; Giurgiu, M.

    2010-09-01

    An automatic time-aligned phone transcription toolbox for English speech corpora has been developed. The toolbox is particularly useful for generating robust automatic transcriptions, and it can produce phone-level transcriptions using speaker-independent as well as speaker-dependent models without manual intervention. The system is based on the standard Hidden Markov Model (HMM) approach and was successfully tested on a large audiovisual speech corpus, the GRID corpus. One of the most powerful features of the toolbox is its flexibility in speech processing: automatic transcriptions generated by the HMM Toolkit (HTK) can be imported into the popular transcription software PRAAT, and vice versa. The toolbox has been evaluated through statistical analysis of the GRID data, which shows that the automatic transcription deviates from manual transcription by an average of 20 ms.
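
    The HTK-to-PRAAT direction of such a toolbox is straightforward to sketch: HTK label files list start and end times in 100-nanosecond units with a phone label, and PRAAT reads interval tiers from a plain-text TextGrid. The minimal converter below is an independent sketch, not the published toolbox's code.

    ```python
    # Convert HTK label lines (start end phone, times in 100 ns units)
    # into a minimal PRAAT TextGrid with one interval tier.
    def htk_to_textgrid(lab_lines, tier="phones"):
        intervals = []
        for line in lab_lines:
            start, end, phone = line.split()[:3]
            intervals.append((int(start) / 1e7, int(end) / 1e7, phone))  # -> seconds
        xmax = intervals[-1][1]
        out = ['File type = "ooTextFile"', 'Object class = "TextGrid"', "",
               "xmin = 0", f"xmax = {xmax}", "tiers? <exists>", "size = 1", "item []:",
               "    item [1]:", '        class = "IntervalTier"',
               f'        name = "{tier}"', "        xmin = 0", f"        xmax = {xmax}",
               f"        intervals: size = {len(intervals)}"]
        for i, (x0, x1, phone) in enumerate(intervals, 1):
            out += [f"        intervals [{i}]:", f"            xmin = {x0}",
                    f"            xmax = {x1}", f'            text = "{phone}"']
        return "\n".join(out)

    print(htk_to_textgrid(["0 2500000 sil", "2500000 4100000 b", "4100000 6800000 ih"]))
    ```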

  7. Preprocessor and postprocessor computer programs for a radial-flow finite-element model

    USGS Publications Warehouse

    Pucci, A.A.; Pope, D.A.

    1987-01-01

    Preprocessing and postprocessing computer programs that enhance the utility of the U.S. Geological Survey radial-flow model have been developed. The preprocessor program: (1) generates a triangular finite element mesh from minimal data input, (2) produces graphical displays and tabulations of data for the mesh, and (3) prepares an input data file for use with the radial-flow model. The postprocessor program is a version of the radial-flow model that was modified to (1) produce graphical output for simulation and field results, (2) generate a statistic for comparing the simulation results with observed data, and (3) allow hydrologic properties to vary in the simulated region. Examples of the use of the processor programs for a hypothetical aquifer test are presented. Instructions for the data files, format instructions, and a listing of the preprocessor and postprocessor source codes are given in the appendixes. (Author's abstract)

  8. The development of comparative bias index

    NASA Astrophysics Data System (ADS)

    Aimran, Ahmad Nazim; Ahmad, Sabri; Afthanorhan, Asyraf; Awang, Zainudin

    2017-08-01

    Structural Equation Modeling (SEM) is a second-generation statistical analysis technique developed for analyzing the inter-relationships among multiple variables in a model simultaneously. The two most commonly used methods in SEM are Covariance-Based Structural Equation Modeling (CB-SEM) and Partial Least Squares Path Modeling (PLS-PM). There has been continuous debate among researchers about the use of PLS-PM over CB-SEM, yet few studies have been conducted to test the bias of CB-SEM and PLS-PM in estimating simulated data. This study addresses that gap by (a) developing the Comparative Bias Index and (b) testing the performance of CB-SEM and PLS-PM using the developed index. Based on a balanced experimental design, two multivariate normal simulated datasets with distinct specifications, of sizes 50, 100, 200 and 500, are generated and analyzed using CB-SEM and PLS-PM.

  9. Macroscopic and bulk-controlled elastic modes in the interaction of interstitial alkali metal cations within a face-centered cubic crystalline fullerene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tatarenko, V.A.; Tsysman, C.L.; Oltarzhevskaya, Y.T.

    1994-12-31

    The calculations in the majority of previous works on fulleride (AqC60) crystals were performed within the framework of the rigid-lattice model, neglecting the distortion relaxation of the host fullerene (C60) crystal caused by the interstitial alkali-metal (A) cations. However, each cation is a source of a static distortion field, and the resulting field is a superposition of such fields generated by all cations. This is why the host-crystal distortions depend on the A-cation configurations, i.e., on the type of spatial bulk distribution of the interstitial cations. This paper seeks to find a functional relation between the amplitudes of the doping-induced structure-distortion waves and those of the static concentration waves. A semiphenomenological model is constructed within the scope of a statistical-thermodynamic treatment and using the lattice-statics simulation method. In this model, the effects of q solute A cations distributed over the available interstices (per unit cell) on the static inherent reorientation and/or displacements of the solvent molecules from the average-lattice sites, as well as on the lattice parameter a of the elastically anisotropic cubic C60 crystal, are taken into account.

  10. Development of a Probabilistic Dynamic Synthesis Method for the Analysis of Nondeterministic Structures

    NASA Technical Reports Server (NTRS)

    Brown, A. M.

    1998-01-01

    Accounting for the statistical geometric and material variability of structures in analysis has been a topic of considerable research for the last 30 years. The determination of quantifiable measures of the statistical probability of a desired response variable, such as natural frequency, maximum displacement, or stress, to replace experience-based "safety factors" has been a primary goal of these studies. There are, however, several problems associated with their satisfactory application to realistic structures, such as bladed disks in turbomachinery. These include the accurate definition of the input random variables (rv's), the large size of the finite element models frequently used to simulate these structures, which makes even a single deterministic analysis expensive, and the accurate generation of the cumulative distribution function (CDF) needed to obtain the probability of the desired response variables. The research presented here applies a methodology called probabilistic dynamic synthesis (PDS) to solve these problems. The PDS method uses dynamic characteristics of substructures measured in modal tests as the input rv's, rather than "primitive" rv's such as material or geometric uncertainties. These dynamic characteristics, which are the free-free eigenvalues, eigenvectors, and residual flexibility (RF), are readily measured, and for many substructures a reasonable sample set of these measurements can be obtained. The statistics for these rv's accurately account for the entire random character of the substructure. Using the RF method of component mode synthesis, these dynamic characteristics are used to generate reduced-size sample models of the substructures, which are then coupled to form system models. These sample models are used to obtain the CDF of the response variable either by applying Monte Carlo simulation or by generating data points for use in the response surface reliability method, which can perform the probabilistic analysis with an order of magnitude less computational effort. Both free- and forced-response analyses have been performed, and the results indicate that, while there is considerable room for improvement, the method produces usable and more representative solutions for the design of realistic structures with a substantial savings in computer time.
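
    The Monte Carlo route to the response CDF can be illustrated generically: sample substructure properties from their measured statistics, assemble each sample into a system model, solve for the response variable, and read probabilities off the empirical CDF. The two-degree-of-freedom spring-mass system below is a toy stand-in for the reduced substructure models, with all statistics assumed.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    n_samples = 5000

    # Assumed statistics of substructure properties (stand-ins for measured modal rv's).
    k1 = rng.normal(1.0e6, 5e4, n_samples)      # stiffness of substructure 1 (N/m)
    k2 = rng.normal(2.0e6, 8e4, n_samples)      # stiffness of substructure 2 (N/m)
    m1, m2 = 10.0, 15.0                         # masses (kg), treated as deterministic

    # Couple the substructures and solve each sample for its first natural frequency.
    freqs = np.empty(n_samples)
    for i in range(n_samples):
        K = np.array([[k1[i] + k2[i], -k2[i]], [-k2[i], k2[i]]])
        M = np.diag([m1, m2])
        evals = np.linalg.eigvals(np.linalg.solve(M, K))
        freqs[i] = np.sqrt(evals.real.min()) / (2 * np.pi)   # Hz

    # Empirical CDF: e.g., the frequency that 5% of realizations fall below.
    limit = np.quantile(freqs, 0.05)
    print(f"5th-percentile first natural frequency: {limit:.1f} Hz")
    ```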

  11. Simulation of California's Major Reservoirs Outflow Using Data Mining Technique

    NASA Astrophysics Data System (ADS)

    Yang, T.; Gao, X.; Sorooshian, S.

    2014-12-01

    A reservoir's outflow is controlled by reservoir operators, unlike the upstream inflow, and it is the outflow that matters most to downstream water users. To simulate the complicated reservoir operation and extract outflow decision-making patterns for California's 12 major reservoirs, we build a data-driven, computer-based ("artificially intelligent") reservoir decision-making tool using a regression and classification tree approach, a well-developed statistical and graphical modeling methodology in the field of data mining. A shuffled cross-validation approach is also employed to extract the outflow decision-making patterns and rules based on the selected decision variables (inflow amount, precipitation, timing, water year type, etc.). To show the accuracy of the model, a verification study is carried out comparing the model-generated outflow decisions ("artificially intelligent" decisions) with those made by reservoir operators (human decisions). The simulation results show that the machine-generated outflow decisions are very similar to the real reservoir operators' decisions, a conclusion based on statistical evaluation using the Nash-Sutcliffe test. The proposed model is able to detect the most influential variables and their weights when the reservoir operators make an outflow decision. While the proposed approach was first applied and tested on California's 12 major reservoirs, the method is universally adaptable to other reservoir systems.
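
    A minimal sketch of the decision-tree step, assuming hypothetical decision variables and a toy operator rule in place of the historical record: a regression tree is fit to outflow decisions and its simulation skill is scored with the Nash-Sutcliffe efficiency, as in the study's evaluation.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(5)
    n = 1000
    # Hypothetical daily decision variables: inflow, precipitation, timing, storage.
    X = np.column_stack([
        rng.gamma(2.0, 50.0, n),        # inflow (m^3/s)
        rng.exponential(3.0, n),        # precipitation (mm)
        rng.integers(1, 366, n),        # day of year
        rng.uniform(0.3, 1.0, n),       # storage fraction
    ])
    # Toy operator rule: release more when storage and inflow are high, and in autumn.
    y = 0.5 * X[:, 0] * X[:, 3] + 10 * (X[:, 2] > 270) + rng.normal(0, 5, n)

    tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

    # Nash-Sutcliffe efficiency of the tree's simulated outflow decisions.
    sim = tree.predict(X)
    nse = 1 - np.sum((y - sim) ** 2) / np.sum((y - y.mean()) ** 2)
    print(f"NSE = {nse:.2f}, feature importances = {tree.feature_importances_.round(2)}")
    ```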

  12. Enhanced Deformation of Azobenzene-Modified Liquid Crystal Polymers under Dual Wavelength Exposure: A Photophysical Model

    NASA Astrophysics Data System (ADS)

    Liu, Ling; Onck, Patrick R.

    2017-08-01

    Azobenzene-embedded liquid crystal polymers can undergo mechanical deformation in response to ultraviolet (UV) light. The natural rodlike trans state azobenzene absorbs UV light and isomerizes to a bentlike cis state, which disturbs the order of the polymer network, leading to an anisotropic deformation. The current consensus is that the magnitude of the photoinduced deformation is related to the statistical building up of molecules in the cis state. However, a recent experimental study [Liu and Broer, Nat. Commun. 6, 8334 (2015), doi:10.1038/ncomms9334] shows that a drastic (fourfold) increase of the photoinduced deformation can be generated by exposing the samples simultaneously to 365 nm (UV) and 455 nm (visible) light. To elucidate the physical mechanism that drives this increase, we develop a two-light attenuation model and an optomechanical constitutive relation that not only accounts for the statistical accumulation of cis azobenzenes, but also for the dynamic trans-cis-trans oscillatory isomerization process. Our experimentally calibrated model predicts that the optimal single-wavelength exposure is 395 nm light, a pronounced shift towards the visible spectrum. In addition, we identify a range of optimal combinations of two-wavelength lights that generate a favorable response for a given amount of injected energy. Our model provides mechanistic insight into the different (multi)wavelength exposures used in experiments and, at the same time, opens new avenues towards enhanced, multiwavelength optomechanical behavior.

  13. Cascaded Amplitude Modulations in Sound Texture Perception

    PubMed Central

    McWalter, Richard; Dau, Torsten

    2017-01-01

    Sound textures, such as crackling fire or chirping crickets, represent a broad class of sounds defined by their homogeneous temporal structure. It has been suggested that the perception of texture is mediated by time-averaged summary statistics measured from early auditory representations. In this study, we investigated the perception of sound textures that contain rhythmic structure, specifically second-order amplitude modulations that arise from the interaction of different modulation rates, previously described as “beating” in the envelope-frequency domain. We developed an auditory texture model that utilizes a cascade of modulation filterbanks that capture the structure of simple rhythmic patterns. The model was examined in a series of psychophysical listening experiments using synthetic sound textures—stimuli generated using time-averaged statistics measured from real-world textures. In a texture identification task, our results indicated that second-order amplitude modulation sensitivity enhanced recognition. Next, we examined the contribution of the second-order modulation analysis in a preference task, where the proposed auditory texture model was preferred over a range of model deviants that lacked second-order modulation rate sensitivity. Lastly, the discriminability of textures that included second-order amplitude modulations appeared to be perceived using a time-averaging process. Overall, our results demonstrate that the inclusion of second-order modulation analysis generates improvements in the perceived quality of synthetic textures compared to the first-order modulation analysis considered in previous approaches. PMID:28955191

  15. Statistics of Magnetic Reconnection X-Lines in Kinetic Turbulence

    NASA Astrophysics Data System (ADS)

    Haggerty, C. C.; Parashar, T.; Matthaeus, W. H.; Shay, M. A.; Wan, M.; Servidio, S.; Wu, P.

    2016-12-01

    In this work we examine the statistics of magnetic reconnection sites (x-lines) and their associated reconnection rates in intermittent current sheets generated in turbulent plasmas. Although such statistics have been studied previously for fluid simulations (e.g. [1]), they have not yet been generalized to fully kinetic particle-in-cell (PIC) simulations. A significant problem with PIC simulations, however, is electrostatic fluctuations generated by numerical particle counting statistics. We find that analyzing gradients of the magnetic vector potential from the raw PIC field data identifies numerous artificial (non-physical) x-points. Using small Orszag-Tang vortex PIC simulations, we analyze x-line identification and show that these artificial x-lines can be removed by sub-Debye-length filtering of the data. We examine how turbulent properties such as the magnetic spectrum and the scale-dependent kurtosis are affected by particle noise and sub-Debye-length filtering. We then apply these analysis methods to a large-scale kinetic PIC turbulence simulation. Consistent with previous fluid models, we find a range of normalized reconnection rates as large as 1/2, with the bulk of the rates below approximately 0.1. [1] Servidio, S., W. H. Matthaeus, M. A. Shay, P. A. Cassak, and P. Dmitruk (2009), Magnetic reconnection and two-dimensional magnetohydrodynamic turbulence, Phys. Rev. Lett., 102, 115003.
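
    X-point identification from the vector potential can be sketched as follows: smooth A at small scales (standing in for sub-Debye-length filtering), then flag grid cells where the gradient of A nearly vanishes and the Hessian determinant is negative, i.e., saddle points. The field, smoothing scale, and thresholds below are assumptions for illustration.

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter

    # Synthetic 2D vector potential with an island chain, so O- and X-points exist.
    x = np.linspace(0, 2 * np.pi, 256)
    X, Y = np.meshgrid(x, x)
    A = np.cos(X) * np.cos(Y) + 0.02 * np.random.default_rng(6).normal(size=X.shape)

    A = gaussian_filter(A, sigma=3)              # stand-in for sub-Debye-length filtering
    Ay, Ax = np.gradient(A)                      # first axis varies with y, second with x
    Ayy, Ayx = np.gradient(Ay)
    Axy, Axx = np.gradient(Ax)

    gmag = np.hypot(Ax, Ay)
    near_critical = gmag < 0.05 * gmag.max()     # near-zero gradient (threshold assumed)
    saddle = (Axx * Ayy - Axy * Ayx) < 0         # negative Hessian determinant -> saddle
    print("candidate X-point cells:", int(np.count_nonzero(near_critical & saddle)))
    ```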

  16. A weighted U-statistic for genetic association analyses of sequencing data.

    PubMed

    Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing

    2014-12-01

    With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. Association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption about the underlying disease model or phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) when the underlying assumptions were violated (e.g., when the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained performance comparable to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS) and detected an association between ANGPTL4 and very low-density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.
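
    The general shape of such a statistic can be sketched: a weighted U-statistic sums, over all distinct subject pairs, a phenotype-similarity kernel multiplied by a genotype-similarity weight that upweights rare variants. The kernel, weighting scheme, and data below are illustrative assumptions, not WU-SEQ's published choices.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    n, m = 100, 20                              # subjects, rare variants
    G = rng.binomial(2, 0.02, size=(n, m))      # genotype matrix (minor-allele counts)
    y = rng.standard_t(df=3, size=n)            # heavy-tailed phenotype

    # Rank-based phenotype kernel (robust to the phenotype distribution), centered.
    r = (np.argsort(np.argsort(y)) + 1) / (n + 1)
    h = np.outer(r - r.mean(), r - r.mean())

    # Genotype similarity, upweighting rare variants (assumed weighting scheme).
    maf = G.mean(axis=0) / 2
    w = 1.0 / np.sqrt(np.maximum(maf * (1 - maf), 1e-8))
    S = (G * w) @ (G * w).T

    # Weighted U-statistic: average over distinct pairs of kernel x similarity.
    iu = np.triu_indices(n, k=1)
    U = np.sum(h[iu] * S[iu]) / len(iu[0])
    print(f"U = {U:.4f}")
    ```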

  17. Optimization of the Heat Exchangers of a Thermoelectric Generation System

    NASA Astrophysics Data System (ADS)

    Martínez, A.; Vián, J. G.; Astrain, D.; Rodríguez, A.; Berrio, I.

    2010-09-01

    The thermal resistances of the heat exchangers have a strong influence on the electric power produced by a thermoelectric generator. In this work, the heat exchangers of a thermoelectric generator have been optimized in order to maximize the electric power generated. This thermoelectric generator harnesses heat from the exhaust gas of a domestic gas boiler. Statistical design of experiments was used to assess the influence of five factors on both the electric power generated and the pressure drop in the chimney: the height of the generator, the number of modules per meter of generator height, the length of the fins of the hot-side heat exchanger (HSHE), the length of the gap between fins of the HSHE, and the base thickness of the HSHE. The electric power has been calculated using a computational model, whereas the Fluent computational fluid dynamics (CFD) code has been used to obtain the thermal resistances of the heat exchangers and the pressure drop. Finally, the thermoelectric generator has been optimized, taking into account the restrictions on the pressure drop.
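
    As an illustration of the design-of-experiments step, a two-level full factorial over the five factors listed above gives 2^5 = 32 runs, from which main effects can be estimated. The coded design, response values, and effect sizes below are synthetic assumptions standing in for the CFD-derived results.

    ```python
    import itertools
    import numpy as np

    factors = ["height", "modules_per_m", "fin_length", "fin_gap", "base_thickness"]
    # Two-level full factorial design: 2^5 = 32 runs, coded levels -1 / +1.
    design = np.array(list(itertools.product([-1, 1], repeat=len(factors))))

    # Synthetic response standing in for the CFD-derived electric power (W).
    rng = np.random.default_rng(8)
    true_effects = np.array([4.0, 6.0, 3.0, -2.0, -0.5])
    power = 50 + design @ true_effects + rng.normal(0, 1.0, len(design))

    # Main-effect estimate: mean response at +1 minus mean response at -1 per factor.
    effects = 2 * (design.T @ power) / len(design)
    for name, eff in zip(factors, effects):
        print(f"{name:>16}: {eff:+.2f} W")
    ```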

  18. Factors determining waste generation in Spanish towns and cities.

    PubMed

    Prades, Miriam; Gallardo, Antonio; Ibàñez, Maria Victoria

    2015-01-01

    This paper analyzes the generation and composition of municipal solid waste in Spanish towns and cities with more than 5000 inhabitants, which altogether account for 87% of the Spanish population. To do so, the total composition and generation of municipal solid waste fractions were obtained for 135 towns and cities. Homogeneity tests revealed heterogeneity in the proportions of municipal solid waste fractions from one city to another. Statistical analyses identified significant differences in the generation of glass in cities of different sizes, and in the generation of all fractions depending on the hydrographic area. Finally, linear regression models and residual analysis were applied to analyze the effect of different demographic, geographic, and socioeconomic variables on the generation of waste fractions. The conclusions show that more densely populated towns, towns in certain hydrographic areas, and cities with over 50,000 inhabitants have higher waste generation rates, while certain socioeconomic variables (people per car) decrease that generation; other socioeconomic variables (foreigners and unemployment) show a positive and a null influence on waste generation, respectively.

  19. Assimilation of altimeter data into a quasigeostrophic ocean model using optimal interpolation and eofs

    NASA Astrophysics Data System (ADS)

    Rienecker, M. M.; Adamec, D.

    1995-01-01

    An ensemble of fraternal-twin experiments is used to assess the utility of optimal interpolation and model-based vertical empirical orthogonal functions (eofs) of streamfunction variability to assimilate satellite altimeter data into ocean models. Simulated altimeter data are assimilated into a basin-wide 3-layer quasi-geostrophic model with a horizontal grid spacing of 15 km. The effects of bottom topography are included and the model is forced by a wind stress curl distribution which is constant in time. The simulated data are extracted, along altimeter tracks with spatial and temporal characteristics of Geosat, from a reference model ocean with a slightly different climatology from that generated by the model used for assimilation. The use of vertical eofs determined from the model-generated streamfunction variability is shown to be effective in aiding the model's dynamical extrapolation of the surface information throughout the rest of the water column. After a single repeat cycle (17 days), the analysis errors are reduced markedly from the initial level, by 52% in the surface layer, 41% in the second layer and 11% in the bottom layer. The largest differences between the assimilation analysis and the reference ocean are found in the nonlinear regime of the mid-latitude jet in all layers. After 100 days of assimilation, the error in the upper two layers has been reduced by over 50% and that in the bottom layer by 38%. The essence of the method is that the eofs capture the statistics of the dynamical balances in the model and ensure that this balance is not inappropriately disturbed during the assimilation process. This statistical balance includes any potential vorticity homogeneity which may be associated with the eddy stirring by mid-latitude surface jets.
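
    The role of the vertical eofs can be pictured in isolation: given layer-space eofs of streamfunction variability, a surface-only increment determines a modal amplitude (here using only the leading mode), and the eof structure extrapolates that increment to the deeper layers. Everything below, including the eof shapes, is an illustrative assumption rather than the paper's configuration.

    ```python
    import numpy as np

    # Assumed vertical eofs of streamfunction variability for a 3-layer model
    # (rows = layers, columns = modes), orthonormalized for convenience.
    E = np.linalg.qr(np.array([[0.8, 0.5, 0.1],
                               [0.5, -0.6, 0.4],
                               [0.2, -0.3, -0.9]]))[0]

    psi_model = np.array([1.0, 0.6, 0.2])        # model streamfunction by layer
    psi_obs_surface = 1.5                        # altimeter-derived surface value

    # Project the surface increment onto the leading eof, then spread it vertically.
    increment_surface = psi_obs_surface - psi_model[0]
    amp = increment_surface / E[0, 0]            # leading-mode amplitude from surface info
    psi_analysis = psi_model + amp * E[:, 0]     # update reaches all three layers

    print("analysis profile:", psi_analysis.round(3))
    ```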

  20. Single photon counting linear mode avalanche photodiode technologies

    NASA Astrophysics Data System (ADS)

    Williams, George M.; Huntington, Andrew S.

    2011-10-01

    The false count rate of a single-photon-sensitive photoreceiver consisting of a high-gain, low-excess-noise linear-mode InGaAs avalanche photodiode (APD) and a high-bandwidth transimpedance amplifier (TIA) is fit to a statistical model. The peak height distribution of the APD's multiplied dark current is approximated by the weighted sum of McIntyre distributions, each characterizing dark current generated at a different location within the APD's junction. The peak height distribution approximated in this way is convolved with a Gaussian distribution representing the input-referred noise of the TIA to generate the statistical distribution of the uncorrelated sum. The cumulative distribution function (CDF) representing count probability as a function of detection threshold is computed, and the CDF model fit to empirical false count data. It is found that only k=0 McIntyre distributions fit the empirically measured CDF at high detection threshold, and that false count rate drops faster than photon count rate as detection threshold is raised. Once fit to empirical false count data, the model predicts the improvement of the false count rate to be expected from reductions in TIA noise and APD dark current. Improvement by at least three orders of magnitude is thought feasible with further manufacturing development and a capacitive-feedback TIA (CTIA).
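
    The statistical construction can be sketched numerically: take a peak-height density for the multiplied dark current (below, a placeholder mixture of two exponentials standing in for location-weighted McIntyre distributions), convolve it with a Gaussian for the input-referred TIA noise, and integrate the tail above a detection threshold to obtain a count probability. All parameters are assumed.

    ```python
    import numpy as np

    v = np.linspace(0, 200, 4001)                # peak height grid (arbitrary units)
    dv = v[1] - v[0]

    # Placeholder dark-current peak-height density: mixture of two components
    # standing in for McIntyre distributions from different junction locations.
    pdf_dark = 0.9 * np.exp(-v / 5) / 5 + 0.1 * np.exp(-v / 25) / 25

    # Gaussian input-referred TIA noise, centered on the grid midpoint.
    sigma = 3.0
    g = np.exp(-0.5 * ((v - v.mean()) / sigma) ** 2)
    g /= g.sum() * dv

    # Density of the uncorrelated sum via convolution, renormalized on the grid.
    pdf_sum = np.convolve(pdf_dark, g, mode="same") * dv
    pdf_sum /= pdf_sum.sum() * dv

    # Count probability vs detection threshold: tail integral of the summed density.
    threshold = 40.0
    p_false = pdf_sum[v >= threshold].sum() * dv
    print(f"P(peak > {threshold}) = {p_false:.2e}")
    ```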
