Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall
2016-01-01
Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) the classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. Our analysis also discusses several other aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach (1) is much more flexible in terms of modeling effort, and (2) offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.
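For readers unfamiliar with the Kaplan-Meier estimator named in this abstract, the following is a minimal sketch of the product-limit calculation in plain Python. The function name and the toy follow-up data are hypothetical and are not taken from the study; the sketch only illustrates how censored observations leave the risk set without counting as events.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each subject
    events:    1 if the event was observed at that time, 0 if censored
    """
    observed_events = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every subject leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = observed_events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # events and censored subjects both leave
    return curve


# Hypothetical toy data (NOT from the study): years of follow-up, 0 = censored.
times = [1, 2, 2, 3, 4, 5]
observed = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, observed)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

The estimate is a population-level curve; an individualized assessment of the kind the abstract attributes to the Bayesian network would need covariates, which this sketch deliberately leaves out.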
A Statistical Approach For Modeling Tropical Cyclones. Synthetic Hurricanes Generator Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pasqualini, Donatella
This manuscript briefly describes a statistical approach to generate synthetic tropical cyclone tracks to be used in risk evaluations. The Synthetic Hurricane Generator (SynHurG) model allows modeling hurricane risk in the United States, supporting decision makers and implementations of adaptation strategies to extreme weather. In the literature there are mainly two approaches to model hurricane hazard for risk prediction: deterministic-statistical approaches, where the storm key physical parameters are calculated using complex physical climate models and the tracks are usually determined statistically from historical data; and statistical approaches, where both variables and tracks are estimated stochastically using historical records. SynHurG falls in the second category, adopting a pure stochastic approach.
Computational and Statistical Models: A Comparison for Policy Modeling of Childhood Obesity
NASA Astrophysics Data System (ADS)
Mabry, Patricia L.; Hammond, Ross; Ip, Edward Hak-Sing; Huang, Terry T.-K.
As systems science methodologies have begun to emerge as a set of innovative approaches to address complex problems in behavioral, social science, and public health research, some apparent conflicts with traditional statistical methodologies for public health have arisen. Computational modeling is an approach set in context that integrates diverse sources of data to test the plausibility of working hypotheses and to elicit novel ones. Statistical models are reductionist approaches geared towards proving the null hypothesis. While these two approaches may seem contrary to each other, we propose that they are in fact complementary and can be used jointly to advance solutions to complex problems. Outputs from statistical models can be fed into computational models, and outputs from computational models can lead to further empirical data collection and statistical models. Together, this presents an iterative process that refines the models and contributes to a greater understanding of the problem and its potential solutions. The purpose of this panel is to foster communication and understanding between statistical and computational modelers. Our goal is to shed light on the differences between the approaches and convey what kinds of research inquiries each one is best for addressing and how they can serve complementary (and synergistic) roles in the research process, to mutual benefit. For each approach the panel will cover the relevant "assumptions" and how the differences in what is assumed can foster misunderstandings. The interpretations of the results from each approach will be compared and contrasted and the limitations for each approach will be delineated. We will use illustrative examples from CompMod, the Comparative Modeling Network for Childhood Obesity Policy. The panel will also incorporate interactive discussions with the audience on the issues raised here.
TinkerPlots™ Model Construction Approaches for Comparing Two Groups: Student Perspectives
ERIC Educational Resources Information Center
Noll, Jennifer; Kirin, Dana
2017-01-01
Teaching introductory statistics using curricula focused on modeling and simulation is becoming increasingly common in introductory statistics courses and touted as a more beneficial approach for fostering students' statistical thinking. Yet, surprisingly little research has been conducted to study the impact of modeling and simulation curricula…
ERIC Educational Resources Information Center
Braham, Hana Manor; Ben-Zvi, Dani
2017-01-01
A fundamental aspect of statistical inference is representation of real-world data using statistical models. This article analyzes students' articulations of statistical models and modeling during their first steps in making informal statistical inferences. An integrated modeling approach (IMA) was designed and implemented to help students…
2017-09-01
efficacy of statistical post-processing methods downstream of these dynamical model components with a hierarchical multivariate Bayesian approach to...Bayesian hierarchical modeling, Markov chain Monte Carlo methods, Metropolis algorithm, machine learning, atmospheric prediction...scale processes. However, this dissertation explores the efficacy of statistical post-processing methods downstream of these dynamical model components
Use of statistical and neural net approaches in predicting toxicity of chemicals.
Basak, S C; Grunwald, G D; Gute, B D; Balasubramanian, K; Opitz, D
2000-01-01
Hierarchical quantitative structure-activity relationships (H-QSAR) have been developed as a new approach in constructing models for estimating physicochemical, biomedicinal, and toxicological properties of interest. This approach uses increasingly more complex molecular descriptors in a graduated approach to model building. In this study, statistical and neural network methods have been applied to the development of H-QSAR models for estimating the acute aquatic toxicity (LC50) of 69 benzene derivatives to Pimephales promelas (fathead minnow). Topostructural, topochemical, geometrical, and quantum chemical indices were used as the four levels of the hierarchical method. It is clear from both the statistical and neural network models that topostructural indices alone cannot adequately model this set of congeneric chemicals. Not surprisingly, topochemical indices greatly increase the predictive power of both statistical and neural network models. Quantum chemical indices also add significantly to the modeling of this set of acute aquatic toxicity data.
Statistical Surrogate Modeling of Atmospheric Dispersion Events Using Bayesian Adaptive Splines
NASA Astrophysics Data System (ADS)
Francom, D.; Sansó, B.; Bulaevskaya, V.; Lucas, D. D.
2016-12-01
Uncertainty in the inputs of complex computer models, including atmospheric dispersion and transport codes, is often assessed via statistical surrogate models. Surrogate models are computationally efficient statistical approximations of expensive computer models that enable uncertainty analysis. We introduce Bayesian adaptive spline methods for producing surrogate models that capture the major spatiotemporal patterns of the parent model, while satisfying all the necessities of flexibility, accuracy and computational feasibility. We present novel methodological and computational approaches motivated by a controlled atmospheric tracer release experiment conducted at the Diablo Canyon nuclear power plant in California. Traditional methods for building statistical surrogate models often do not scale well to experiments with large amounts of data. Our approach is well suited to experiments involving large numbers of model inputs, large numbers of simulations, and functional output for each simulation. Our approach allows us to perform global sensitivity analysis with ease. We also present an approach to calibration of simulators using field data.
Different Manhattan project: automatic statistical model generation
NASA Astrophysics Data System (ADS)
Yap, Chee Keng; Biermann, Henning; Hertzmann, Aaron; Li, Chen; Meyer, Jon; Pao, Hsing-Kuo; Paxia, Salvatore
2002-03-01
We address the automatic generation of large geometric models. This is important in visualization for several reasons. First, many applications need access to large but interesting data models. Second, we often need such data sets with particular characteristics (e.g., urban models, park and recreation landscape). Thus we need the ability to generate models with different parameters. We propose a new approach for generating such models. It is based on a top-down propagation of statistical parameters. We illustrate the method in the generation of a statistical model of Manhattan. But the method is generally applicable in the generation of models of large geographical regions. Our work is related to the literature on generating complex natural scenes (smoke, forests, etc) based on procedural descriptions. The difference in our approach stems from three characteristics: modeling with statistical parameters, integration of ground truth (actual map data), and a library-based approach for texture mapping.
Garcia, Luís Filipe; de Oliveira, Luís Caldas; de Matos, David Martins
2016-01-01
This study compared the performance of two statistical location-aware pictogram prediction mechanisms, with an all-purpose (All) pictogram prediction mechanism, having no location knowledge. The All approach had a unique language model under all locations. One of the location-aware alternatives, the location-specific (Spec) approach, made use of specific language models for pictogram prediction in each location of interest. The other location-aware approach resulted from combining the Spec and the All approaches, and was designated the mixed approach (Mix). In this approach, the language models acquired knowledge from all locations, but a higher relevance was assigned to the vocabulary from the associated location. Results from simulations showed that the Mix and Spec approaches could only outperform the baseline in a statistically significant way if pictogram users reuse more than 50% and 75% of their sentences, respectively. Under low sentence reuse conditions there were no statistically significant differences between the location-aware approaches and the All approach. Under these conditions, the Mix approach performed better than the Spec approach in a statistically significant way.
Comparing geological and statistical approaches for element selection in sediment tracing research
NASA Astrophysics Data System (ADS)
Laceby, J. Patrick; McMahon, Joe; Evrard, Olivier; Olley, Jon
2015-04-01
Elevated suspended sediment loads reduce reservoir capacity and significantly increase the cost of operating water treatment infrastructure, making the management of sediment supply to reservoirs of increasing importance. Sediment fingerprinting techniques can be used to determine the relative contributions of different sources of sediment accumulating in reservoirs. The objective of this research is to compare geological and statistical approaches to element selection for sediment fingerprinting modelling. Time-integrated samplers (n=45) were used to obtain source samples from four major subcatchments flowing into the Baroon Pocket Dam in South East Queensland, Australia. The geochemistry of potential sources was compared to the geochemistry of sediment cores (n=12) sampled in the reservoir. The geochemical approach selected elements for modelling that provided expected, observed and statistical discrimination between sediment sources. Two statistical approaches selected elements for modelling with the Kruskal-Wallis H-test and Discriminatory Function Analysis (DFA). In particular, two different significance levels (0.05 & 0.35) for the DFA were included to investigate the importance of element selection on modelling results. A distribution model determined the relative contributions of different sources to sediment sampled in the Baroon Pocket Dam. Elemental discrimination was expected between one subcatchment (Obi Obi Creek) and the remaining subcatchments (Lexys, Falls and Bridge Creek). Six major elements were expected to provide discrimination. Of these six, only Fe2O3 and SiO2 provided expected, observed and statistical discrimination. Modelling results with this geological approach indicated 36% (+/- 9%) of sediment sampled in the reservoir cores was from mafic-derived sources and 64% (+/- 9%) was from felsic-derived sources.
The geological and the first statistical approach (DFA0.05) differed by only 1% (σ 5%) for 5 out of 6 model groupings, with only the Lexys Creek modelling results differing significantly (35%). The statistical model with expanded elemental selection (DFA0.35) differed from the geological model by an average of 30% for all 6 models. Elemental selection for sediment fingerprinting therefore has the potential to impact modelling results. Accordingly, it is important to incorporate both robust geological and statistical approaches when selecting elements for sediment fingerprinting. For the Baroon Pocket Dam, management should focus on reducing the supply of sediments derived from felsic sources in each of the subcatchments.
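To make the idea of apportioning sediment between two source groups concrete, here is a minimal sketch of a two end-member mixing calculation with a single conservative tracer. This is a simplification chosen for illustration, not the distribution model used in the study, and the function name and concentrations are made up.

```python
def two_source_fraction(mixture, source_a, source_b):
    """Fraction of source A in a mixture, estimated from one conservative tracer.

    Solves mixture = f * source_a + (1 - f) * source_b for f and clips the
    result to the physically meaningful range [0, 1].
    """
    if source_a == source_b:
        raise ValueError("tracer does not discriminate between the sources")
    fraction = (mixture - source_b) / (source_a - source_b)
    return min(1.0, max(0.0, fraction))


# Made-up tracer concentrations (weight %), chosen only for easy arithmetic.
mafic_tracer = 12.0
felsic_tracer = 4.0
reservoir_tracer = 6.0

mafic_share = two_source_fraction(reservoir_tracer, mafic_tracer, felsic_tracer)
print(f"mafic share: {mafic_share:.2f}, felsic share: {1 - mafic_share:.2f}")
```

With several tracers and source variability, the single equation becomes an over-determined problem that is usually solved by optimisation or sampling, which is where the choice of elements discussed in the abstract starts to matter.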
Risk prediction model: Statistical and artificial neural network approach
NASA Astrophysics Data System (ADS)
Paiman, Nuur Azreen; Hariri, Azian; Masood, Ibrahim
2017-04-01
Prediction models are increasingly gaining popularity and have been used in numerous areas of study to complement and fulfil clinical reasoning and decision making. The adoption of such models assists physicians' decision making and individuals' behavior, and consequently improves individual outcomes and the cost-effectiveness of care. The objective of this paper is to review articles related to risk prediction models in order to understand the suitable approach, development and validation process of a risk prediction model. A qualitative review was done of the aims, methods and significant main outcomes of nineteen published articles that developed risk prediction models in numerous fields. This paper also reviews how researchers develop and validate risk prediction models based on statistical and artificial neural network approaches. From the review, some methodological recommendations for developing and validating prediction models are highlighted. According to the studies reviewed, the artificial neural network approach to developing prediction models was more accurate than the statistical approach. However, only limited published literature currently discusses which approach is more accurate for risk prediction model development.
Manifold parametrization of the left ventricle for a statistical modelling of its complete anatomy
NASA Astrophysics Data System (ADS)
Gil, D.; Garcia-Barnes, J.; Hernández-Sabate, A.; Marti, E.
2010-03-01
Distortion of Left Ventricle (LV) external anatomy is related to some dysfunctions, such as hypertrophy. The architecture of myocardial fibers determines LV electromechanical activation patterns as well as mechanics. Thus, their joint modelling would allow the design of specific interventions (such as pacemaker implantation and LV remodelling) and therapies (such as resynchronization). On one hand, accurate modelling of external anatomy requires either a dense sampling or a continuous infinite dimensional approach, which requires non-Euclidean statistics. On the other hand, computation of fiber models requires statistics on Riemannian spaces. Most approaches compute separate statistical models for external anatomy and fiber architecture. In this work we propose a general mathematical framework based on differential geometry concepts for computing a statistical model including both external and fiber anatomy. Our framework provides a continuous approach to external anatomy supporting standard statistics. We also provide a straightforward formula for the computation of the Riemannian fiber statistics. We have applied our methodology to the computation of a complete anatomical atlas of canine hearts from diffusion tensor studies. The orientation of fibers over the average external geometry agrees with the segmental description of orientations reported in the literature.
Safaie, Ammar; Wendzel, Aaron; Ge, Zhongfu; Nevers, Meredith; Whitman, Richard L.; Corsi, Steven R.; Phanikumar, Mantha S.
2016-01-01
Statistical and mechanistic models are popular tools for predicting the levels of indicator bacteria at recreational beaches. Researchers tend to use one class of model or the other, and it is difficult to generalize statements about their relative performance due to differences in how the models are developed, tested, and used. We describe a cooperative modeling approach for freshwater beaches impacted by point sources in which insights derived from mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models which were further improved using probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and the use of unstructured-grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions resulting in more accurate estimates of beach closures.
New approach in the quantum statistical parton distribution
NASA Astrophysics Data System (ADS)
Sohaily, Sozha; Vaziri (Khamedi), Mohammad
2017-12-01
An attempt to find simple parton distribution functions (PDFs) based on a quantum statistical approach is presented. The PDFs described by the statistical model have very interesting physical properties which help to understand the structure of partons. The longitudinal portion of the distribution functions is given by applying the maximum entropy principle. An interesting and simple approach to determine the statistical variables exactly without fitting and fixing parameters is surveyed. Analytic expressions of the x-dependent PDFs are obtained in the whole x region [0, 1], and the computed distributions are consistent with the experimental observations. The agreement with experimental data gives robust confirmation of the simple statistical model presented.
NASA Astrophysics Data System (ADS)
Kassem, M.; Soize, C.; Gagliardini, L.
2009-06-01
In this paper, an energy-density field approach applied to the vibroacoustic analysis of complex industrial structures in the low- and medium-frequency ranges is presented. This approach uses a statistical computational model. The analyzed system consists of an automotive vehicle structure coupled with its internal acoustic cavity. The objective of this paper is to make use of the statistical properties of the frequency response functions of the vibroacoustic system observed from previous experimental and numerical work. The frequency response functions are expressed in terms of a dimensionless matrix which is estimated using the proposed energy approach. Using this dimensionless matrix, a simplified vibroacoustic model is proposed.
D. Todd Jones-Farrand; Todd M. Fearer; Wayne E. Thogmartin; Frank R. Thompson; Mark D. Nelson; John M. Tirpak
2011-01-01
Selection of a modeling approach is an important step in the conservation planning process, but little guidance is available. We compared two statistical and three theoretical habitat modeling approaches representing those currently being used for avian conservation planning at landscape and regional scales: hierarchical spatial count (HSC), classification and...
Targeted versus statistical approaches to selecting parameters for modelling sediment provenance
NASA Astrophysics Data System (ADS)
Laceby, J. Patrick
2017-04-01
One effective field-based approach to modelling sediment provenance is the source fingerprinting technique. Arguably, one of the most important steps for this approach is selecting the appropriate suite of parameters or fingerprints used to model source contributions. Accordingly, approaches to selecting parameters for sediment source fingerprinting will be reviewed. Thereafter, opportunities and limitations of these approaches and some future research directions will be presented. For properties to be effective tracers of sediment, they must discriminate between sources whilst behaving conservatively. Conservative behavior is characterized by constancy in sediment properties, where the properties of sediment sources remain constant, or at the very least, any variation in these properties should occur in a predictable and measurable way. Therefore, properties selected for sediment source fingerprinting should remain constant through sediment detachment, transportation and deposition processes, or vary in a predictable and measurable way. One approach to select conservative properties for sediment source fingerprinting is to identify targeted tracers, such as caesium-137, that provide specific source information (e.g. surface versus subsurface origins). A second approach is to use statistical tests to select an optimal suite of conservative properties capable of modelling sediment provenance. In general, statistical approaches use a combination of a discrimination test (e.g. Kruskal-Wallis H-test, Mann-Whitney U-test) and parameter selection statistics (e.g. Discriminant Function Analysis or Principal Component Analysis). The challenge is that modelling sediment provenance is often not straightforward and there is increasing debate in the literature surrounding the most appropriate approach to selecting elements for modelling.
Moving forward, it would be beneficial if researchers test their results with multiple modelling approaches, artificial mixtures, and multiple lines of evidence to provide secondary support to their initial modelling results. Indeed, element selection can greatly impact modelling results and having multiple lines of evidence will help provide confidence when modelling sediment provenance.
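As an illustration of the discrimination step mentioned in this abstract, the sketch below computes the Kruskal-Wallis H statistic for one candidate tracer across source groups in plain Python. It is a simplified version that assigns average ranks to ties but applies no tie correction, and the group values are hypothetical; a real analysis would normally use a statistics library and compare H against a chi-squared distribution.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of samples (no tie correction)."""
    pooled = sorted(value for group in groups for value in group)

    # Average rank for each distinct value (tied values share the mean rank).
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j

    n_total = len(pooled)
    rank_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    return 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)


# Hypothetical tracer concentrations in three source groups (not real data).
source_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_statistic = kruskal_wallis_h(source_groups)
print(f"H = {h_statistic:.2f}")
```

A large H suggests the tracer separates the sources, but as the abstract stresses, discrimination alone does not show that the tracer behaves conservatively.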
Assessment of credit risk based on fuzzy relations
NASA Astrophysics Data System (ADS)
Tsabadze, Teimuraz
2017-06-01
The purpose of this paper is to develop a new approach for an assessment of the credit risk to corporate borrowers. There are different models for borrowers' risk assessment. These models are divided into two groups: statistical and theoretical. When assessing the credit risk for corporate borrowers, a statistical model is unacceptable due to the lack of a sufficiently large history of defaults. At the same time, we cannot use some theoretical models due to the lack of a stock exchange. In those cases, when studying a particular borrower for which no statistical base exists, the decision-making process is always of an expert nature. The paper describes a new approach that may be used in group decision-making. An example of the application of the proposed approach is given.
A Statistical Approach to Passive Target Tracking.
1981-04-01
a fixed heading of 90 degrees. For 7 F. A. Graybill, An Introduction to Linear Statistical Models, Vol. 1, New York: John Wiley & Sons, Inc. (1961). 13...likelihood estimators. 12 NCSC TM 311-81 The adjustment for a changing error variance is easy using the linear model approach; i.e., use weighted
Atmospheric Tracer Inverse Modeling Using Markov Chain Monte Carlo (MCMC)
NASA Astrophysics Data System (ADS)
Kasibhatla, P.
2004-12-01
In recent years, there has been an increasing emphasis on the use of Bayesian statistical estimation techniques to characterize the temporal and spatial variability of atmospheric trace gas sources and sinks. The applications have been varied in terms of the particular species of interest, as well as in terms of the spatial and temporal resolution of the estimated fluxes. However, one common characteristic has been the use of relatively simple statistical models for describing the measurement and chemical transport model error statistics and prior source statistics. For example, multivariate normal probability distribution functions (pdfs) are commonly used to model these quantities and inverse source estimates are derived for fixed values of pdf parameters. While the advantage of this approach is that closed form analytical solutions for the a posteriori pdfs of interest are available, it is worth exploring Bayesian analysis approaches which allow for a more general treatment of error and prior source statistics. Here, we present an application of the Markov Chain Monte Carlo (MCMC) methodology to an atmospheric tracer inversion problem to demonstrate how more general statistical models for errors can be incorporated into the analysis in a relatively straightforward manner. The MCMC approach to Bayesian analysis, which has found wide application in a variety of fields, is a statistical simulation approach that involves computing moments of interest of the a posteriori pdf by efficiently sampling this pdf. The specific inverse problem that we focus on is the annual mean CO2 source/sink estimation problem considered by the TransCom3 project. TransCom3 was a collaborative effort involving various modeling groups and followed a common modeling and analysis protocol. As such, this problem provides a convenient case study to demonstrate the applicability of the MCMC methodology to atmospheric tracer source/sink estimation problems.
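The point about sampling a posterior that has no closed form can be illustrated with a minimal random-walk Metropolis sketch. Everything here is an assumption made for illustration: a single scalar source strength, a Gaussian prior, a heavy-tailed Laplace error model, and made-up observations. It is not the TransCom3 setup or the author's implementation.

```python
import math
import random


def log_posterior(flux, observations, prior_mean, prior_sd, error_scale):
    """Unnormalised log posterior for one scalar source strength."""
    # Gaussian prior on the flux (an illustrative assumption).
    log_prior = -0.5 * ((flux - prior_mean) / prior_sd) ** 2
    # Laplace (heavy-tailed) error model, so no closed-form posterior exists.
    log_like = -sum(abs(obs - flux) for obs in observations) / error_scale
    return log_prior + log_like


def metropolis(log_target, start, step_sd, n_samples, rng):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    samples = []
    current = start
    current_lp = log_target(current)
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Made-up observations with one outlier (NOT real tracer data).
observations = [2.1, 1.8, 2.4, 2.0, 5.0]
samples = metropolis(
    lambda flux: log_posterior(flux, observations, 0.0, 10.0, 0.5),
    start=0.0,
    step_sd=0.5,
    n_samples=5000,
    rng=random.Random(0),
)
kept = samples[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean flux: {posterior_mean:.2f}")
```

Moments of the posterior are then just averages over the retained samples, which is what makes swapping in a different error model a small change to `log_posterior` rather than a new derivation.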
NASA Astrophysics Data System (ADS)
Müller, M. F.; Thompson, S. E.
2015-09-01
The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by a strong wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are strongly favored over statistical models.
NASA Astrophysics Data System (ADS)
Müller, M. F.; Thompson, S. E.
2016-02-01
The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by frequent wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are favored over statistical models.
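The Nash-Sutcliffe coefficient used as the performance measure in this abstract compares squared prediction errors with the variance of the observations. A minimal sketch in plain Python follows; the function name and the flow values are hypothetical, and the study may have applied the coefficient to transformed flows or quantiles rather than raw values.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    mean_observed = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    observed_spread = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - squared_error / observed_spread


# Hypothetical flow quantiles (not from the Nepal data set).
observed_flows = [1.0, 2.0, 3.0, 4.0]
perfect_score = nash_sutcliffe(observed_flows, observed_flows)
mean_only_score = nash_sutcliffe(observed_flows, [2.5, 2.5, 2.5, 2.5])
print(perfect_score, mean_only_score)
```

A value above 0.80, as reported for most catchments, therefore means the squared prediction error is less than a fifth of the spread of the observations around their mean.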
Two statistical approaches, weighted regression on time, discharge, and season and generalized additive models, have recently been used to evaluate water quality trends in estuaries. Both models have been used in similar contexts despite differences in statistical foundations and...
Comparing estimates of climate change impacts from process-based and statistical crop models
NASA Astrophysics Data System (ADS)
Lobell, David B.; Asseng, Senthold
2017-01-01
The potential impacts of climate change on crop productivity are of widespread interest to those concerned with addressing climate change and improving global food security. Two common approaches to assess these impacts are process-based simulation models, which attempt to represent key dynamic processes affecting crop yields, and statistical models, which estimate functional relationships between historical observations of weather and yields. Examples of both approaches are increasingly found in the scientific literature, although often published in different disciplinary journals. Here we compare published sensitivities to changes in temperature, precipitation, carbon dioxide (CO2), and ozone from each approach for the subset of crops, locations, and climate scenarios for which both have been applied. Despite a common perception that statistical models are more pessimistic, we find no systematic differences between the predicted sensitivities to warming from process-based and statistical models up to +2 °C, with limited evidence at higher levels of warming. For precipitation, there are many reasons why estimates could be expected to differ, but few estimates exist to develop robust comparisons, and precipitation changes are rarely the dominant factor for predicting impacts given the prominent role of temperature, CO2, and ozone changes. A common difference between process-based and statistical studies is that the former tend to include the effects of CO2 increases that accompany warming, whereas statistical models typically do not. Major needs moving forward include incorporating CO2 effects into statistical studies, improving both approaches’ treatment of ozone, and increasing the use of both methods within the same study. 
At the same time, those who fund or use crop model projections should understand that in the short-term, both approaches when done well are likely to provide similar estimates of warming impacts, with statistical models generally requiring fewer resources to produce robust estimates, especially when applied to crops beyond the major grains.
A new statistical approach to climate change detection and attribution
NASA Astrophysics Data System (ADS)
Ribes, Aurélien; Zwiers, Francis W.; Azaïs, Jean-Marc; Naveau, Philippe
2017-01-01
We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the "models are statistically indistinguishable from the truth" paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951-2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 ± 0.12 K, 90% confidence range), with a very limited contribution from natural forcings (-0.01 ± 0.02 K).
Analyzing Dyadic Sequence Data—Research Questions and Implied Statistical Models
Fuchs, Peter; Nussbeck, Fridtjof W.; Meuwly, Nathalie; Bodenmann, Guy
2017-01-01
The analysis of observational data is often seen as a key approach to understanding dynamics in romantic relationships but also in dyadic systems in general. Statistical models for the analysis of dyadic observational data are not commonly known or applied. In this contribution, selected approaches to dyadic sequence data will be presented with a focus on models that can be applied when sample sizes are of medium size (N = 100 couples or less). Each of the statistical models is motivated by an underlying potential research question; the most important model results are presented and linked to the research question. The following research questions and models are compared with respect to their applicability using a hands-on approach: (I) Is there an association between a particular behavior by one and the reaction by the other partner? (Pearson Correlation); (II) Does the behavior of one member trigger an immediate reaction by the other? (aggregated logit models; multi-level approach; basic Markov model); (III) Is there an underlying dyadic process, which might account for the observed behavior? (hidden Markov model); and (IV) Are there latent groups of dyads, which might account for observing different reaction patterns? (mixture Markov; optimal matching). Finally, recommendations for choosing among the different models, notes on data handling, and advice on applying the statistical models properly in empirical research are given (e.g., in a new R package, “DySeq”). PMID:28443037
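The basic Markov model named under question (II) can be illustrated with a minimal sketch, assuming the interaction has already been coded as a single sequence of discrete states. The state labels and the toy sequence below are hypothetical and are not taken from the article or from the DySeq package; the function simply estimates first-order transition probabilities by counting adjacent pairs.

```python
from collections import Counter


def transition_matrix(sequence, states):
    """Estimate first-order transition probabilities from one coded sequence."""
    # Count every adjacent (current, next) pair in the sequence.
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        # A state that is never left gets an all-zero row.
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0) for b in states
        }
    return matrix


# Hypothetical coding: "A" = behavior shown by partner A, "B" = by partner B.
seq = list("AABABBAB")
P = transition_matrix(seq, ["A", "B"])
print(P["A"]["B"])  # estimated probability that an "A" event is followed by "B"
```

Each row of `P` sums to one whenever the corresponding state is observed at least once before the end of the sequence.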
NASA Technical Reports Server (NTRS)
Xiang, Xuwu; Smith, Eric A.; Tripoli, Gregory J.
1992-01-01
A hybrid statistical-physical retrieval scheme is explored which combines a statistical approach with an approach based on the development of cloud-radiation models designed to simulate precipitating atmospheres. The algorithm employs the detailed microphysical information from a cloud model as input to a radiative transfer model which generates a cloud-radiation model database. Statistical procedures are then invoked to objectively generate an initial guess composite profile data set from the database. The retrieval algorithm has been tested for a tropical typhoon case using Special Sensor Microwave/Imager (SSM/I) data and has shown satisfactory results.
Heads Up! a Calculation- & Jargon-Free Approach to Statistics
ERIC Educational Resources Information Center
Giese, Alan R.
2012-01-01
Evaluating the strength of evidence in noisy data is a critical step in scientific thinking that typically relies on statistics. Students without statistical training will benefit from heuristic models that highlight the logic of statistical analysis. The likelihood associated with various coin-tossing outcomes gives students such a model. There…
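As a rough illustration of the coin-tossing likelihoods mentioned above, the following sketch computes the probability of seeing at least a given number of heads from a fair coin. The abstract is truncated, so this is only one plausible reading of the heuristic, not the author's classroom procedure; the function names are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def tail_probability(k, n, p=0.5):
    """Probability of at least k heads in n tosses."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


# How surprising are 8 or more heads in 10 tosses of a fair coin?
p_8_of_10 = tail_probability(8, 10)
print(p_8_of_10)
```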
Advances in Bayesian Modeling in Educational Research
ERIC Educational Resources Information Center
Levy, Roy
2016-01-01
In this article, I provide a conceptually oriented overview of Bayesian approaches to statistical inference and contrast them with frequentist approaches that currently dominate conventional practice in educational research. The features and advantages of Bayesian approaches are illustrated with examples spanning several statistical modeling…
Teaching Classical Statistical Mechanics: A Simulation Approach.
ERIC Educational Resources Information Center
Sauer, G.
1981-01-01
Describes a one-dimensional model for an ideal gas to study development of disordered motion in Newtonian mechanics. A Monte Carlo procedure for simulation of the statistical ensemble of an ideal gas with fixed total energy is developed. Compares both approaches for a pseudoexperimental foundation of statistical mechanics. (Author/JN)
Federal Register 2010, 2011, 2012, 2013, 2014
2013-11-25
... public. Mathematical and statistical models can be useful in predicting the timing and impact of the... applying any mathematical, statistical, or other approach to predictive modeling. This challenge will... Services (HHS) region level(s) in the United States by developing mathematical and statistical models that...
Investigation of Statistical Inference Methodologies Through Scale Model Propagation Experiments
2015-09-30
statistical inference methodologies for ocean-acoustic problems by investigating and applying statistical methods to data collected from scale-model...to begin planning experiments for statistical inference applications. APPROACH In the ocean acoustics community over the past two decades...solutions for waveguide parameters. With the introduction of statistical inference to the field of ocean acoustics came the desire to interpret marginal
Latent spatial models and sampling design for landscape genetics
Hanks, Ephraim M.; Hooten, Mevin B.; Knick, Steven T.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Cross, Todd B.; Schwartz, Michael K.
2016-01-01
We propose a spatially-explicit approach for modeling genetic variation across space and illustrate how this approach can be used to optimize spatial prediction and sampling design for landscape genetic data. We propose a multinomial data model for categorical microsatellite allele data commonly used in landscape genetic studies, and introduce a latent spatial random effect to allow for spatial correlation between genetic observations. We illustrate how modern dimension reduction approaches to spatial statistics can allow for efficient computation in landscape genetic statistical models covering large spatial domains. We apply our approach to propose a retrospective spatial sampling design for greater sage-grouse (Centrocercus urophasianus) population genetics in the western United States.
Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John
2018-03-07
DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis is employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied on single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrate a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. 
This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
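The two information-theoretic quantities named in this abstract, Shannon entropy and the Jensen-Shannon distance, can be sketched for plain discrete distributions as follows. This is a generic illustration only: the example distributions are hypothetical, and the paper's distributions come from its Ising-model fit, which is not reproduced here. With base-2 logarithms the distance lies between 0 and 1.

```python
from math import log2, sqrt


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(x * log2(x) for x in p if x > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence between p and q."""
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution
    divergence = shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2
    # Guard against tiny negative values caused by floating-point rounding.
    return sqrt(max(divergence, 0.0))


# Hypothetical distributions over four methylation states.
reference = [0.7, 0.1, 0.1, 0.1]
test = [0.25, 0.25, 0.25, 0.25]
print(shannon_entropy(test), jensen_shannon_distance(test, reference))
```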
A Bayesian approach for parameter estimation and prediction using a computationally intensive model
Higdon, Dave; McDonnell, Jordan D.; Schunck, Nicolas; ...
2015-02-05
Bayesian methods have been successful in quantifying uncertainty in physics-based problems in parameter estimation and prediction. In these cases, physical measurements $y$ are modeled as the best fit of a physics-based model $\eta(\theta)$, where $\theta$ denotes the uncertain, best input setting. Hence the statistical model is of the form $y = \eta(\theta) + \epsilon$, where $\epsilon$ accounts for measurement, and possibly other, error sources. When nonlinearity is present in $\eta(\cdot)$, the resulting posterior distribution for the unknown parameters in the Bayesian formulation is typically complex and nonstandard, requiring computationally demanding computational approaches such as Markov chain Monte Carlo (MCMC) to produce multivariate draws from the posterior. Although generally applicable, MCMC requires thousands (or even millions) of evaluations of the physics model $\eta(\cdot)$. This requirement is problematic if the model takes hours or days to evaluate. To overcome this computational bottleneck, we present an approach adapted from Bayesian model calibration. This approach combines output from an ensemble of computational model runs with physical measurements, within a statistical formulation, to carry out inference. A key component of this approach is a statistical response surface, or emulator, estimated from the ensemble of model runs. We demonstrate this approach with a case study in estimating parameters for a density functional theory model, using experimental mass/binding energy measurements from a collection of atomic nuclei. Lastly, we also demonstrate how this approach produces uncertainties in predictions for recent mass measurements obtained at Argonne National Laboratory.
Statistical Emulation of Climate Model Projections Based on Precomputed GCM Runs*
Castruccio, Stefano; McInerney, David J.; Stein, Michael L.; ...
2014-02-24
The authors describe a new approach for emulating the output of a fully coupled climate model under arbitrary forcing scenarios that is based on a small set of precomputed runs from the model. Temperature and precipitation are expressed as simple functions of the past trajectory of atmospheric CO2 concentrations, and a statistical model is fit using a limited set of training runs. The approach is demonstrated to be a useful and computationally efficient alternative to pattern scaling and captures the nonlinear evolution of spatial patterns of climate anomalies inherent in transient climates. The approach does as well as pattern scaling in all circumstances and substantially better in many; it is not computationally demanding; and, once the statistical model is fit, it produces emulated climate output effectively instantaneously. In conclusion, it may therefore find wide application in climate impacts assessments and other policy analyses requiring rapid climate projections.
Statistical methods and neural network approaches for classification of data from multiple sources
NASA Technical Reports Server (NTRS)
Benediktsson, Jon Atli; Swain, Philip H.
1990-01-01
Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A problem with using conventional multivariate statistical approaches for classification of data of multiple types is in general that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.
Meta-analysis of diagnostic test data: a bivariate Bayesian modeling approach.
Verde, Pablo E
2010-12-30
In the last decades, the amount of published results on clinical diagnostic tests has expanded very rapidly. The counterpart to this development has been the formal evaluation and synthesis of diagnostic results. However, published results present substantial heterogeneity and they can be regarded as so far removed from the classical domain of meta-analysis that they can provide a rather severe test of classical statistical methods. Recently, bivariate random effects meta-analytic methods, which model the pairs of sensitivities and specificities, have been presented from the classical point of view. In this work a bivariate Bayesian modeling approach is presented. This approach substantially extends the scope of classical bivariate methods by allowing the structural distribution of the random effects to depend on multiple sources of variability. Meta-analysis is summarized by the predictive posterior distributions for sensitivity and specificity. This new approach also allows substantial model checking, model diagnostics, and model selection. Statistical computations are implemented in the public domain statistical software (WinBUGS and R) and illustrated with real data examples. Copyright © 2010 John Wiley & Sons, Ltd.
Evaluating model accuracy for model-based reasoning
NASA Technical Reports Server (NTRS)
Chien, Steve; Roden, Joseph
1992-01-01
Described here is an approach to automatically assessing the accuracy of various components of a model. In this approach, actual data from the operation of a target system is used to drive statistical measures to evaluate the prediction accuracy of various portions of the model. We describe how these statistical measures of model accuracy can be used in model-based reasoning for monitoring and design. We then describe the application of these techniques to the monitoring and design of the water recovery system of the Environmental Control and Life Support System (ECLSS) of Space Station Freedom.
Rivas, Elena; Lang, Raymond; Eddy, Sean R.
2012-01-01
The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308
2013-01-01
Background As a result of changes in climatic conditions and greater resistance to insecticides, many regions across the globe, including Colombia, have been facing a resurgence of vector-borne diseases, and dengue fever in particular. Timely information on both (1) the spatial distribution of the disease, and (2) prevailing vulnerabilities of the population are needed to adequately plan targeted preventive intervention. We propose a methodology for the spatial assessment of current socioeconomic vulnerabilities to dengue fever in Cali, a tropical urban environment of Colombia. Methods Based on a set of socioeconomic and demographic indicators derived from census data and ancillary geospatial datasets, we develop a spatial approach for both expert-based and purely statistical-based modeling of current vulnerability levels across 340 neighborhoods of the city using a Geographic Information System (GIS). The results of both approaches are comparatively evaluated by means of spatial statistics. A web-based approach is proposed to facilitate the visualization and the dissemination of the output vulnerability index to the community. Results The statistical and the expert-based modeling approach exhibit a high concordance, globally, and spatially. The expert-based approach indicates a slightly higher vulnerability mean (0.53) and vulnerability median (0.56) across all neighborhoods, compared to the purely statistical approach (mean = 0.48; median = 0.49). Both approaches reveal that high values of vulnerability tend to cluster in the eastern, north-eastern, and western part of the city. These are poor neighborhoods with high percentages of young (i.e., < 15 years) and illiterate residents, as well as a high proportion of individuals being either unemployed or doing housework. Conclusions Both modeling approaches reveal similar outputs, indicating that in the absence of local expertise, statistical approaches could be used, with caution. 
By decomposing identified vulnerability “hotspots” into their underlying factors, our approach provides valuable information on both (1) the location of neighborhoods, and (2) vulnerability factors that should be given priority in the context of targeted intervention strategies. The results support decision makers to allocate resources in a manner that may reduce existing susceptibilities and strengthen resilience, and thus help to reduce the burden of vector-borne diseases. PMID:23945265
Statistical Modeling for Radiation Hardness Assurance: Toward Bigger Data
NASA Technical Reports Server (NTRS)
Ladbury, R.; Campola, M. J.
2015-01-01
New approaches to statistical modeling in radiation hardness assurance are discussed. These approaches yield quantitative bounds on flight-part radiation performance even in the absence of conventional data sources. This allows the analyst to bound radiation risk at all stages and for all decisions in the RHA process. It also allows optimization of RHA procedures for the project's risk tolerance.
MacLean, Adam L; Harrington, Heather A; Stumpf, Michael P H; Byrne, Helen M
2016-01-01
The last decade has seen an explosion in models that describe phenomena in systems medicine. Such models are especially useful for studying signaling pathways, such as the Wnt pathway. In this chapter we use the Wnt pathway to showcase current mathematical and statistical techniques that enable modelers to gain insight into (models of) gene regulation and generate testable predictions. We introduce a range of modeling frameworks, but focus on ordinary differential equation (ODE) models since they remain the most widely used approach in systems biology and medicine and continue to offer great potential. We present methods for the analysis of a single model, comprising applications of standard dynamical systems approaches such as nondimensionalization, steady state, asymptotic and sensitivity analysis, and more recent statistical and algebraic approaches to compare models with data. We present parameter estimation and model comparison techniques, focusing on Bayesian analysis and coplanarity via algebraic geometry. Our intention is that this (non-exhaustive) review may serve as a useful starting point for the analysis of models in systems medicine.
Probabilistic models for reactive behaviour in heterogeneous condensed phase media
NASA Astrophysics Data System (ADS)
Baer, M. R.; Gartling, D. K.; DesJardin, P. E.
2012-02-01
This work presents statistically-based models to describe reactive behaviour in heterogeneous energetic materials. Mesoscale effects are incorporated in continuum-level reactive flow descriptions using probability density functions (pdfs) that are associated with thermodynamic and mechanical states. A generalised approach is presented that includes multimaterial behaviour by treating the volume fraction as a random kinematic variable. Model simplifications are then sought to reduce the complexity of the description without compromising the statistical approach. Reactive behaviour is first considered for non-deformable media having a random temperature field as an initial state. A pdf transport relationship is derived and an approximate moment approach is incorporated in finite element analysis to model an example application whereby a heated fragment impacts a reactive heterogeneous material which leads to a delayed cook-off event. Modelling is then extended to include deformation effects associated with shock loading of a heterogeneous medium whereby random variables of strain, strain-rate and temperature are considered. A demonstrative mesoscale simulation of a non-ideal explosive is discussed that illustrates the joint statistical nature of the strain and temperature fields during shock loading to motivate the probabilistic approach. This modelling is derived in a Lagrangian framework that can be incorporated in continuum-level shock physics analysis. Future work will consider particle-based methods for a numerical implementation of this modelling approach.
Prediction of Patient-Controlled Analgesic Consumption: A Multimodel Regression Tree Approach.
Hu, Yuh-Jyh; Ku, Tien-Hsiung; Yang, Yu-Hung; Shen, Jia-Ying
2018-01-01
Several factors contribute to individual variability in postoperative pain; as a result, individuals consume postoperative analgesics at different rates. Although many statistical studies have analyzed postoperative pain and analgesic consumption, most have identified only the correlation and have not subjected the statistical model to further tests in order to evaluate its predictive accuracy. In this study involving 3052 patients, a multistrategy computational approach was developed for analgesic consumption prediction. This approach uses data on patient-controlled analgesia demand behavior over time and combines clustering, classification, and regression to mitigate the limitations of current statistical models. Cross-validation results indicated that the proposed approach significantly outperforms various existing regression methods. Moreover, a comparison between the predictions by anesthesiologists and medical specialists and those of the computational approach for an independent test data set of 60 patients further evidenced the superiority of the computational approach in predicting analgesic consumption because it produced markedly lower root mean squared errors.
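The comparison criterion named at the end of this abstract, root mean squared error, can be written out as a short sketch. The numbers below are hypothetical and only show the calculation; they are not data from the study.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root mean squared error between paired predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical analgesic consumption values (arbitrary units).
observed = [30.0, 45.0, 60.0]
model_a = [32.0, 44.0, 57.0]
model_b = [40.0, 35.0, 75.0]
print(rmse(model_a, observed), rmse(model_b, observed))
```

A lower value means the predictions sit closer to the observations on average, with large individual errors penalized more heavily than small ones.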
Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning
Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego
2016-01-01
Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults. PMID:27322273
NASA Astrophysics Data System (ADS)
Terando, A. J.; Grade, S.; Bowden, J.; Henareh Khalyani, A.; Wootten, A.; Misra, V.; Collazo, J.; Gould, W. A.; Boyles, R.
2016-12-01
Sub-tropical island nations may be particularly vulnerable to anthropogenic climate change because of predicted changes in the hydrologic cycle that would lead to significant drying in the future. However, decision makers in these regions have seen their adaptation planning efforts frustrated by the lack of island-resolving climate model information. Recently, two investigations have used statistical and dynamical downscaling techniques to develop climate change projections for the U.S. Caribbean region (Puerto Rico and U.S. Virgin Islands). We compare the results from these two studies with respect to three commonly downscaled CMIP5 global climate models (GCMs). The GCMs were dynamically downscaled at a convective-permitting scale using two different regional climate models. The statistical downscaling approach was conducted at locations with long-term climate observations and then further post-processed using climatologically aided interpolation (yielding two sets of projections). Overall, both approaches face unique challenges. The statistical approach suffers from a lack of observations necessary to constrain the model, particularly at the land-ocean boundary and in complex terrain. The dynamically downscaled model output has a systematic dry bias over the island despite ample availability of moisture in the atmospheric column. Notwithstanding these differences, both approaches are consistent in projecting a drier climate that is driven by the strong global-scale anthropogenic forcing.
Some Statistics for Assessing Person-Fit Based on Continuous-Response Models
ERIC Educational Resources Information Center
Ferrando, Pere Joan
2010-01-01
This article proposes several statistics for assessing individual fit based on two unidimensional models for continuous responses: linear factor analysis and Samejima's continuous response model. Both models are approached using a common framework based on underlying response variables and are formulated at the individual level as fixed regression…
Validation of a heteroscedastic hazards regression model.
Wu, Hong-Dar Isaac; Hsieh, Fushing; Chen, Chen-Hsin
2002-03-01
A Cox-type regression model accommodating heteroscedasticity, with a power factor of the baseline cumulative hazard, is investigated for analyzing data with crossing hazards behavior. Since the approach of partial likelihood cannot eliminate the baseline hazard, an overidentified estimating equation (OEE) approach is introduced in the estimation procedure. Its by-product, a model checking statistic, is presented to test for the overall adequacy of the heteroscedastic model. Further, under the heteroscedastic model setting, we propose two statistics to test the proportional hazards assumption. Implementation of this model is illustrated in a data analysis of a cancer clinical trial.
NASA Astrophysics Data System (ADS)
Flores-Marquez, Leticia Elsa; Ramirez Rojaz, Alejandro; Telesca, Luciano
2015-04-01
Two statistical approaches are applied to two different types of data sets: the seismicity generated by the subduction processes that occurred at the south Pacific coast of Mexico between 2005 and 2012, and the synthetic seismic data generated by a stick-slip experimental model. The statistical methods used in the present study are the visibility graph, to investigate the time dynamics of the series, and the scaled probability density function in the natural time domain, to investigate the critical order of the system. The purpose of this comparison is to show the similarities between the dynamical behaviors of both types of data sets from the point of view of critical systems. The observed behaviors allow us to conclude that the experimental setup globally reproduces the behavior observed in the statistical approaches used to analyze the seismicity of the subduction zone. The present study was supported by the Bilateral Project Italy-Mexico Experimental Stick-slip models of tectonic faults: innovative statistical approaches applied to synthetic seismic sequences, jointly funded by MAECI (Italy) and AMEXCID (Mexico) in the framework of the Bilateral Agreement for Scientific and Technological Cooperation PE 2014-2016.
ERIC Educational Resources Information Center
Romeu, Jorge Luis
2008-01-01
This article discusses our teaching approach in graduate level Engineering Statistics. It is based on the use of modern technology, learning groups, contextual projects, simulation models, and statistical and simulation software to entice student motivation. The use of technology to facilitate group projects and presentations, and to generate,…
Lord, Dominique; Washington, Simon P; Ivan, John N
2005-01-01
There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states: perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of "excess" zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows a Bernoulli trial with unequal probability of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to the "excess" zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed, and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales and not from an underlying dual-state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
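The simulation argument in this abstract can be illustrated with a minimal sketch. It is not the authors' experiment: the site counts, trial counts, and crash probabilities below are hypothetical values chosen only to show how pooling low-exposure and high-exposure sites produces more zeros than a single Poisson model predicts, even though every site follows the same single-state Poisson-trials process.

```python
import math
import random


def simulate_counts(n_sites, n_trials, rng, p_low=0.0005, p_high=0.003):
    """Crash count per site as a sum of independent Bernoulli trials with
    unequal probabilities (Poisson trials). All values are hypothetical."""
    counts = []
    for _ in range(n_sites):
        crashes = 0
        for _ in range(n_trials):
            p = rng.uniform(p_low, p_high)  # each trial has its own crash risk
            if rng.random() < p:
                crashes += 1
        counts.append(crashes)
    return counts


rng = random.Random(2005)
low_exposure = simulate_counts(500, 20, rng)     # few trials per site
high_exposure = simulate_counts(500, 1000, rng)  # many trials per site
pooled = low_exposure + high_exposure

observed_zero_share = sum(1 for c in pooled if c == 0) / len(pooled)
mean_count = sum(pooled) / len(pooled)
# Share of zeros that one Poisson model with the pooled mean would predict.
poisson_zero_share = math.exp(-mean_count)

print(f"observed share of zeros:          {observed_zero_share:.3f}")
print(f"Poisson-predicted share of zeros: {poisson_zero_share:.3f}")
```

No site in this sketch is ever in a "perfectly safe" state; the surplus of zeros comes only from mixing exposure levels, which is the mechanism the abstract attributes to low exposure and poorly chosen time/space scales.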
Hybrid regulatory models: a statistically tractable approach to model regulatory network dynamics.
Ocone, Andrea; Millar, Andrew J; Sanguinetti, Guido
2013-04-01
Computational modelling of the dynamics of gene regulatory networks is a central task of systems biology. For networks of small/medium scale, the dominant paradigm is represented by systems of coupled non-linear ordinary differential equations (ODEs). ODEs afford great mechanistic detail and flexibility, but calibrating these models to data is often an extremely difficult statistical problem. Here, we develop a general statistical inference framework for stochastic transcription-translation networks. We use a coarse-grained approach, which represents the system as a network of stochastic (binary) promoter and (continuous) protein variables. We derive an exact inference algorithm and an efficient variational approximation that allows scalable inference and learning of the model parameters. We demonstrate the power of the approach on two biological case studies, showing that the method allows a high degree of flexibility and is capable of testable novel biological predictions. http://homepages.inf.ed.ac.uk/gsanguin/software.html. Supplementary data are available at Bioinformatics online.
A Statistical-Physics Approach to Language Acquisition and Language Change
NASA Astrophysics Data System (ADS)
Cassandro, Marzio; Collet, Pierre; Galves, Antonio; Galves, Charlotte
1999-02-01
The aim of this paper is to explain why Statistical Physics can help in understanding two related linguistic questions. The first question is how to model first language acquisition by a child. The second question is how language change proceeds in time. Our approach is based on a Gibbsian model for the interface between syntax and prosody. We also present a simulated annealing model of language acquisition, which extends the Triggering Learning Algorithm recently introduced in the linguistic literature.
Probabilistic Modeling and Visualization of the Flexibility in Morphable Models
NASA Astrophysics Data System (ADS)
Lüthi, M.; Albrecht, T.; Vetter, T.
Statistical shape models, and in particular morphable models, have gained widespread use in computer vision, computer graphics and medical imaging. Researchers have started to build models of almost any anatomical structure in the human body. While these models provide a useful prior for many image analysis tasks, relatively little information about the shape represented by the morphable model is exploited. We propose a method for computing and visualizing the remaining flexibility when a part of the shape is fixed. Our method, which is based on Probabilistic PCA, not only leads to an approach for reconstructing the full shape from partial information, but also allows us to investigate and visualize the uncertainty of a reconstruction. To show the feasibility of our approach we performed experiments on a statistical model of the human face and the femur bone. The visualization of the remaining flexibility allows for greater insight into the statistical properties of the shape.
On prognostic models, artificial intelligence and censored observations.
Anand, S S; Hamilton, P W; Hughes, J G; Bell, D A
2001-03-01
The development of prognostic models for assisting medical practitioners with decision making is not a trivial task. Models need to possess a number of desirable characteristics and few, if any, current modelling approaches based on statistical or artificial intelligence can produce models that display all these characteristics. The inability of modelling techniques to provide truly useful models has led to interest in these models being purely academic in nature. This in turn has resulted in only a very small percentage of models that have been developed being deployed in practice. On the other hand, new modelling paradigms are being proposed continuously within the machine learning and statistical community and claims, often based on inadequate evaluation, being made on their superiority over traditional modelling methods. We believe that for new modelling approaches to deliver true net benefits over traditional techniques, an evaluation centric approach to their development is essential. In this paper we present such an evaluation centric approach to developing extensions to the basic k-nearest neighbour (k-NN) paradigm. We use standard statistical techniques to enhance the distance metric used and a framework based on evidence theory to obtain a prediction for the target example from the outcome of the retrieved exemplars. We refer to this new k-NN algorithm as Censored k-NN (Ck-NN). This reflects the enhancements made to k-NN that are aimed at providing a means for handling censored observations within k-NN.
An astronomer's guide to period searching
NASA Astrophysics Data System (ADS)
Schwarzenberg-Czerny, A.
2003-03-01
We concentrate on the analysis of unevenly sampled time series, interrupted by periodic gaps, as often encountered in astronomy. While some of our conclusions may appear surprising, all are based on classical statistical principles of Fisher & successors. Except for discussion of the resolution issues, it is best for the reader to forget temporarily about Fourier transforms and to concentrate on problems of fitting a time series with a model curve. According to their statistical content we divide the issues into several sections, consisting of: (ii) statistical numerical aspects of model fitting, (iii) evaluation of fitted models as hypotheses testing, (iv) the role of the orthogonal models in signal detection, (v) conditions for equivalence of periodograms, and (vi) rating sensitivity by test power. An experienced observer working with individual objects would benefit little from a formalized statistical approach. However, we demonstrate the usefulness of this approach in evaluating the performance of periodograms and in the quantitative design of large variability surveys.
Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.
Harrington, Peter de Boves
2018-01-02
Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes efficient use of the data because each object is used once for validation. It was reviewed a decade earlier, but primarily for the optimization of chemometric models; this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported, and powerful matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
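A rough sketch of the validation scheme described here, under stated assumptions: the abstract gives no algorithmic detail, so the class-stratified round-robin split, the nearest-centroid classifier, the synthetic two-class data, and the normal-approximation confidence interval are all illustrative choices, and the function names are hypothetical. The property taken from the abstract is that each object is used exactly once for validation within each repetition.

```python
import random
import statistics


def latin_partitions(labels, k, rng):
    """Split indices into k folds, stratified by class, so that every
    object lands in exactly one validation fold."""
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for lab in sorted(by_class):
        members = by_class[lab]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


def count_correct(X, y, train, test):
    """Nearest-centroid classifier (a stand-in for any multivariate model)."""
    centroids = {}
    for lab in sorted(set(y[i] for i in train)):
        rows = [X[i] for i in train if y[i] == lab]
        centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    correct = 0
    for i in test:
        pred = min(centroids, key=lambda lab: sq_dist(X[i], centroids[lab]))
        correct += int(pred == y[i])
    return correct


def bootstrapped_latin_partitions(X, y, k=4, n_boot=20, seed=0):
    rng = random.Random(seed)
    foms = []
    for _ in range(n_boot):
        correct = 0
        for fold in latin_partitions(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            correct += count_correct(X, y, train, fold)
        foms.append(correct / len(y))  # pooled accuracy for this repetition
    mean_fom = statistics.mean(foms)
    # Normal-approximation 95% half-width; an assumption of this sketch.
    half_width = 1.96 * statistics.stdev(foms) / len(foms) ** 0.5
    return mean_fom, half_width, foms


data_rng = random.Random(7)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(30)]
y = [0] * 30 + [1] * 30
mean_fom, half_width, foms = bootstrapped_latin_partitions(X, y)
print(f"accuracy = {mean_fom:.3f} +/- {half_width:.3f}")
```

Because every repetition scores all objects, two models evaluated with the same seed see the same partitions, which is what makes matched-sample comparisons of their FOMs possible.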
Minică, Camelia C; Dolan, Conor V; Hottenga, Jouke-Jan; Willemsen, Gonneke; Vink, Jacqueline M; Boomsma, Dorret I
2013-05-01
When phenotypic, but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of two statistical approaches suitable to model imputed genotype data: the mixture approach, which involves the full distribution of the imputed genotypes, and the dosage approach, where the mean of the conditional distribution features as the imputed genotype. Simulations were run by varying sibship size, size of the phenotypic correlations among siblings, imputation accuracy and minor allele frequency of the causal SNP. Furthermore, as imputing sibling data and extending the model to include sibships of size two or greater requires modeling the familial covariance matrix, we inquired whether model misspecification affects power. Finally, the results obtained via simulations were empirically verified in two datasets with continuous phenotype data (height) and with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and the dosage approach are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears to be robust to the considered misspecification in the background covariance structure, given low to moderate phenotypic correlations among siblings. Empirical results show that the inclusion in association analysis of imputed sibling genotypes does not always result in a larger test statistic. The actual test statistic may drop in value due to small effect sizes. That is, if the power benefit is small, so that the change in the distribution of the test statistic under the alternative is relatively small, the probability of obtaining a smaller test statistic is greater. As the genetic effects are typically hypothesized to be small, in practice, the decision on whether family-based imputation could be used as a means to increase power should be informed by prior power calculations and by the consideration of the background correlation.
Reconciling statistical and systems science approaches to public health.
Ip, Edward H; Rahmandad, Hazhir; Shoham, David A; Hammond, Ross; Huang, Terry T-K; Wang, Youfa; Mabry, Patricia L
2013-10-01
Although systems science has emerged as a set of innovative approaches to study complex phenomena, many topically focused researchers, including clinicians and scientists working in public health, are somewhat befuddled by this methodology that at times appears to be radically different from analytic methods, such as statistical modeling, to which the researchers are accustomed. There also appear to be conflicts between complex systems approaches and traditional statistical methodologies, both in terms of their underlying strategies and the languages they use. We argue that the conflicts are resolvable, and the sooner the better for the field. In this article, we show how statistical and systems science approaches can be reconciled, and how together they can advance solutions to complex problems. We do this by comparing the methods within a theoretical framework based on the work of population biologist Richard Levins. We present different types of models as representing different tradeoffs among the four desiderata of generality, realism, fit, and precision.
Reconciling Statistical and Systems Science Approaches to Public Health
Ip, Edward H.; Rahmandad, Hazhir; Shoham, David A.; Hammond, Ross; Huang, Terry T.-K.; Wang, Youfa; Mabry, Patricia L.
2016-01-01
Although systems science has emerged as a set of innovative approaches to study complex phenomena, many topically focused researchers, including clinicians and scientists working in public health, are somewhat befuddled by this methodology that at times appears to be radically different from analytic methods, such as statistical modeling, to which the researchers are accustomed. There also appear to be conflicts between complex systems approaches and traditional statistical methodologies, both in terms of their underlying strategies and the languages they use. We argue that the conflicts are resolvable, and the sooner the better for the field. In this article, we show how statistical and systems science approaches can be reconciled, and how together they can advance solutions to complex problems. We do this by comparing the methods within a theoretical framework based on the work of population biologist Richard Levins. We present different types of models as representing different tradeoffs among the four desiderata of generality, realism, fit, and precision. PMID:24084395
NASA Astrophysics Data System (ADS)
Berliner, M.
2017-12-01
Bayesian statistical decision theory offers a natural framework for decision-policy making in the presence of uncertainty. Key advantages of the approach include efficient incorporation of information and observations. However, in complicated settings it is very difficult, perhaps essentially impossible, to formalize the mathematical inputs needed in the approach. Nevertheless, using the approach as a template is useful for decision support; that is, organizing and communicating our analyses. Bayesian hierarchical modeling is valuable in quantifying and managing uncertainty in such cases. I review some aspects of the idea, emphasizing statistical model development and use in the context of sea-level rise.
Colegrave, Nick
2017-01-01
A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term with the error term used to test hypotheses (or estimate effect sizes). This pooling is only carried out if statistical testing on the basis of applying that data to a previous more complicated model provides motivation for this model simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test their hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, here we argue that (except in highly specialized circumstances that we can identify) the hoped-for improvement in statistical power will be small or non-existent, and there is likely to be much reduced reliability of the statistical procedures through deviation of type I error rates from nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for initial selection of statistical models in the light of this change in procedure. PMID:28330912
Predicting future protection of respirator users: Statistical approaches and practical implications.
Hu, Chengcheng; Harber, Philip; Su, Jing
2016-01-01
The purpose of this article is to describe a statistical approach for predicting a respirator user's fit factor in the future based upon results from initial tests. A statistical prediction model was developed based upon joint distribution of multiple fit factor measurements over time obtained from linear mixed effect models. The model accounts for within-subject correlation as well as short-term (within one day) and longer-term variability. As an example of applying this approach, model parameters were estimated from a research study in which volunteers were trained by three different modalities to use one of two types of respirators. They underwent two quantitative fit tests at the initial session and two on the same day approximately six months later. The fitted models demonstrated correlation and gave the estimated distribution of future fit test results conditional on past results for an individual worker. This approach can be applied to establishing a criterion value for passing an initial fit test to provide reasonable likelihood that a worker will be adequately protected in the future; and to optimizing the repeat fit factor test intervals individually for each user for cost-effective testing.
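The conditional prediction described in this abstract can be sketched with a simplified random-intercept model. This is an assumption-laden illustration, not the authors' model: it collapses the short-term and longer-term variability the abstract mentions into a single within-subject variance, works on a log10 fit factor scale, and uses hypothetical parameter values and a hypothetical pass level of 100.

```python
from statistics import NormalDist


def predict_future_fit(initial_logs, mu, var_between, var_within):
    """Distribution of one future log10 fit factor given initial tests,
    under y_ij = mu + b_i + e_ij with b_i ~ N(0, var_between) and
    e_ij ~ N(0, var_within)."""
    n = len(initial_logs)
    ybar = sum(initial_logs) / n
    # Shrinkage of the worker's observed mean toward the population mean.
    shrink = var_between / (var_between + var_within / n)
    cond_mean = mu + shrink * (ybar - mu)
    # Remaining uncertainty about the worker effect plus new-test noise.
    cond_var = var_between * (1 - shrink) + var_within
    return NormalDist(cond_mean, cond_var ** 0.5)


# Hypothetical parameters on the log10 fit factor scale.
MU, VAR_BETWEEN, VAR_WITHIN = 2.3, 0.09, 0.04
PASS_LOG = 2.0  # log10(100), a hypothetical pass level

future = predict_future_fit([2.5, 2.6], MU, VAR_BETWEEN, VAR_WITHIN)
prob_pass = 1 - future.cdf(PASS_LOG)
print(f"P(future fit factor >= 100) = {prob_pass:.3f}")
```

Raising the initial results raises the predicted probability of a future pass, which is the relationship a criterion value for the initial test would be tuned against.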
Austin, Peter C.; van Klaveren, David; Vergouwe, Yvonne; Nieboer, Daan; Lee, Douglas S.; Steyerberg, Ewout W.
2017-01-01
Objective: Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods. Study Design and Setting: We illustrated different analytic methods for validation using a sample of 14,857 patients hospitalized with heart failure at 90 hospitals in two distinct time periods. Bootstrap resampling was used to assess internal validity. Meta-analytic methods were used to assess geographic transportability. Each hospital was used once as a validation sample, with the remaining hospitals used for model derivation. Hospital-specific estimates of discrimination (c-statistic) and calibration (calibration intercepts and slopes) were pooled using random effects meta-analysis methods. I² statistics and prediction interval width quantified geographic transportability. Temporal transportability was assessed using patients from the earlier period for model derivation and patients from the later period for model validation. Results: Estimates of reproducibility, pooled hospital-specific performance, and temporal transportability were on average very similar, with c-statistics of 0.75. Between-hospital variation was moderate according to I² statistics and prediction intervals for c-statistics. Conclusion: This study illustrates how performance of prediction models can be assessed in settings with multicenter data at different time periods. PMID:27262237
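The pooling step can be sketched as follows. The abstract does not say which random-effects estimator was used, so the DerSimonian-Laird estimator here is an assumption, as are pooling on the raw c-statistic scale, the normal-approximation prediction interval, and the five hospital-level values, which are invented for illustration only.

```python
import math


def random_effects_pool(estimates, variances):
    """DerSimonian-Laird random-effects pooling with the I-squared statistic."""
    k = len(estimates)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-hospital variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Rough 95% prediction interval (normal approximation, an assumption).
    spread = 1.96 * math.sqrt(tau2 + se ** 2)
    return {"pooled": pooled, "se": se, "tau2": tau2, "i2": i2,
            "prediction_interval": (pooled - spread, pooled + spread)}


# Hypothetical hospital-specific c-statistics and their sampling variances.
c_stats = [0.72, 0.78, 0.74, 0.77, 0.73]
c_vars = [0.0004, 0.0006, 0.0005, 0.0003, 0.0007]
result = random_effects_pool(c_stats, c_vars)
print(f"pooled c-statistic: {result['pooled']:.3f}, I2: {result['i2']:.1f}%")
```

A wide prediction interval relative to the confidence interval of the pooled estimate signals that performance in a new hospital may differ noticeably from the average, which is how the interval width speaks to geographic transportability.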
ERIC Educational Resources Information Center
Nevitt, Jonathan; Hancock, Gregory R.
2001-01-01
Evaluated the bootstrap method under varying conditions of nonnormality, sample size, model specification, and number of bootstrap samples drawn from the resampling space. Results for the bootstrap suggest the resampling-based method may be conservative in its control over model rejections, thus having an impact on the statistical power associated…
Using statistical equivalence testing logic and mixed model theory, an approach has been developed that extends the work of Stork et al. (JABES, 2008) to define sufficient similarity in dose-response for chemical mixtures containing the same chemicals with different ratios ...
Formulating Spatially Varying Performance in the Statistical Fusion Framework
Landman, Bennett A.
2012-01-01
To date, label fusion methods have primarily relied either on global (e.g. STAPLE, globally weighted vote) or voxelwise (e.g. locally weighted vote) performance models. Optimality of the statistical fusion framework hinges upon the validity of the stochastic model of how a rater errs (i.e., the labeling process model). Hitherto, approaches have tended to focus on the extremes of potential models. Herein, we propose an extension to the STAPLE approach to seamlessly account for spatially varying performance by extending the performance level parameters to account for a smooth, voxelwise performance level field that is unique to each rater. This approach, Spatial STAPLE, provides significant improvements over state-of-the-art label fusion algorithms in both simulated and empirical data sets. PMID:22438513
Schäffer, Beat; Pieren, Reto; Mendolia, Franco; Basner, Mathias; Brink, Mark
2017-05-01
Noise exposure-response relationships are used to estimate the effects of noise on individuals or a population. Such relationships may be derived from independent or repeated binary observations, and modeled by different statistical methods. Depending on the method by which they were established, their application in population risk assessment or estimation of individual responses may yield different results, i.e., predict "weaker" or "stronger" effects. As far as the present body of literature on noise effect studies is concerned, however, the underlying statistical methodology to establish exposure-response relationships has not always received sufficient attention. This paper gives an overview of two statistical approaches (subject-specific and population-averaged logistic regression analysis) to establish noise exposure-response relationships from repeated binary observations, and their appropriate applications. The considerations are illustrated with data from three noise effect studies, estimating also the magnitude of differences in results when applying exposure-response relationships derived from the two statistical approaches. Depending on the underlying data set and the probability range of the binary variable it covers, the two approaches yield similar to very different results. The adequate choice of a specific statistical approach and its application in subsequent studies, both depending on the research question, are therefore crucial.
NASA Astrophysics Data System (ADS)
Olugboji, T. M.; Lekic, V.; McDonough, W.
2017-07-01
We present a new approach for evaluating existing crustal models using ambient noise data sets and their associated uncertainties. We use a transdimensional hierarchical Bayesian inversion approach to invert ambient noise surface wave phase dispersion maps for Love and Rayleigh waves using measurements obtained from Ekström (2014). Spatiospectral analysis shows that our results are comparable to a linear least squares inverse approach (except at higher harmonic degrees), but the procedure has additional advantages: (1) it yields an autoadaptive parameterization that follows Earth structure without making restricting assumptions on model resolution (regularization or damping) and data errors; (2) it can recover non-Gaussian phase velocity probability distributions while quantifying the sources of uncertainties in the data measurements and modeling procedure; and (3) it enables statistical assessments of different crustal models (e.g., CRUST1.0, LITHO1.0, and NACr14) using variable resolution residual and standard deviation maps estimated from the ensemble. These assessments show that in the stable old crust of the Archean, the misfits are statistically negligible, requiring no significant update to crustal models from the ambient noise data set. In other regions of the U.S., significant updates to regionalization and crustal structure are expected, especially in the shallow sedimentary basins and the tectonically active regions, where the differences between model predictions and data are statistically significant.
Modeling Time-Dependent Association in Longitudinal Data: A Lag as Moderator Approach
ERIC Educational Resources Information Center
Selig, James P.; Preacher, Kristopher J.; Little, Todd D.
2012-01-01
We describe a straightforward, yet novel, approach to examine time-dependent association between variables. The approach relies on a measurement-lag research design in conjunction with statistical interaction models. We base arguments in favor of this approach on the potential for better understanding the associations between variables by…
The Effect on the 8th Grade Students' Attitude towards Statistics of Project Based Learning
ERIC Educational Resources Information Center
Koparan, Timur; Güven, Bülent
2014-01-01
This study investigates the effect of the project based learning approach on 8th grade students' attitude towards statistics. With this aim, an attitude scale towards statistics was developed. A quasi-experimental research model was used in this study. Following this model, in the control group the traditional method was applied to teach statistics…
Schaid, Daniel J
2010-01-01
Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1]. Copyright © 2010 S. Karger AG, Basel.
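The positive semidefiniteness requirement can be made concrete with a small sketch. The genotype coding (0/1/2 minor-allele counts), the two kernels, the bandwidth, and the helper names are illustrative assumptions and are not taken from the review. The check uses random quadratic forms, which can expose a violation but does not prove the property; an eigenvalue decomposition would be the usual tool in practice.

```python
import math
import random


def linear_kernel(genotypes):
    """Similarity as the average product of allele counts across SNPs."""
    m = len(genotypes[0])
    return [[sum(a * b for a, b in zip(gi, gj)) / m for gj in genotypes]
            for gi in genotypes]


def gaussian_kernel(genotypes, bandwidth):
    """Turns a squared Euclidean distance into a similarity in (0, 1]."""
    def sq_dist(gi, gj):
        return sum((a - b) ** 2 for a, b in zip(gi, gj))
    return [[math.exp(-sq_dist(gi, gj) / (2 * bandwidth ** 2))
             for gj in genotypes] for gi in genotypes]


def min_quadratic_form(K, n_probes, rng):
    """Smallest value of x^T K x over random probe vectors x.
    A clearly negative value would show that K is not positive semidefinite."""
    n = len(K)
    worst = float("inf")
    for _ in range(n_probes):
        x = [rng.gauss(0, 1) for _ in range(n)]
        value = sum(x[i] * K[i][j] * x[j] for i in range(n) for j in range(n))
        worst = min(worst, value)
    return worst


rng = random.Random(2010)
# Hypothetical data: 8 subjects, 25 SNPs coded as minor-allele counts.
genotypes = [[rng.choice([0, 1, 2]) for _ in range(25)] for _ in range(8)]
K_lin = linear_kernel(genotypes)
K_gauss = gaussian_kernel(genotypes, bandwidth=4.0)

print("linear kernel, min x'Kx:  ", min_quadratic_form(K_lin, 200, rng))
print("gaussian kernel, min x'Kx:", min_quadratic_form(K_gauss, 200, rng))
```

The Gaussian kernel also shows the similarity/distance duality in the abstract: a smaller distance between two subjects maps to a larger kernel value.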
USDA-ARS's Scientific Manuscript database
The resolution of General Circulation Models (GCMs) is too coarse to assess the fine scale or site-specific impacts of climate change. Downscaling approaches including dynamical and statistical downscaling have been developed to meet this requirement. As the resolution of climate model increases, it...
A Modeling Approach to the Development of Students' Informal Inferential Reasoning
ERIC Educational Resources Information Center
Doerr, Helen M.; Delmas, Robert; Makar, Katie
2017-01-01
Teaching from an informal statistical inference perspective can address the challenge of teaching statistics in a coherent way. We argue that activities that promote model-based reasoning address two additional challenges: providing a coherent sequence of topics and promoting the application of knowledge to novel situations. We take a models and…
Imputation approaches for animal movement modeling
Scharf, Henry; Hooten, Mevin B.; Johnson, Devin S.
2017-01-01
The analysis of telemetry data is common in animal ecological studies. While the collection of telemetry data for individual animals has improved dramatically, the methods to properly account for inherent uncertainties (e.g., measurement error, dependence, barriers to movement) have lagged behind. Still, many new statistical approaches have been developed to infer unknown quantities affecting animal movement or predict movement based on telemetry data. Hierarchical statistical models are useful to account for some of the aforementioned uncertainties, as well as provide population-level inference, but they often come with an increased computational burden. For certain types of statistical models, it is straightforward to provide inference if the latent true animal trajectory is known, but challenging otherwise. In these cases, approaches related to multiple imputation have been employed to account for the uncertainty associated with our knowledge of the latent trajectory. Despite the increasing use of imputation approaches for modeling animal movement, the general sensitivity and accuracy of these methods have not been explored in detail. We provide an introduction to animal movement modeling and describe how imputation approaches may be helpful for certain types of models. We also assess the performance of imputation approaches in two simulation studies. Our simulation studies suggest that inference for model parameters directly related to the location of an individual may be more accurate than inference for parameters associated with higher-order processes such as velocity or acceleration. Finally, we apply these methods to analyze a telemetry data set involving northern fur seals (Callorhinus ursinus) in the Bering Sea. Supplementary materials accompanying this paper appear online.
Assessing risk factors for dental caries: a statistical modeling approach.
Trottini, Mario; Bossù, Maurizio; Corridore, Denise; Ierardo, Gaetano; Luzzi, Valeria; Saccucci, Matteo; Polimeni, Antonella
2015-01-01
The problem of identifying potential determinants and predictors of dental caries is of key importance in caries research and it has received considerable attention in the scientific literature. From the methodological side, a broad range of statistical models is currently available to analyze dental caries indices (DMFT, dmfs, etc.). These models have been applied in several studies to investigate the impact of different risk factors on the cumulative severity of dental caries experience. However, in most cases (i) these studies focus on a very specific subset of risk factors; and (ii) in the statistical modeling only a few candidate models are considered and model selection is at best only marginally addressed. As a result, our understanding of the robustness of the statistical inferences with respect to the choice of the model is very limited; the richness of the set of statistical models available for analysis is only marginally exploited; and inferences could be biased due to the omission of potentially important confounding variables in the model's specification. In this paper we argue that these limitations can be overcome by considering a general class of candidate models and carefully exploring the model space using standard model selection criteria and measures of global fit and predictive performance of the candidate models. Strengths and limitations of the proposed approach are illustrated with a real data set. In our illustration the model space contains more than 2.6 million models, which require inferences to be adjusted for 'optimism'.
Statistical appearance models based on probabilistic correspondences.
Krüger, Julia; Ehrhardt, Jan; Handels, Heinz
2017-04-01
Model-based image analysis is indispensable in medical image processing. One key aspect of building statistical shape and appearance models is the determination of one-to-one correspondences in the training data set. At the same time, the identification of these correspondences is the most challenging part of such methods. In our earlier work, we developed an alternative method using correspondence probabilities instead of exact one-to-one correspondences for a statistical shape model (Hufnagel et al., 2008). In this work, a new approach for statistical appearance models without one-to-one correspondences is proposed. A sparse image representation is used to build a model that combines point position and appearance information at the same time. Probabilistic correspondences between the derived multi-dimensional feature vectors are used to remove the need for extensive preprocessing to find landmarks and correspondences as well as to reduce the dependence of the generated model on the landmark positions. Model generation and model fitting can now be expressed by optimizing a single global criterion derived from a maximum a-posteriori (MAP) approach with respect to model parameters that directly affect both shape and appearance of the considered objects inside the images. The proposed approach describes statistical appearance modeling in a concise and flexible mathematical framework. Besides eliminating the demand for costly correspondence determination, the method allows for additional constraints such as topological regularity in the modeling process. In the evaluation the model was applied for segmentation and landmark identification in hand X-ray images. The results demonstrate that the model can detect hand contours as well as the positions of the joints between finger bones for unseen test images.
Further, we evaluated the model on brain data of stroke patients to show the ability of the proposed model to handle partially corrupted data and to demonstrate a possible employment of the correspondence probabilities to indicate these corrupted/pathological areas. Copyright © 2017 Elsevier B.V. All rights reserved.
Development of uncertainty-based work injury model using Bayesian structural equation modelling.
Chatterjee, Snehamoy
2014-01-01
This paper proposed a Bayesian method-based structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distributions for the factor loading parameters and structural parameters of the SEM. In the first approach, the prior distributions were considered as a fixed distribution function with specific parameter values, whereas, in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayes' rule. Markov Chain Monte Carlo sampling in the form of Gibbs sampling was applied for sampling from the posterior distribution. The results revealed that all coefficients of structural and measurement model parameters are statistically significant with experts' opinion-based priors, whereas two coefficients are not statistically significant when fixed prior-based distributions are applied. The error statistics reveal that the Bayesian structural model provides a reasonably good fit of work injury, with a high coefficient of determination (0.91) and a lower mean squared error compared to traditional SEM.
Determination of apparent coupling factors for adhesive bonded acrylic plates using SEAL approach
NASA Astrophysics Data System (ADS)
Pankaj, Achuthan. C.; Shivaprasad, M. V.; Murigendrappa, S. M.
2018-04-01
Apparent coupling loss factors (CLF) and velocity responses have been computed for two lap-joined adhesive bonded plates using a finite element and experimental statistical energy analysis-like (SEAL) approach. A finite element model of the plates has been created using ANSYS software. The statistical energy parameters have been computed using the velocity responses obtained from a harmonic forced excitation analysis. Experiments have been carried out for two different cases of adhesive bonded joints and the results have been compared with the apparent coupling factors and velocity responses obtained from finite element analysis. The results obtained from the studies signify the importance of modeling adhesive bonded joints in the computation of the apparent coupling factors and their further use in the computation of energies and velocity responses using the statistical energy analysis-like approach.
Model fit evaluation in multilevel structural equation models
Ryu, Ehri
2014-01-01
Assessing goodness of model fit is one of the key questions in structural equation modeling (SEM). Goodness of fit is the extent to which the hypothesized model reproduces the multivariate structure underlying the set of variables. During the earlier development of multilevel structural equation models, the “standard” approach was to evaluate the goodness of fit for the entire model across all levels simultaneously. The model fit statistics produced by the standard approach have a potential problem in detecting lack of fit in the higher-level model for which the effective sample size is much smaller. Also when the standard approach results in poor model fit, it is not clear at which level the model does not fit well. This article reviews two alternative approaches that have been proposed to overcome the limitations of the standard approach. One is a two-step procedure which first produces estimates of saturated covariance matrices at each level and then performs single-level analysis at each level with the estimated covariance matrices as input (Yuan and Bentler, 2007). The other level-specific approach utilizes partially saturated models to obtain test statistics and fit indices for each level separately (Ryu and West, 2009). Simulation studies (e.g., Yuan and Bentler, 2007; Ryu and West, 2009) have consistently shown that both alternative approaches performed well in detecting lack of fit at any level, whereas the standard approach failed to detect lack of fit at the higher level. It is recommended that the alternative approaches be used to assess model fit in multilevel structural equation models. Advantages and disadvantages of the two alternative approaches are discussed. The alternative approaches are demonstrated in an empirical example. PMID:24550882
Dettmer, Jan; Dosso, Stan E
2012-10-01
This paper develops a trans-dimensional approach to matched-field geoacoustic inversion, including interacting Markov chains to improve efficiency and an autoregressive model to account for correlated errors. The trans-dimensional approach and hierarchical seabed model allow inversion without assuming any particular parametrization by relaxing model specification to a range of plausible seabed models (e.g., in this case, the number of sediment layers is an unknown parameter). Data errors are addressed by sampling statistical error-distribution parameters, including correlated errors (covariance), by applying a hierarchical autoregressive error model. The well-known difficulty of low acceptance rates for trans-dimensional jumps is addressed with interacting Markov chains, resulting in a substantial increase in efficiency. The trans-dimensional seabed model and the hierarchical error model relax the degree of prior assumptions required in the inversion, resulting in substantially improved (more realistic) uncertainty estimates and a more automated algorithm. In particular, the approach gives seabed parameter uncertainty estimates that account for uncertainty due to prior model choice (layering and data error statistics). The approach is applied to data measured on a vertical array in the Mediterranean Sea.
Statistical basis and outputs of stable isotope mixing models: Comment on Fry (2013)
A recent article by Fry (2013; Mar Ecol Prog Ser 472:1−13) reviewed approaches to solving underdetermined stable isotope mixing systems, and presented a new graphical approach and set of summary statistics for the analysis of such systems. In his review, Fry (2013) mis-characteri...
Webster, R J; Williams, A; Marchetti, F; Yauk, C L
2018-07-01
Mutations in germ cells pose potential genetic risks to offspring. However, de novo mutations are rare events that are spread across the genome and are difficult to detect. Thus, studies in this area have generally been under-powered, and no human germ cell mutagen has been identified. Whole Genome Sequencing (WGS) of human pedigrees has been proposed as an approach to overcome these technical and statistical challenges. WGS enables analysis of a much wider breadth of the genome than traditional approaches. Here, we performed power analyses to determine the feasibility of using WGS in human families to identify germ cell mutagens. Different statistical models were compared in the power analyses (ANOVA and multiple regression for one-child families, and mixed effect models sampling two to four siblings per family). Assumptions were made based on parameters from the existing literature, such as the mutation-by-paternal age effect. We explored two scenarios: a constant effect due to an exposure that occurred in the past, and an accumulating effect where the exposure is continuing. Our analysis revealed the importance of modeling inter-family variability of the mutation-by-paternal age effect. Statistical power was improved by models accounting for the family-to-family variability. Our power analyses suggest that sufficient statistical power can be attained with 4-28 four-sibling families per treatment group, when the increase in mutations ranges from 40% to 10%, respectively. Modeling family variability using mixed effect models provided a reduction in sample size compared to a multiple regression approach. Much larger sample sizes were required to detect an interaction effect between environmental exposures and paternal age. These findings inform study design and statistical modeling approaches to improve power and reduce sequencing costs for future studies in this area. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
Ensor, Joie; Riley, Richard D.
2016-01-01
Meta‐analysis using individual participant data (IPD) obtains and synthesises the raw, participant‐level data from a set of relevant studies. The IPD approach is becoming an increasingly popular tool as an alternative to traditional aggregate data meta‐analysis, especially as it avoids reliance on published results and provides an opportunity to investigate individual‐level interactions, such as treatment‐effect modifiers. There are two statistical approaches for conducting an IPD meta‐analysis: one‐stage and two‐stage. The one‐stage approach analyses the IPD from all studies simultaneously, for example, in a hierarchical regression model with random effects. The two‐stage approach derives aggregate data (such as effect estimates) in each study separately and then combines these in a traditional meta‐analysis model. There have been numerous comparisons of the one‐stage and two‐stage approaches via theoretical consideration, simulation and empirical examples, yet there remains confusion regarding when each approach should be adopted, and indeed why they may differ. In this tutorial paper, we outline the key statistical methods for one‐stage and two‐stage IPD meta‐analyses, and provide 10 key reasons why they may produce different summary results. We explain that most differences arise because of different modelling assumptions, rather than the choice of one‐stage or two‐stage itself. We illustrate the concepts with recently published IPD meta‐analyses, summarise key statistical software and provide recommendations for future IPD meta‐analyses. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:27747915
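The second stage of the two-stage approach described above can be sketched as follows. This is a minimal illustration, assuming a fixed-effect inverse-variance model; the function name `pool_fixed_effect` and the per-study numbers are hypothetical and are not taken from the tutorial, which also covers random-effects and one-stage models.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Stage two of a two-stage meta-analysis: inverse-variance pooling.

    Each study contributes an effect estimate and its standard error
    (the aggregate data derived separately per study in stage one).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical stage-one output: one treatment-effect estimate per study.
study_effects = [0.10, 0.30, 0.20]
study_ses = [0.10, 0.10, 0.10]

pooled, pooled_se = pool_fixed_effect(study_effects, study_ses)
```

With equal standard errors the pooled estimate is simply the mean of the study estimates; unequal standard errors would give more precise studies more weight.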
NASA Astrophysics Data System (ADS)
Langousis, Andreas; Mamalakis, Antonis; Deidda, Roberto; Marrocu, Marino
2015-04-01
To improve the skill of Global Climate Models (GCMs) and Regional Climate Models (RCMs) in reproducing the statistics of rainfall at a basin level and at hydrologically relevant temporal scales (e.g. daily), two types of statistical approaches have been suggested. One is the statistical correction of climate model rainfall outputs using historical series of precipitation. The other is the use of stochastic models of rainfall to conditionally simulate precipitation series, based on large-scale atmospheric predictors produced by climate models (e.g. geopotential height, relative vorticity, divergence, mean sea level pressure). The latter approach, usually referred to as statistical rainfall downscaling, aims at reproducing the statistical character of rainfall, while accounting for the effects of large-scale atmospheric circulation (and, therefore, climate forcing) on rainfall statistics. While promising, statistical rainfall downscaling has not attracted much attention in recent years, since the suggested approaches involved complex (i.e. subjective or computationally intense) identification procedures of the local weather, in addition to demonstrating limited success in reproducing several statistical features of rainfall, such as seasonal variations, the distributions of dry and wet spell lengths, the distribution of the mean rainfall intensity inside wet periods, and the distribution of rainfall extremes. In an effort to remedy those shortcomings, Langousis and Kaleris (2014) developed a statistical framework for simulation of daily rainfall intensities conditional on upper air variables, which accurately reproduces the statistical character of rainfall at multiple time-scales.
Here, we study the relative performance of: a) quantile-quantile (Q-Q) correction of climate model rainfall products, and b) the statistical downscaling scheme of Langousis and Kaleris (2014), in reproducing the statistical structure of rainfall, as well as rainfall extremes, at a regional level. This is done for an intermediate-sized catchment in Italy, i.e. the Flumendosa catchment, using climate model rainfall and atmospheric data from the ENSEMBLES project (http://ensembleseu.metoffice.com). In doing so, we split the historical rainfall record of mean areal precipitation (MAP) into 15-year calibration and 45-year validation periods, and compare the historical rainfall statistics to those obtained from: a) Q-Q corrected climate model rainfall products, and b) synthetic rainfall series generated by the suggested downscaling scheme. To our knowledge, this is the first time that climate model rainfall and statistically downscaled precipitation are compared to catchment-averaged MAP at a daily resolution. The obtained results are promising, since the proposed downscaling scheme is more accurate and robust in reproducing a number of historical rainfall statistics, independent of the climate model used and the length of the calibration period. This is particularly the case for the yearly rainfall maxima, where direct statistical correction of climate model rainfall outputs shows increased sensitivity to the length of the calibration period and the climate model used. The robustness of the suggested downscaling scheme in modeling rainfall extremes at a daily resolution is a notable feature that can effectively be used to assess hydrologic risk at a regional level under changing climatic conditions.
Acknowledgments The research project is implemented within the framework of the Action «Supporting Postdoctoral Researchers» of the Operational Program "Education and Lifelong Learning" (Action's Beneficiary: General Secretariat for Research and Technology), and is co-financed by the European Social Fund (ESF) and the Greek State. CRS4 highly acknowledges the contribution of the Sardinian regional authorities.
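The quantile-quantile (Q-Q) correction compared in the abstract above can be illustrated with a small empirical quantile-mapping sketch. It assumes a nearest-rank empirical quantile and toy calibration samples; the function name `quantile_map` and all values are hypothetical, and the study's actual correction procedure may differ.

```python
from bisect import bisect_right

def quantile_map(value, model_calib, observed_calib):
    """Replace a model value by the observed value at the same empirical quantile.

    model_calib and observed_calib are calibration-period samples
    (e.g. daily rainfall from the climate model and from observations).
    """
    model_sorted = sorted(model_calib)
    obs_sorted = sorted(observed_calib)
    n_model = len(model_sorted)
    n_obs = len(obs_sorted)
    # Number of calibration model values that do not exceed `value`.
    rank = bisect_right(model_sorted, value)
    # Nearest-rank index of the matching quantile in the observations,
    # computed with integer arithmetic and clamped to the sample range.
    index = (rank * n_obs + n_model - 1) // n_model - 1
    index = min(max(index, 0), n_obs - 1)
    return obs_sorted[index]

# Toy calibration data: the model systematically underestimates rainfall.
model_rain = [0, 1, 2, 3, 4]
observed_rain = [0, 2, 4, 6, 8]

corrected = [quantile_map(v, model_rain, observed_rain) for v in [2, 4, -1]]
```

Values outside the calibration range are mapped to the smallest or largest observed value, which is one reason a correction of this kind can be sensitive to the length of the calibration period when applied to extremes.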
Simulating Metabolism with Statistical Thermodynamics
Cannon, William R.
2014-01-01
New methods are needed for large scale modeling of metabolism that predict metabolite levels and characterize the thermodynamics of individual reactions and pathways. Current approaches use either kinetic simulations, which are difficult to extend to large networks of reactions because of the need for rate constants, or flux-based methods, which have a large number of feasible solutions because they are unconstrained by the law of mass action. This report presents an alternative modeling approach based on statistical thermodynamics. The principles of this approach are demonstrated using a simple set of coupled reactions, and then the system is characterized with respect to the changes in energy, entropy, free energy, and entropy production. Finally, the physical and biochemical insights that this approach can provide for metabolism are demonstrated by application to the tricarboxylic acid (TCA) cycle of Escherichia coli. The reaction and pathway thermodynamics are evaluated and predictions are made regarding changes in concentration of TCA cycle intermediates due to 10- and 100-fold changes in the ratio of NAD+:NADH concentrations. Finally, the assumptions and caveats regarding the use of statistical thermodynamics to model non-equilibrium reactions are discussed. PMID:25089525
A smoothed residual based goodness-of-fit statistic for nest-survival models
Rodney X. Sturdivant; Jay J. Rotella; Robin E. Russell
2008-01-01
Estimating nest success and identifying important factors related to nest-survival rates is an essential goal for many wildlife researchers interested in understanding avian population dynamics. Advances in statistical methods have led to a number of estimation methods and approaches to modeling this problem. Recently developed models allow researchers to include a...
Vieira, Rute; McDonald, Suzanne; Araújo-Soares, Vera; Sniehotta, Falko F; Henderson, Robin
2017-09-01
N-of-1 studies are based on repeated observations within an individual or unit over time and are acknowledged as an important research method for generating scientific evidence about the health or behaviour of an individual. Statistical analyses of n-of-1 data require accurate modelling of the outcome while accounting for its distribution, time-related trend and error structures (e.g., autocorrelation) as well as reporting readily usable contextualised effect sizes for decision-making. A number of statistical approaches have been documented but no consensus exists on which method is most appropriate for which type of n-of-1 design. We discuss the statistical considerations for analysing n-of-1 studies and briefly review some currently used methodologies. We describe dynamic regression modelling as a flexible and powerful approach, adaptable to different types of outcomes and capable of dealing with the different challenges inherent to n-of-1 statistical modelling. Dynamic modelling borrows ideas from longitudinal and event history methodologies which explicitly incorporate the role of time and the influence of past on future. We also present an illustrative example of the use of dynamic regression on monitoring physical activity during the retirement transition. Dynamic modelling has the potential to expand researchers' access to robust and user-friendly statistical methods for individualised studies.
Modeling epidemics on adaptively evolving networks: A data-mining perspective.
Kattis, Assimakis A; Holiday, Alexander; Stoica, Ana-Andreea; Kevrekidis, Ioannis G
2016-01-01
The exploration of epidemic dynamics on dynamically evolving ("adaptive") networks poses nontrivial challenges to the modeler, such as the determination of a small number of informative statistics of the detailed network state (that is, a few "good observables") that usefully summarize the overall (macroscopic, systems-level) behavior. Obtaining reduced, small size accurate models in terms of these few statistical observables--that is, trying to coarse-grain the full network epidemic model to a small but useful macroscopic one--is even more daunting. Here we describe a data-based approach to solving the first challenge: the detection of a few informative collective observables of the detailed epidemic dynamics. This is accomplished through Diffusion Maps (DMAPS), a recently developed data-mining technique. We illustrate the approach through simulations of a simple mathematical model of epidemics on a network: a model known to exhibit complex temporal dynamics. We discuss potential extensions of the approach, as well as possible shortcomings.
Development of LACIE CCEA-1 weather/wheat yield models. [regression analysis]
NASA Technical Reports Server (NTRS)
Strommen, N. D.; Sakamoto, C. M.; Leduc, S. K.; Umberger, D. E. (Principal Investigator)
1979-01-01
The advantages and disadvantages of the causal (phenological, dynamic, physiological), statistical regression, and analog approaches to modeling for grain yield are examined. Given LACIE's primary goal of estimating wheat production for the large areas of eight major wheat-growing regions, the statistical regression approach of correlating historical yield and climate data offered the Center for Climatic and Environmental Assessment the greatest potential return within the constraints of time and data sources. The basic equation for the first generation wheat-yield model is given. Topics discussed include truncation, trend variable, selection of weather variables, episodic events, strata selection, operational data flow, weighting, and model results.
NASA Astrophysics Data System (ADS)
Lehmann, Rüdiger; Lösler, Michael
2017-12-01
Geodetic deformation analysis can be interpreted as a model selection problem. The null model indicates that no deformation has occurred. It is opposed to a number of alternative models, which stipulate different deformation patterns. A common way to select the right model is the usage of a statistical hypothesis test. However, since we have to test a series of deformation patterns, this must be a multiple test. As an alternative solution for the test problem, we propose the p-value approach. Another approach arises from information theory. Here, the Akaike information criterion (AIC) or some alternative is used to select an appropriate model for a given set of observations. Both approaches are discussed and applied to two test scenarios: a synthetic levelling network and the Delft test data set. It is demonstrated that they work but behave differently, sometimes even producing different results. Hypothesis tests are well-established in geodesy, but may suffer from an unfavourable choice of the decision error rates. The multiple test also suffers from statistical dependencies between the test statistics, which are neglected. Both problems are overcome by applying information criteria like AIC.
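Model selection by AIC, as described above, can be sketched on a toy example. The sketch assumes independent Gaussian errors with unknown variance, so that AIC reduces (up to an additive constant) to n·ln(RSS/n) + 2k; the height series, the two candidate models and all function names are hypothetical and much simpler than a geodetic network adjustment.

```python
import math

def rss_constant(y):
    """Residual sum of squares of the null model: constant height, no deformation."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)

def rss_linear(t, y):
    """Residual sum of squares of a least-squares linear trend in time."""
    n = len(y)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))

def aic_gaussian(rss, n, k):
    """AIC for Gaussian errors, dropping constants shared by all models."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical heights of one point observed at six epochs.
epochs = [0, 1, 2, 3, 4, 5]
heights = [0.1, 0.9, 2.1, 2.9, 4.1, 4.9]
n = len(heights)

# k counts the mean parameters plus one for the error variance.
aic_null = aic_gaussian(rss_constant(heights), n, k=2)
aic_trend = aic_gaussian(rss_linear(epochs, heights), n, k=3)

candidates = {"no deformation": aic_null, "linear trend": aic_trend}
selected = min(candidates, key=candidates.get)
```

The model with the smallest AIC is selected; no decision error rate has to be chosen, which is the practical difference from the hypothesis-test approach mentioned in the abstract.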
Model Uncertainty and Robustness: A Computational Framework for Multimodel Analysis
ERIC Educational Resources Information Center
Young, Cristobal; Holsteen, Katherine
2017-01-01
Model uncertainty is pervasive in social science. A key question is how robust empirical results are to sensible changes in model specification. We present a new approach and applied statistical software for computational multimodel analysis. Our approach proceeds in two steps: First, we estimate the modeling distribution of estimates across all…
Predicting trauma patient mortality: ICD [or ICD-10-AM] versus AIS based approaches.
Willis, Cameron D; Gabbe, Belinda J; Jolley, Damien; Harrison, James E; Cameron, Peter A
2010-11-01
The International Classification of Diseases Injury Severity Score (ICISS) has been proposed as an International Classification of Diseases (ICD)-10-based alternative to mortality prediction tools that use Abbreviated Injury Scale (AIS) data, including the Trauma and Injury Severity Score (TRISS). To date, studies have not examined the performance of ICISS using Australian trauma registry data. This study aimed to compare the performance of ICISS with other mortality prediction tools in an Australian trauma registry. This was a retrospective review of prospectively collected data from the Victorian State Trauma Registry. A training dataset was created for model development and a validation dataset for evaluation. The multiplicative ICISS model was compared with a worst injury ICISS approach, Victorian TRISS (V-TRISS, using local coefficients), maximum AIS severity and a multivariable model including ICD-10-AM codes as predictors. Models were investigated for discrimination (C-statistic) and calibration (Hosmer-Lemeshow statistic). The multivariable approach had the highest level of discrimination (C-statistic 0.90) and calibration (H-L 7.65, P = 0.468). Worst injury ICISS, V-TRISS and maximum AIS had similar performance. The multiplicative ICISS produced the lowest level of discrimination (C-statistic 0.80) and poorest calibration (H-L 50.23, P < 0.001). The performance of ICISS may be affected by the data used to develop estimates, the ICD version employed, the methods for deriving estimates and the inclusion of covariates. In this analysis, a multivariable approach using ICD-10-AM codes was the best-performing method. A multivariable ICISS approach may therefore be a useful alternative to AIS-based methods and may have comparable predictive performance to locally derived TRISS models. © 2010 The Authors. ANZ Journal of Surgery © 2010 Royal Australasian College of Surgeons.
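The discrimination measure reported above, the C-statistic, is the proportion of (death, survivor) pairs in which the patient who died was assigned the higher predicted risk. A minimal sketch of that pairwise definition follows; the function name `c_statistic` and the five patients are hypothetical, and ties are assumed to count as half-concordant.

```python
def c_statistic(predicted_risk, died):
    """Concordance statistic for a binary outcome (equivalent to the ROC area)."""
    risks_dead = [r for r, d in zip(predicted_risk, died) if d]
    risks_alive = [r for r, d in zip(predicted_risk, died) if not d]
    concordant = 0.0
    for risk_dead in risks_dead:
        for risk_alive in risks_alive:
            if risk_dead > risk_alive:
                concordant += 1.0   # model ranks the death as higher risk
            elif risk_dead == risk_alive:
                concordant += 0.5   # tie: counted as half-concordant
    return concordant / (len(risks_dead) * len(risks_alive))

# Hypothetical predicted mortality risks and observed outcomes (1 = died).
risks = [0.9, 0.4, 0.3, 0.2, 0.6]
outcomes = [1, 1, 0, 0, 0]

c_stat = c_statistic(risks, outcomes)
```

Here five of the six death-survivor pairs are ranked correctly, giving a C-statistic of 5/6. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect discrimination.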
Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.
2017-01-01
Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods. Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method and there was low overlap in the variable sets (<40%) between the two methods. Despite these differences in variable sets (expert versus statistical), models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS)). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques.
Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable selection is a useful first step, especially when there is a need to model a large number of species or expert knowledge of the species is limited. Expert input can then be used to refine models that seem unrealistic or for species that experts believe are particularly sensitive to change. It also emphasizes the importance of using multiple models to reduce uncertainty and improve map outputs for conservation planning. Where outputs overlap or show the same direction of change there is greater certainty in the predictions. Areas of disagreement can be used for learning by asking why the models do not agree, and may highlight areas where additional on-the-ground data collection could improve the models.
NASA Astrophysics Data System (ADS)
Tsutsumi, Morito; Seya, Hajime
2009-12-01
This study discusses the theoretical foundation of the application of spatial hedonic approaches—the hedonic approach employing spatial econometrics or/and spatial statistics—to benefits evaluation. The study highlights the limitations of the spatial econometrics approach since it uses a spatial weight matrix that is not employed by the spatial statistics approach. Further, the study presents empirical analyses by applying the Spatial Autoregressive Error Model (SAEM), which is based on the spatial econometrics approach, and the Spatial Process Model (SPM), which is based on the spatial statistics approach. SPMs are conducted based on both isotropy and anisotropy and applied to different mesh sizes. The empirical analysis reveals that the estimated benefits are quite different, especially between isotropic and anisotropic SPM and between isotropic SPM and SAEM; the estimated benefits are similar for SAEM and anisotropic SPM. The study demonstrates that the mesh size does not affect the estimated amount of benefits. Finally, the study provides a confidence interval for the estimated benefits and raises an issue with regard to benefit evaluation.
Sul, Jae Hoon; Bilow, Michael; Yang, Wen-Yun; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar
2016-03-01
Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have methods been designed to correct for population structure on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants.
Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P
1999-01-01
Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview of some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149
ERIC Educational Resources Information Center
Spencer, Bryden
2016-01-01
Value-added models are a class of growth models used in education to assign responsibility for student growth to teachers or schools. For value-added models to be used fairly, sufficient statistical precision is necessary for accurate teacher classification. Previous research indicated precision below practical limits. An alternative approach has…
From medium heterogeneity to flow and transport: A time-domain random walk approach
NASA Astrophysics Data System (ADS)
Hakoun, V.; Comolli, A.; Dentz, M.
2017-12-01
The prediction of flow and transport processes in heterogeneous porous media is based on the qualitative and quantitative understanding of the interplay between 1) spatial variability of hydraulic conductivity, 2) groundwater flow and 3) solute transport. Using a stochastic modeling approach, we study this interplay through direct numerical simulations of Darcy flow and advective transport in heterogeneous media. First, we study flow in correlated hydraulic permeability fields and shed light on the relationship between the statistics of log-hydraulic conductivity, a medium attribute, and the flow statistics. Second, we determine relationships between Eulerian and Lagrangian velocity statistics, that is, between flow and transport attributes. We show how Lagrangian statistics and thus transport behaviors such as late particle arrival times are influenced by the medium heterogeneity on the one hand and the initial particle velocities on the other. We find that equidistantly sampled Lagrangian velocities can be described by a Markov process that evolves on the characteristic heterogeneity length scale. We employ a stochastic relaxation model for the equidistantly sampled particle velocities, which is parametrized by the velocity correlation length. This description results in a time-domain random walk model for the particle motion, whose spatial transitions are characterized by the velocity correlation length and temporal transitions by the particle velocities. This approach relates the statistical medium and flow properties to large scale transport, and allows for conditioning on the initial particle velocities and thus to the medium properties in the injection region. The approach is tested against direct numerical simulations.
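A time-domain random walk of the kind described above can be sketched as follows. This is only an illustration under stated assumptions: the particle advances by a fixed spatial step, the time increment is the step divided by the current velocity, and the velocity persists from one step to the next with probability exp(-ds / corr_length) and is otherwise redrawn. This persistence rule, the lognormal velocity distribution and all names are hypothetical choices and are not confirmed as the relaxation model used by the authors.

```python
import math
import random


def tdrw_arrival_time(distance, ds, corr_length, draw_velocity, rng, initial_velocity=None):
    """Arrival time of one particle at `distance` using fixed spatial steps `ds`."""
    # Condition on an initial velocity if one is given, otherwise draw it.
    velocity = draw_velocity(rng) if initial_velocity is None else initial_velocity
    persist = math.exp(-ds / corr_length)  # assumed probability of keeping the velocity
    position, time = 0.0, 0.0
    while position < distance:
        position += ds
        time += ds / velocity  # temporal transition set by the particle velocity
        if rng.random() > persist:
            velocity = draw_velocity(rng)  # velocity decorrelates over ~corr_length
    return time


def lognormal_velocity(rng):
    """Hypothetical stationary velocity distribution."""
    return rng.lognormvariate(0.0, 1.0)


rng = random.Random(0)
arrival_times = [
    tdrw_arrival_time(10.0, 0.5, 2.0, lognormal_velocity, rng) for _ in range(1000)
]
```

Passing `initial_velocity` mimics conditioning on the velocity in the injection region; broad velocity distributions with many slow values produce the late arrival times mentioned in the abstract.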
Zevin, Jason D; Miller, Brett
Reading research is increasingly a multi-disciplinary endeavor involving more complex, team-based science approaches. These approaches offer the potential of capturing the complexity of reading development, the emergence of individual differences in reading performance over time, how these differences relate to the development of reading difficulties and disability, and more fully understanding the nature of skilled reading in adults. This special issue focuses on the potential opportunities and insights that early and richly integrated advanced statistical and computational modeling approaches can provide to our foundational (and translational) understanding of reading. The issue explores how computational and statistical modeling, using both observed and simulated data, can serve as a contact point among research domains and topics, complement other data sources and critically provide analytic advantages over current approaches.
A BAYESIAN STATISTICAL APPROACHES FOR THE EVALUATION OF CMAQ
This research focuses on the application of spatial statistical techniques for the evaluation of the Community Multiscale Air Quality (CMAQ) model. The upcoming release version of the CMAQ model was run for the calendar year 2001 and is in the process of being evaluated by EPA an...
USDA-ARS's Scientific Manuscript database
Cover: The electrospinning technique was employed to obtain conducting nanofibers based on polyaniline and poly(lactic acid). A statistical model was employed to describe how the process factors (solution concentration, applied voltage, and flow rate) govern the fiber dimensions. Nanofibers down to ...
ERIC Educational Resources Information Center
Mulford, Bill; Silins, Halia
2011-01-01
Purpose: This study aims to present revised models and a reconceptualisation of successful school principalship for improved student outcomes. Design/methodology/approach: The study's approach is qualitative and quantitative, culminating in model building and multi-level statistical analyses. Findings: Principals who promote both capacity building…
Time Series Expression Analyses Using RNA-seq: A Statistical Approach
Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P.
2013-01-01
RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis. PMID:23586021
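To illustrate the idea of a first-order autoregressive dependence across time points, the sketch below fits x_t = c + phi * x_{t-1} + noise by ordinary least squares. It is a deliberately simplified Gaussian version; the AR(1) approach discussed in the abstract may be formulated differently for count data, and the function name and example series are assumptions.

```python
def fit_ar1(series):
    """Least-squares estimates (c, phi) for x_t = c + phi * x_{t-1} + noise."""
    previous = series[:-1]
    current = series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(previous, current))
    sxx = sum((a - mean_prev) ** 2 for a in previous)
    phi = sxy / sxx             # lag-1 regression slope
    c = mean_curr - phi * mean_prev
    return c, phi


# Hypothetical (already normalized) expression values of one gene over time.
expression = [0.0, 1.0, 1.5, 1.75, 1.875]
c, phi = fit_ar1(expression)
```

A phi close to zero suggests little carry-over between consecutive time points, while a phi near one suggests strong temporal dependence.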
Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi
2016-05-01
Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions of the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use a lattice protein (LP) model to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSA. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonian for design of new sequences are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. Those are remarkable results as the effective LP Hamiltonians used to generate MSA are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.
ERIC Educational Resources Information Center
Lu, Yonggang; Henning, Kevin S. S.
2013-01-01
Spurred by recent writings regarding statistical pragmatism, we propose a simple, practical approach to introducing students to a new style of statistical thinking that models nature through the lens of data-generating processes, not populations. (Contains 5 figures.)
Stochastic modeling of sunshine number data
NASA Astrophysics Data System (ADS)
Brabec, Marek; Paulescu, Marius; Badescu, Viorel
2013-11-01
In this paper, we will present a unified statistical modeling framework for estimation and forecasting sunshine number (SSN) data. Sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it was shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been a challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We will show how its transition probabilities can be efficiently estimated within a logistic regression framework. In fact, our logistic Markovian model can be relatively easily fitted via a maximum likelihood approach. This is optimal in many respects and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test various hypotheses of practical interest, etc. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, the logistic Markov model class allows us to test hypotheses about how SSN depends on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using the generalized additive model (GAM) approach, we can fit and compare models of various complexity which insist on keeping physical interpretation of the statistical model and its parts.
After introducing the Markovian model and general approach for identification of its parameters, we will illustrate its use and performance on high resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.
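The link between a first-order Markov chain and logistic regression can be sketched for the simplest homogeneous case, logit P(y_t = 1 | y_{t-1}) = b0 + b1 * y_{t-1}. With a single binary covariate the model is saturated, so the maximum likelihood estimates follow directly from the empirical transition frequencies. The function names and the toy series are assumptions, and the covariates and GAM terms mentioned in the abstract are not included.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_logistic_markov(ssn):
    """MLE of a first-order logistic Markov model for a binary (0/1) series."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for previous, current in zip(ssn[:-1], ssn[1:]):
        counts[(previous, current)] += 1
    # Empirical transition probabilities into state 1.
    p01 = counts[(0, 1)] / (counts[(0, 0)] + counts[(0, 1)])
    p11 = counts[(1, 1)] / (counts[(1, 0)] + counts[(1, 1)])
    b0 = logit(p01)        # intercept: log-odds of sun after a covered step
    b1 = logit(p11) - b0   # effect of the previous state on the log-odds
    return p01, p11, b0, b1


# Hypothetical sunshine number series (1 = sun shining, 0 = sun covered).
ssn = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1]
p01, p11, b0, b1 = fit_logistic_markov(ssn)
```

The sketch assumes every transition type occurs at least once, so that no estimated probability equals 0 or 1. Adding covariates such as solar time would require an iterative fit rather than this closed form.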
Work domain constraints for modelling surgical performance.
Morineau, Thierry; Riffaud, Laurent; Morandi, Xavier; Villain, Jonathan; Jannin, Pierre
2015-10-01
Three main approaches can be identified for modelling surgical performance: a competency-based approach, a task-based approach, both largely explored in the literature, and a less known work domain-based approach. The work domain-based approach first describes the work domain properties that constrain the agent's actions and shape the performance. This paper presents a work domain-based approach for modelling performance during cervical spine surgery, based on the idea that anatomical structures delineate the surgical performance. This model was evaluated through an analysis of junior and senior surgeons' actions. Twenty-four cervical spine surgeries performed by two junior and two senior surgeons were recorded in real time by an expert surgeon. According to a work domain-based model describing an optimal progression through anatomical structures, the degree of adjustment of each surgical procedure to a statistical polynomial function was assessed. Each surgical procedure showed a significant suitability with the model and regression coefficient values around 0.9. However, the surgeries performed by senior surgeons fitted this model significantly better than those performed by junior surgeons. Analysis of the relative frequencies of actions on anatomical structures showed that some specific anatomical structures discriminate senior from junior performances. The work domain-based modelling approach can provide an overall statistical indicator of surgical performance, but in particular, it can highlight specific points of interest among anatomical structures that the surgeons dwelled on according to their level of expertise.
NASA Astrophysics Data System (ADS)
Guadagnini, A.; Riva, M.; Dell'Oca, A.
2017-12-01
We propose to ground sensitivity of uncertain parameters of environmental models on a set of indices based on the main (statistical) moments, i.e., mean, variance, skewness and kurtosis, of the probability density function (pdf) of a target model output. This enables us to perform Global Sensitivity Analysis (GSA) of a model in terms of multiple statistical moments and yields a quantification of the impact of model parameters on features driving the shape of the pdf of model output. Our GSA approach includes the possibility of being coupled with the construction of a reduced complexity model that allows approximating the full model response at a reduced computational cost. We demonstrate our approach through a variety of test cases. These include a commonly used analytical benchmark, a simplified model representing pumping in a coastal aquifer, a laboratory-scale tracer experiment, and the migration of fracturing fluid through a naturally fractured reservoir (source) to reach an overlying formation (target). Our strategy allows discriminating the relative importance of model parameters to the four statistical moments considered. We also provide an appraisal of the error associated with the evaluation of our sensitivity metrics by replacing the original system model through the selected surrogate model. Our results suggest that one might need to construct a surrogate model with increasing level of accuracy depending on the statistical moment considered in the GSA. The methodological framework we propose can assist the development of analysis techniques targeted to model calibration, design of experiment, uncertainty quantification and risk assessment.
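As an illustration of a moment-based sensitivity index, the sketch below measures how much the mean of a model output shifts, on average, when one parameter is held fixed. The specific normalization, the equally weighted parameter grids and all names are assumptions chosen for clarity; they are one plausible form of such an index and are not claimed to be the authors' exact definition. Analogous indices could be written for variance, skewness and kurtosis.

```python
from itertools import product
from statistics import mean


def mean_sensitivity_index(model, grids, index):
    """Average relative shift of the output mean when parameter `index` is fixed."""
    # Unconditional mean over all equally weighted parameter combinations.
    overall = mean(model(x) for x in product(*grids))
    deviations = []
    for value in grids[index]:
        # Fix the chosen parameter at `value` and vary all the others.
        conditional_grids = [g if i != index else [value] for i, g in enumerate(grids)]
        conditional = mean(model(x) for x in product(*conditional_grids))
        deviations.append(abs(overall - conditional))
    return mean(deviations) / abs(overall)


# Hypothetical two-parameter model in which only the first parameter matters.
grids = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
index_first = mean_sensitivity_index(lambda x: x[0], grids, 0)
index_second = mean_sensitivity_index(lambda x: x[0], grids, 1)
```

The exhaustive grid keeps the sketch deterministic; for expensive models the inner evaluations are where a reduced complexity surrogate would be substituted. The index is undefined when the unconditional mean is zero.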
NASA Astrophysics Data System (ADS)
Simonin, Olivier; Zaichik, Leonid I.; Alipchenkov, Vladimir M.; Février, Pierre
2006-12-01
The objective of the paper is to elucidate a connection between two approaches that have been separately proposed for modelling the statistical spatial properties of inertial particles in turbulent fluid flows. One of the approaches proposed recently by Février, Simonin, and Squires [J. Fluid Mech. 533, 1 (2005)] is based on the partitioning of particle turbulent velocity field into spatially correlated (mesoscopic Eulerian) and random-uncorrelated (quasi-Brownian) components. The other approach stems from a kinetic equation for the two-point probability density function of the velocity distributions of two particles [Zaichik and Alipchenkov, Phys. Fluids 15, 1776 (2003)]. Comparisons between these approaches are performed for isotropic homogeneous turbulence and demonstrate encouraging agreement.
A note about high blood pressure in childhood
NASA Astrophysics Data System (ADS)
Teodoro, M. Filomena; Simão, Carla
2017-06-01
In the medical, behavioral and social sciences, binary outcomes are common. In the present work, information was collected in which some of the outcomes are binary variables (1='yes'/ 0='no'). A preliminary study of the caregivers' perception of pediatric hypertension was introduced in [14]. An experimental questionnaire was designed to be answered by the caregivers of routine pediatric consultation attendees in the Santa Maria's hospital (HSM). The collected data were statistically analyzed by means of a descriptive analysis and a predictive model. Significant relations between some socio-demographic variables and the assessed knowledge were obtained. A statistical data analysis using partial information from the questionnaire can be found in [14]. The present article completes the statistical approach by estimating a model for the relevant remaining questions of the questionnaire using Generalized Linear Models (GLM). Exploring the binary outcome issue, we intend to extend this approach using Generalized Linear Mixed Models (GLMM), but the process is still ongoing.
Experimental design matters for statistical analysis: how to handle blocking.
Jensen, Signe M; Schaarschmidt, Frank; Onofri, Andrea; Ritz, Christian
2018-03-01
Nowadays, evaluation of the effects of pesticides often relies on experimental designs that involve multiple concentrations of the pesticide of interest or multiple pesticides at specific comparable concentrations and, possibly, secondary factors of interest. Unfortunately, the experimental design is often more or less neglected when analysing data. Two data examples were analysed using different modelling strategies. First, in a randomized complete block design, mean heights of maize treated with a herbicide and one of several adjuvants were compared. Second, translocation of an insecticide applied to maize as a seed treatment was evaluated using incomplete data from an unbalanced design with several layers of hierarchical sampling. Extensive simulations were carried out to further substantiate the effects of different modelling strategies. It was shown that results from suboptimal approaches (two-sample t-tests and ordinary ANOVA assuming independent observations) may be both quantitatively and qualitatively different from the results obtained using an appropriate linear mixed model. The simulations demonstrated that the different approaches may lead to differences in coverage percentages of confidence intervals and type 1 error rates, confirming that misleading conclusions can easily happen when an inappropriate statistical approach is chosen. To ensure that experimental data are summarized appropriately, avoiding misleading conclusions, the experimental design should duly be reflected in the choice of statistical approaches and models. We recommend that author guidelines should explicitly point out that authors need to indicate how the statistical analysis reflects the experimental design. © 2017 Society of Chemical Industry.
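The effect of ignoring blocks can be illustrated with a small, hypothetical randomized complete block layout with two treatments per block. The sketch compares a two-sample t statistic that treats all observations as independent with a paired statistic computed on within-block differences. The paired analysis stands in here for a block-aware model in this simple balanced case; the linear mixed models recommended in the abstract are more general, and the data and function names are assumptions.

```python
import math
from statistics import mean, stdev


def two_sample_t(x, y):
    """Pooled-variance t statistic that ignores the blocking structure."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1.0 / nx + 1.0 / ny))


def blocked_t(x, y):
    """Paired t statistic on within-block differences (x[i], y[i] share block i)."""
    differences = [a - b for a, b in zip(x, y)]
    return mean(differences) / (stdev(differences) / math.sqrt(len(differences)))


# Hypothetical plant heights: block effects are large, the treatment effect is small.
treated = [10.5, 20.4, 30.6, 40.5]
control = [10.0, 20.0, 30.0, 40.0]
t_ignoring_blocks = two_sample_t(treated, control)
t_with_blocks = blocked_t(treated, control)
```

Because the block-to-block variation dominates the pooled variance, the unblocked statistic is close to zero, while the blocked statistic is large. This is the kind of qualitative difference in conclusions the abstract warns about.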
NASA Astrophysics Data System (ADS)
Chodera, John D.; Noé, Frank
2010-09-01
Discrete-state Markov (or master equation) models provide a useful simplified representation for characterizing the long-time statistical evolution of biomolecules in a manner that allows direct comparison with experiments as well as the elucidation of mechanistic pathways for an inherently stochastic process. A vital part of meaningful comparison with experiment is the characterization of the statistical uncertainty in the predicted experimental measurement, which may take the form of an equilibrium measurement of some spectroscopic signal, the time-evolution of this signal following a perturbation, or the observation of some statistic (such as the correlation function) of the equilibrium dynamics of a single molecule. Without meaningful error bars (which arise from both approximation and statistical error), there is no way to determine whether the deviations between model and experiment are statistically meaningful. Previous work has demonstrated that a Bayesian method that enforces microscopic reversibility can be used to characterize the statistical component of correlated uncertainties in state-to-state transition probabilities (and functions thereof) for a model inferred from molecular simulation data. Here, we extend this approach to include the uncertainty in observables that are functions of molecular conformation (such as surrogate spectroscopic signals) characterizing each state, permitting the full statistical uncertainty in computed spectroscopic experiments to be assessed. We test the approach in a simple model system to demonstrate that the computed uncertainties provide a useful indicator of statistical variation, and then apply it to the computation of the fluorescence autocorrelation function measured for a dye-labeled peptide previously studied by both experiment and simulation.
ERIC Educational Resources Information Center
Gálvez, Jaime; Conejo, Ricardo; Guzmán, Eduardo
2013-01-01
One of the most popular student modeling approaches is Constraint-Based Modeling (CBM). It is an efficient approach that can be easily applied inside an Intelligent Tutoring System (ITS). Even with these characteristics, building new ITSs requires carefully designing the domain model to be taught because different sources of errors could affect…
Zero-state Markov switching count-data models: an empirical assessment.
Malyshkina, Nataliya V; Mannering, Fred L
2010-01-01
In this study, a two-state Markov switching count-data model is proposed as an alternative to zero-inflated models to account for the preponderance of zeros sometimes observed in transportation count data, such as the number of accidents occurring on a roadway segment over some period of time. For this accident-frequency case, zero-inflated models assume the existence of two states: one of the states is a zero-accident count state, which has accident probabilities that are so low that they cannot be statistically distinguished from zero, and the other state is a normal-count state, in which counts can be non-negative integers that are generated by some counting process, for example, a Poisson or negative binomial. While zero-inflated models have come under some criticism with regard to accident-frequency applications - one fact is undeniable - in many applications they provide a statistically superior fit to the data. The Markov switching approach we propose seeks to overcome some of the criticism associated with the zero-accident state of the zero-inflated model by allowing individual roadway segments to switch between zero and normal-count states over time. An important advantage of this Markov switching approach is that it allows for the direct statistical estimation of the specific roadway-segment state (i.e., zero-accident or normal-count state) whereas traditional zero-inflated models do not. To demonstrate the applicability of this approach, a two-state Markov switching negative binomial model (estimated with Bayesian inference) and standard zero-inflated negative binomial models are estimated using five-year accident frequencies on Indiana interstate highway segments. It is shown that the Markov switching model is a viable alternative and results in a superior statistical fit relative to the zero-inflated models.
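The two-state structure of a zero-inflated model can be written down compactly. The sketch below uses a Poisson count process for the normal-count state for simplicity (the abstract also mentions the negative binomial); the parameter names are assumptions.

```python
import math


def zip_pmf(k, rate, zero_state_prob):
    """P(count = k) under a zero-inflated Poisson model.

    zero_state_prob: probability of being in the zero-accident state.
    rate: Poisson mean in the normal-count state.
    """
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        # A zero can come from either state.
        return zero_state_prob + (1.0 - zero_state_prob) * poisson
    return (1.0 - zero_state_prob) * poisson


# Hypothetical roadway segment: 30% chance of the zero state, mean of 2 otherwise.
probability_of_zero = zip_pmf(0, 2.0, 0.3)
```

In this formulation the state membership is fixed for the whole observation period, which is the assumption the Markov switching model relaxes by letting segments move between the two states over time.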
In silico model-based inference: a contemporary approach for hypothesis testing in network biology
Klinke, David J.
2014-01-01
Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900’s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. PMID:25139179
NASA Astrophysics Data System (ADS)
Liu, L.; Du, L.; Liao, Y.
2017-12-01
Based on the ensemble hindcast dataset of CSM1.1m by NCC, CMA, Bayesian merging models and a two-step statistical model are developed and employed to predict monthly grid/station precipitation in the Huaihe River basin, China, during summer at lead times of 1 to 3 months. The hindcast datasets span the period 1991 to 2014. The skill of the two models is evaluated using the area under the ROC curve (AUC) in a leave-one-out cross-validation framework and is compared with the skill of CSM1.1m. CSM1.1m has its highest skill for summer precipitation predicted from April and its lowest from May, and has its highest skill for precipitation in June but its lowest for precipitation in July. Compared with raw outputs of climate models, some schemes of the two approaches have higher skill for the prediction from March and May, but almost all schemes have lower skill for the prediction from April. Compared with the two-step approach, one sampling scheme of the Bayesian merging approach has higher skill for the prediction from March but lower skill from May. The results suggest that there is potential to apply the two statistical models for monthly summer precipitation forecasts from March and from May over the Huaihe River basin, and to apply the CSM1.1m forecast from April. Finally, the summer runoff during 1991 to 2014 is simulated based on one hydrological model using the climate hindcast of CSM1.1m and the two statistical models.
NASA Astrophysics Data System (ADS)
Aldrin, John C.; Annis, Charles; Sabbagh, Harold A.; Lindgren, Eric A.
2016-02-01
A comprehensive approach to NDE and SHM characterization error (CE) evaluation is presented that follows the framework of the 'ahat-versus-a' regression analysis for POD assessment. Characterization capability evaluation is typically more complex than current POD evaluations and thus requires engineering and statistical expertise in the model-building process to ensure all key effects and interactions are addressed. Justifying the statistical model choice with underlying assumptions is key. Several sizing case studies are presented with detailed evaluations of the most appropriate statistical model for each data set. The use of a model-assisted approach is introduced to help assess the reliability of NDE and SHM characterization capability under a wide range of part, environmental and damage conditions. Best practices of using models are presented for both an eddy current NDE sizing case study and a vibration-based SHM case study. The results of these studies highlight the general protocol feasibility, emphasize the importance of evaluating key application characteristics prior to the study, and demonstrate an approach to quantify the role of varying SHM sensor durability and environmental conditions on characterization performance.
Estimating the Regional Economic Significance of Airports
1992-09-01
following three options for estimating induced impacts: the economic base model, an econometric model, and a regional input-output model. One approach to...limitations, however, the economic base model has been widely used for regional economic analysis. A second approach is to develop an econometric model of...analysis is the principal statistical tool used to estimate the economic relationships. Regional econometric models are capable of estimating a single
Multiple commodities in statistical microeconomics: Model and market
NASA Astrophysics Data System (ADS)
Baaquie, Belal E.; Yu, Miao; Du, Xin
2016-11-01
A statistical generalization of microeconomics has been made in Baaquie (2013). In Baaquie et al. (2015), the market behavior of single commodities was analyzed and it was shown that market data provides strong support for the statistical microeconomic description of commodity prices. The case of multiple commodities is studied and a parsimonious generalization of the single commodity model is made for the multiple commodities case. Market data shows that the generalization can accurately model the simultaneous correlation functions of up to four commodities. To accurately model five or more commodities, further terms have to be included in the model. This study shows that the statistical microeconomics approach is a comprehensive and complete formulation of microeconomics that is independent of the mainstream formulation of microeconomics.
Wen, Shihua; Zhang, Lanju; Yang, Bo
2014-07-01
The Problem formulation, Objectives, Alternatives, Consequences, Trade-offs, Uncertainties, Risk attitude, and Linked decisions (PrOACT-URL) framework and multiple criteria decision analysis (MCDA) have been recommended by the European Medicines Agency for structured benefit-risk assessment of medicinal products undergoing regulatory review. The objective of this article was to provide solutions to incorporate the uncertainty from clinical data into the MCDA model when evaluating the overall benefit-risk profiles among different treatment options. Two statistical approaches, the δ-method approach and the Monte-Carlo approach, were proposed to construct the confidence interval of the overall benefit-risk score from the MCDA model as well as other probabilistic measures for comparing the benefit-risk profiles between treatment options. Both approaches can incorporate the correlation structure between clinical parameters (criteria) in the MCDA model and are straightforward to implement. The two proposed approaches were applied to a case study to evaluate the benefit-risk profile of an add-on therapy for rheumatoid arthritis (drug X) relative to placebo. It demonstrated a straightforward way to quantify the impact of the uncertainty from clinical data to the benefit-risk assessment and enabled statistical inference on evaluating the overall benefit-risk profiles among different treatment options. The δ-method approach provides a closed form to quantify the variability of the overall benefit-risk score in the MCDA model, whereas the Monte-Carlo approach is more computationally intensive but can yield its true sampling distribution for statistical inference. The obtained confidence intervals and other probabilistic measures from the two approaches enhance the benefit-risk decision making of medicinal products. Copyright © 2014 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
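The contrast between the two interval constructions described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the score is taken to be a linear weighted sum of two criterion values, the criteria are assumed jointly normal, and the means, covariance matrix, and weights are hypothetical numbers. For a linear score the δ-method variance reduces to w'Σw.

```python
import math
import random

def delta_method_ci(means, cov, weights, z=1.96):
    """Closed-form CI for the linear score S = sum(w_i * m_i), Var(S) = w' Cov w."""
    k = len(weights)
    score = sum(w * m for w, m in zip(weights, means))
    var = sum(weights[i] * cov[i][j] * weights[j] for i in range(k) for j in range(k))
    half = z * math.sqrt(var)
    return score, (score - half, score + half)

def monte_carlo_ci(means, cov, weights, n_draws=20000, seed=0):
    """Percentile CI from simulated draws of two correlated normal criteria."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix keeps the correlation structure.
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 ** 2)
    scores = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = means[0] + l11 * z1
        x2 = means[1] + l21 * z1 + l22 * z2
        scores.append(weights[0] * x1 + weights[1] * x2)
    scores.sort()
    lower = scores[round(0.025 * n_draws)]
    upper = scores[round(0.975 * n_draws) - 1]
    return sum(scores) / n_draws, (lower, upper)

# Hypothetical inputs: criterion values on a 0-1 preference scale.
means = [0.60, 0.30]
cov = [[0.0025, 0.0005], [0.0005, 0.0016]]
weights = [0.7, 0.3]

delta_score, delta_ci = delta_method_ci(means, cov, weights)
mc_score, mc_ci = monte_carlo_ci(means, cov, weights)
print("delta method:", delta_score, delta_ci)
print("Monte Carlo: ", mc_score, mc_ci)
```

With a linear score and normal inputs the two intervals should nearly coincide; the simulation becomes the more informative of the two once the value functions are nonlinear or the inputs are non-normal.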
Selecting Summary Statistics in Approximate Bayesian Computation for Calibrating Stochastic Models
Burr, Tom
2013-01-01
Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the “go-to” option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example. PMID:24288668
Selecting summary statistics in approximate Bayesian computation for calibrating stochastic models.
Burr, Tom; Skurikhin, Alexei
2013-01-01
Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the "go-to" option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example.
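The dependence of ABC on the chosen summary statistic can be illustrated with a minimal rejection sampler. This is a toy sketch, not the mitochondrial DNA model of the paper: the stochastic model is assumed to be a normal distribution with unknown mean and unit variance, the prior is uniform, and every function name, tolerance, and sample size is hypothetical. The sample mean is informative about the parameter; the sample variance is not.

```python
import random
import statistics

def simulate(theta, n, rng):
    """Toy stochastic model: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def abc_rejection(observed, summary, prior_draw, n_trials, tol, seed=0):
    """Keep prior draws whose simulated summary lands within tol of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_trials):
        theta = prior_draw(rng)
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tol:
            accepted.append(theta)
    return accepted

def prior(rng):
    """Uniform prior on the unknown mean (an assumption of this sketch)."""
    return rng.uniform(-5.0, 5.0)

# "Observed" data generated with a true mean of 2.0.
observed = simulate(2.0, 50, random.Random(1))

good = abc_rejection(observed, statistics.mean, prior, 5000, 0.1)
poor = abc_rejection(observed, statistics.variance, prior, 5000, 0.1)

print("mean summary:     n =", len(good), "posterior sd =", statistics.stdev(good))
print("variance summary: n =", len(poor), "posterior sd =", statistics.stdev(poor))
```

With the mean as the summary, the accepted draws concentrate near the observed mean; with the variance as the summary, they stay spread out roughly like the prior, which is the failure mode the abstract warns about.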
Synchronized Trajectories in a Climate "Supermodel"
NASA Astrophysics Data System (ADS)
Duane, Gregory; Schevenhoven, Francine; Selten, Frank
2017-04-01
Differences in climate projections among state-of-the-art models can be resolved by connecting the models in run-time, either through inter-model nudging or by directly combining the tendencies for corresponding variables. Since it is clearly established that averaging model outputs typically results in improvement as compared to any individual model output, averaged re-initializations at typical analysis time intervals also seem appropriate. The resulting "supermodel" is more like a single model than it is like an ensemble, because the constituent models tend to synchronize even with limited inter-model coupling. Thus one can examine the properties of specific trajectories, rather than averaging the statistical properties of the separate models. We apply this strategy to a study of the index cycle in a supermodel constructed from several imperfect copies of the SPEEDO model (a global primitive-equation atmosphere-ocean-land climate model). As with blocking frequency, typical weather statistics of interest, like probabilities of heat waves or extreme precipitation events, are improved as compared to the standard multi-model ensemble approach. In contrast to the standard approach, the supermodel approach provides detailed descriptions of typical actual events.
Rodríguez-Entrena, Macario; Schuberth, Florian; Gelhard, Carsten
2018-01-01
Structural equation modeling using partial least squares (PLS-SEM) has become a mainstream modeling approach in various disciplines. Nevertheless, prior literature still lacks practical guidance on how to properly test for differences between parameter estimates. Whereas existing techniques such as parametric and non-parametric approaches in PLS multi-group analysis only allow assessing differences between parameters that are estimated for different subpopulations, the study at hand introduces a technique that also allows assessing whether two parameter estimates that are derived from the same sample are statistically different. To illustrate this advancement to PLS-SEM, we particularly refer to a reduced version of the well-established technology acceptance model.
Corron, Louise; Marchal, François; Condemi, Silvana; Chaumoître, Kathia; Adalian, Pascal
2017-01-01
Juvenile age estimation methods used in forensic anthropology generally lack methodological consistency and/or statistical validity. Considering this, a standard approach using nonparametric Multivariate Adaptive Regression Splines (MARS) models was tested to predict age from iliac biometric variables of male and female juveniles from Marseilles, France, aged 0-12 years. Models using unidimensional (length and width) and bidimensional iliac data (module and surface) were constructed on a training sample of 176 individuals and validated on an independent test sample of 68 individuals. Results show that MARS prediction models using iliac width, module and area give overall better and statistically valid age estimates. These models integrate punctual nonlinearities of the relationship between age and osteometric variables. By constructing valid prediction intervals whose size increases with age, MARS models take into account the normal increase of individual variability. MARS models can qualify as a practical and standardized approach for juvenile age estimation. © 2016 American Academy of Forensic Sciences.
A Model for Investigating Predictive Validity at Highly Selective Institutions.
ERIC Educational Resources Information Center
Gross, Alan L.; And Others
A statistical model for investigating predictive validity at highly selective institutions is described. When the selection ratio is small, one must typically deal with a data set containing relatively large amounts of missing data on both criterion and predictor variables. Standard statistical approaches are based on the strong assumption that…
USDA-ARS's Scientific Manuscript database
The resolution of climate model outputs is too coarse for them to be used as direct inputs to impact models for assessing climate change impacts on agricultural production, water resources, and eco-system services at local or site-specific scales. Statistical downscaling approaches are usually used to bridge th...
AA9int: SNP Interaction Pattern Search Using Non-Hierarchical Additive Model Set.
Lin, Hui-Yi; Huang, Po-Yu; Chen, Dung-Tsa; Tung, Heng-Yuan; Sellers, Thomas A; Pow-Sang, Julio; Eeles, Rosalind; Easton, Doug; Kote-Jarai, Zsofia; Amin Al Olama, Ali; Benlloch, Sara; Muir, Kenneth; Giles, Graham G; Wiklund, Fredrik; Gronberg, Henrik; Haiman, Christopher A; Schleutker, Johanna; Nordestgaard, Børge G; Travis, Ruth C; Hamdy, Freddie; Neal, David E; Pashayan, Nora; Khaw, Kay-Tee; Stanford, Janet L; Blot, William J; Thibodeau, Stephen N; Maier, Christiane; Kibel, Adam S; Cybulski, Cezary; Cannon-Albright, Lisa; Brenner, Hermann; Kaneva, Radka; Batra, Jyotsna; Teixeira, Manuel R; Pandha, Hardev; Lu, Yong-Jie; Park, Jong Y
2018-06-07
The use of single nucleotide polymorphism (SNP) interactions to predict complex diseases has received more attention during the past decade, but related statistical methods are still immature. We previously proposed the SNP Interaction Pattern Identifier (SIPI) approach to evaluate 45 SNP interaction patterns. SIPI is statistically powerful but suffers from a large computation burden. For large-scale studies, it is necessary to use a powerful and computation-efficient method. The objective of this study is to develop an evidence-based mini-version of SIPI for use as a screening tool or on its own and to evaluate the impact of inheritance mode and model structure on detecting SNP-SNP interactions. We tested two candidate approaches: the 'Five-Full' and 'AA9int' method. The Five-Full approach is composed of the five full interaction models considering three inheritance modes (additive, dominant and recessive). The AA9int approach is composed of nine interaction models by considering non-hierarchical model structure and the additive mode. Our simulation results show that AA9int has similar statistical power compared to SIPI and is superior to the Five-Full approach, and the impact of the non-hierarchical model structure is greater than that of the inheritance mode in detecting SNP-SNP interactions. In summary, AA9int is recommended as a powerful tool to be used either alone or as the screening stage of a two-stage approach (AA9int+SIPI) for detecting SNP-SNP interactions in large-scale studies. The 'AA9int' and 'parAA9int' functions (standard and parallel computing version) are added in the SIPI R package, which is freely available at https://linhuiyi.github.io/LinHY_Software/. Contact: hlin1@lsuhsc.edu. Supplementary data are available at Bioinformatics online.
Matchett, John R.; Stark, Philip B.; Ostoja, Steven M.; Knapp, Roland A.; McKenny, Heather C.; Brooks, Matthew L.; Langford, William T.; Joppa, Lucas N.; Berlow, Eric L.
2015-01-01
Statistical models often use observational data to predict phenomena; however, interpreting model terms to understand their influence can be problematic. This issue poses a challenge in species conservation where setting priorities requires estimating influences of potential stressors using observational data. We present a novel approach for inferring influence of a rare stressor on a rare species by blending predictive models with nonparametric permutation tests. We illustrate the approach with two case studies involving rare amphibians in Yosemite National Park, USA. The endangered frog, Rana sierrae, is known to be negatively impacted by non-native fish, while the threatened toad, Anaxyrus canorus, is potentially affected by packstock. Both stressors and amphibians are rare, occurring in ~10% of potential habitat patches. We first predict amphibian occupancy with a statistical model that includes all predictors but the stressor to stratify potential habitat by predicted suitability. A stratified permutation test then evaluates the association between stressor and amphibian, all else equal. Our approach confirms the known negative relationship between fish and R. sierrae, but finds no evidence of a negative relationship between current packstock use and A. canorus breeding. Our statistical approach has potential broad application for deriving understanding (not just prediction) from observational data.
Matchett, J. R.; Stark, Philip B.; Ostoja, Steven M.; Knapp, Roland A.; McKenny, Heather C.; Brooks, Matthew L.; Langford, William T.; Joppa, Lucas N.; Berlow, Eric L.
2015-01-01
Statistical models often use observational data to predict phenomena; however, interpreting model terms to understand their influence can be problematic. This issue poses a challenge in species conservation where setting priorities requires estimating influences of potential stressors using observational data. We present a novel approach for inferring influence of a rare stressor on a rare species by blending predictive models with nonparametric permutation tests. We illustrate the approach with two case studies involving rare amphibians in Yosemite National Park, USA. The endangered frog, Rana sierrae, is known to be negatively impacted by non-native fish, while the threatened toad, Anaxyrus canorus, is potentially affected by packstock. Both stressors and amphibians are rare, occurring in ~10% of potential habitat patches. We first predict amphibian occupancy with a statistical model that includes all predictors but the stressor to stratify potential habitat by predicted suitability. A stratified permutation test then evaluates the association between stressor and amphibian, all else equal. Our approach confirms the known negative relationship between fish and R. sierrae, but finds no evidence of a negative relationship between current packstock use and A. canorus breeding. Our statistical approach has potential broad application for deriving understanding (not just prediction) from observational data. PMID:26031755
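The second step of the approach, the stratified permutation test, can be sketched as follows. This is a minimal illustration rather than the authors' code: the strata stand in for bins of predicted habitat suitability, the test statistic is assumed to be the number of patches where stressor and species co-occur, and the patch data are invented for the example.

```python
import random
from collections import defaultdict

def stratified_permutation_test(stratum, stressor, species, n_perm=5000, seed=0):
    """One-sided p-value for a negative stressor-species association.

    Stressor labels are shuffled only within each suitability stratum, so the
    comparison holds predicted habitat suitability fixed.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for i, s in enumerate(stratum):
        groups[s].append(i)

    def co_occurrence(stress):
        return sum(a * b for a, b in zip(stress, species))

    observed = co_occurrence(stressor)
    as_extreme = 0
    for _ in range(n_perm):
        permuted = list(stressor)
        for idx in groups.values():
            vals = [stressor[i] for i in idx]
            rng.shuffle(vals)
            for i, v in zip(idx, vals):
                permuted[i] = v
        if co_occurrence(permuted) <= observed:
            as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (as_extreme + 1) / (n_perm + 1)

# Hypothetical patches: 20 in a high-suitability stratum, 20 in a low one.
stratum = ["high"] * 20 + ["low"] * 20
species = [1] * 10 + [0] * 10 + [1] * 2 + [0] * 18
stressor = [0] * 10 + [1] * 5 + [0] * 5 + [0] * 2 + [1] * 3 + [0] * 15

observed, p_value = stratified_permutation_test(stratum, stressor, species)
print("co-occurrences:", observed, "p-value:", p_value)
```

In this invented data set the stressor never co-occurs with the species, and few within-stratum shuffles reproduce that, so the p-value is small. Permuting within strata is the design choice that matters: an unstratified shuffle would confound the stressor effect with habitat suitability.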
Pitfalls in statistical landslide susceptibility modelling
NASA Astrophysics Data System (ADS)
Schröder, Boris; Vorpahl, Peter; Märker, Michael; Elsenbeer, Helmut
2010-05-01
The use of statistical methods is a well-established approach to predict landslide occurrence probabilities and to assess landslide susceptibility. This is achieved by applying statistical methods relating historical landslide inventories to topographic indices as predictor variables. In our contribution, we compare several new and powerful methods developed in machine learning and well-established in landscape ecology and macroecology for predicting the distribution of shallow landslides in tropical mountain rainforests in southern Ecuador (among others: boosted regression trees, multivariate adaptive regression splines, maximum entropy). Although these methods are powerful, we think it is necessary to follow a basic set of guidelines to avoid some pitfalls regarding data sampling, predictor selection, and model quality assessment, especially if a comparison of different models is contemplated. We therefore suggest to apply a novel toolbox to evaluate approaches to the statistical modelling of landslide susceptibility. Additionally, we propose some methods to open the "black box" as an inherent part of machine learning methods in order to achieve further explanatory insights into preparatory factors that control landslides. Sampling of training data should be guided by hypotheses regarding processes that lead to slope failure taking into account their respective spatial scales. This approach leads to the selection of a set of candidate predictor variables considered on adequate spatial scales. This set should be checked for multicollinearity in order to facilitate model response curve interpretation. Model quality assesses how well a model is able to reproduce independent observations of its response variable. This includes criteria to evaluate different aspects of model performance, i.e. model discrimination, model calibration, and model refinement. 
In order to assess a possible violation of the assumption of independence in the training samples or a possible lack of explanatory information in the chosen set of predictor variables, the model residuals need to be checked for spatial autocorrelation. Therefore, we calculate spline correlograms. In addition, we investigate partial dependency plots and bivariate interaction plots considering possible interactions between predictors to improve model interpretation. Aiming at presenting this toolbox for model quality assessment, we investigate the influence of strategies in the construction of training datasets for statistical models on model quality.
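A simple way to see what a residual autocorrelation check measures is Moran's I with binary neighbour weights. Note that this is a deliberately simpler statistic than the spline correlograms used in the study; the neighbourhood distance, the coordinates, and the residual values below are all hypothetical.

```python
import math

def morans_i(values, coords, max_dist=1.0):
    """Moran's I with weight 1 for distinct points within max_dist, else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                cross += dev[i] * dev[j]
                weight_sum += 1.0
    return (n / weight_sum) * (cross / sum(d * d for d in dev))

# Eight sampling points along a transect, one unit apart.
coords = [(float(k), 0.0) for k in range(8)]

clustered = [1, 1, 1, 1, -1, -1, -1, -1]     # similar residuals sit together
alternating = [1, -1, 1, -1, 1, -1, 1, -1]   # neighbours always differ

i_clustered = morans_i(clustered, coords)
i_alternating = morans_i(alternating, coords)
print("clustered residuals:  ", i_clustered)
print("alternating residuals:", i_alternating)
```

Values well above zero indicate that nearby residuals resemble each other, which is the pattern that would signal a violated independence assumption or a missing predictor.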
Statistical modelling for recurrent events: an application to sports injuries
Ullah, Shahid; Gabbett, Tim J; Finch, Caroline F
2014-01-01
Background: Injuries are often recurrent, with subsequent injuries influenced by previous occurrences and hence correlation between events needs to be taken into account when analysing such data. Objective: This paper compares five different survival models (Cox proportional hazards (CoxPH) model and the following generalisations to recurrent event data: Andersen-Gill (A-G), frailty, Wei-Lin-Weissfeld total time (WLW-TT) marginal, Prentice-Williams-Peterson gap time (PWP-GT) conditional models) for the analysis of recurrent injury data. Methods: Empirical evaluation and comparison of different models were performed using model selection criteria and goodness-of-fit statistics. Simulation studies assessed the size and power of each model fit. Results: The modelling approach is demonstrated through direct application to Australian National Rugby League recurrent injury data collected over the 2008 playing season. Of the 35 players analysed, 14 (40%) players had more than 1 injury and 47 contact injuries were sustained over 29 matches. The CoxPH model provided the poorest fit to the recurrent sports injury data. The fit was improved with the A-G and frailty models, compared to WLW-TT and PWP-GT models. Conclusions: Despite little difference in model fit between the A-G and frailty models, in the interest of fewer statistical assumptions it is recommended that, where relevant, future studies involving modelling of recurrent sports injury data use the frailty model in preference to the CoxPH model or its other generalisations. The paper provides a rationale for future statistical modelling approaches for recurrent sports injury. PMID:22872683
Fault detection and diagnosis using neural network approaches
NASA Technical Reports Server (NTRS)
Kramer, Mark A.
1992-01-01
Neural networks can be used to detect and identify abnormalities in real-time process data. Two basic approaches can be used, the first based on training networks using data representing both normal and abnormal modes of process behavior, and the second based on statistical characterization of the normal mode only. Given data representative of process faults, radial basis function networks can effectively identify failures. This approach is often limited by the lack of fault data, but can be facilitated by process simulation. The second approach employs elliptical and radial basis function neural networks and other models to learn the statistical distributions of process observables under normal conditions. Analytical models of failure modes can then be applied in combination with the neural network models to identify faults. Special methods can be applied to compensate for sensor failures, to produce real-time estimation of missing or failed sensors based on the correlations codified in the neural network.
Vehicle track segmentation using higher order random fields
Quach, Tu -Thach
2017-01-09
Here, we present an approach to segment vehicle tracks in coherent change detection images, a product of combining two synthetic aperture radar images taken at different times. The approach uses multiscale higher order random field models to capture track statistics, such as curvatures and their parallel nature, that are not currently utilized in existing methods. These statistics are encoded as 3-by-3 patterns at different scales. The model can complete disconnected tracks often caused by sensor noise and various environmental effects. Coupling the model with a simple classifier, our approach is effective at segmenting salient tracks. We improve the F-measure on a standard vehicle track data set to 0.963, up from 0.897 obtained by the current state-of-the-art method.
Vehicle track segmentation using higher order random fields
DOE Office of Scientific and Technical Information (OSTI.GOV)
Quach, Tu -Thach
Here, we present an approach to segment vehicle tracks in coherent change detection images, a product of combining two synthetic aperture radar images taken at different times. The approach uses multiscale higher order random field models to capture track statistics, such as curvatures and their parallel nature, that are not currently utilized in existing methods. These statistics are encoded as 3-by-3 patterns at different scales. The model can complete disconnected tracks often caused by sensor noise and various environmental effects. Coupling the model with a simple classifier, our approach is effective at segmenting salient tracks. We improve the F-measure on a standard vehicle track data set to 0.963, up from 0.897 obtained by the current state-of-the-art method.
Delmotte, Sylvestre; Lopez-Ridaura, Santiago; Barbier, Jean-Marc; Wery, Jacques
2013-11-15
Evaluating the impacts of the development of alternative agricultural systems, such as organic or low-input cropping systems, in the context of an agricultural region requires the use of specific tools and methodologies. They should allow a prospective (using scenarios), multi-scale (taking into account the field, farm and regional level), integrated (notably multicriteria) and participatory assessment, abbreviated PIAAS (for Participatory Integrated Assessment of Agricultural System). In this paper, we compare the possible contribution to PIAAS of three modeling approaches i.e. Bio-Economic Modeling (BEM), Agent-Based Modeling (ABM) and statistical Land-Use/Land Cover Change (LUCC) models. After a presentation of each approach, we analyze their advantages and drawbacks, and identify their possible complementarities for PIAAS. Statistical LUCC modeling is a suitable approach for multi-scale analysis of past changes and can be used to start discussion about the futures with stakeholders. BEM and ABM approaches have complementary features for scenarios assessment at different scales. While ABM has been widely used for participatory assessment, BEM has been rarely used satisfactorily in a participatory manner. On the basis of these results, we propose to combine these three approaches in a framework targeted to PIAAS. Copyright © 2013 Elsevier Ltd. All rights reserved.
Avalappampatty Sivasamy, Aneetha; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T² method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T² statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668
Sivasamy, Aneetha Avalappampatty; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T² method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T² statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.
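The T-square distance at the core of this kind of detector can be sketched for two traffic features. The sketch below is illustrative only and makes several assumptions that are not from the paper: the two features and their values are invented, the profile is a plain sample mean and covariance, and the cut-off is a 99% chi-square quantile with two degrees of freedom chosen for convenience rather than the central-limit-theorem threshold range the abstract describes.

```python
import math

def fit_profile(rows):
    """Sample mean vector and 2x2 covariance matrix of normal-traffic rows."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def t_squared(x, mean, cov):
    """T-square distance (x - mean)' Cov^-1 (x - mean) for two features."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (_, d) = cov
    det = a * d - b * b
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det

# Hypothetical normal traffic: (connections per second, mean packet bytes).
normal_traffic = [(10, 200), (12, 220), (11, 210), (9, 190), (10, 205),
                  (13, 230), (11, 215), (12, 225), (9, 195), (10, 198)]
mean, cov = fit_profile(normal_traffic)

# Chi-square 99% quantile with 2 degrees of freedom equals -2 * ln(0.01).
threshold = -2.0 * math.log(0.01)

typical = t_squared((11, 212), mean, cov)
suspicious = t_squared((40, 210), mean, cov)
print("threshold:", threshold)
print("typical observation:   ", typical, "attack" if typical > threshold else "normal")
print("suspicious observation:", suspicious, "attack" if suspicious > threshold else "normal")
```

Because the distance accounts for the correlation between features, an observation can be flagged even when each feature taken alone looks unremarkable.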
The Thomas–Fermi quark model: Non-relativistic aspects
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Quan, E-mail: quan_liu@baylor.edu; Wilcox, Walter, E-mail: walter_wilcox@baylor.edu
The first numerical investigation of non-relativistic aspects of the Thomas–Fermi (TF) statistical multi-quark model is given. We begin with a review of the traditional TF model without an explicit spin interaction and find that the spin splittings are too small in this approach. An explicit spin interaction is then introduced which entails the definition of a generalized spin “flavor”. We investigate baryonic states in this approach which can be described with two inequivalent wave functions; such states can however apply to multiple degenerate flavors. We find that the model requires a spatial separation of quark flavors, even if completely degenerate. Although the TF model is designed to investigate the possibility of many-quark states, we find surprisingly that it may be used to fit the low energy spectrum of almost all ground state octet and decuplet baryons. The charge radii of such states are determined and compared with lattice calculations and other models. The low energy fit obtained allows us to extrapolate to the six-quark doubly strange H-dibaryon state, flavor symmetric strange states of higher quark content and possible six quark nucleon–nucleon resonances. The emphasis here is on the systematics revealed in this approach. We view our model as a versatile and convenient tool for quickly assessing the characteristics of new, possibly bound, particle states of higher quark number content. -- Highlights: • First application of the statistical Thomas–Fermi quark model to baryonic systems. • Novel aspects: spin as generalized flavor; spatial separation of quark flavor phases. • The model is statistical, but the low energy baryonic spectrum is successfully fit. • Numerical applications include the H-dibaryon, strange states and nucleon resonances. • The statistical point of view does not encourage the idea of bound many-quark baryons.
In defence of model-based inference in phylogeography
Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent
2017-01-01
Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, when used with either single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924
Han, L. F; Plummer, Niel
2016-01-01
Numerous methods have been proposed to estimate the pre-nuclear-detonation 14C content of dissolved inorganic carbon (DIC) recharged to groundwater that has been corrected/adjusted for geochemical processes in the absence of radioactive decay (14C0) - a quantity that is essential for estimation of radiocarbon age of DIC in groundwater. The models/approaches most commonly used are grouped as follows: (1) single-sample-based models, (2) a statistical approach based on the observed (curved) relationship between 14C and δ13C data for the aquifer, and (3) the geochemical mass-balance approach that constructs adjustment models accounting for all the geochemical reactions known to occur along a groundwater flow path. This review discusses first the geochemical processes behind each of the single-sample-based models, followed by discussions of the statistical approach and the geochemical mass-balance approach. Finally, the applications, advantages and limitations of the three groups of models/approaches are discussed. The single-sample-based models constitute the prevailing use of 14C data in hydrogeology and hydrological studies. This is in part because the models are applied to an individual water sample to estimate the 14C age, therefore the measurement data are easily available. These models have been shown to provide realistic radiocarbon ages in many studies. However, they usually are limited to simple carbonate aquifers and selection of the model may have significant effects on 14C0, often resulting in a wide range of estimates of 14C ages. Of the single-sample-based models, four are recommended for the estimation of 14C0 of DIC in groundwater: Pearson's model (Ingerson and Pearson, 1964; Pearson and White, 1967), Han & Plummer's model (Han and Plummer, 2013), the IAEA model (Gonfiantini, 1972; Salem et al., 1980), and Oeschger's model (Geyh, 2000).
These four models include all processes considered in single-sample-based models, and can be used in different ranges of 13C values. In contrast to the single-sample-based models, the extended Gonfiantini & Zuppi model (Gonfiantini and Zuppi, 2003; Han et al., 2014) is a statistical approach. This approach can be used to estimate 14C ages when a curved relationship between the 14C and 13C values of the DIC data is observed. In addition to estimation of groundwater ages, the relationship between 14C and δ13C data can be used to interpret hydrogeological characteristics of the aquifer, e.g. estimating apparent rates of geochemical reactions and revealing the complexity of the geochemical environment, and identify samples that are not affected by the same set of reactions/processes as the rest of the dataset. The investigated water samples may have a wide range of ages, and for waters with very low values of 14C, the model based on statistics may give more reliable age estimates than those obtained from single-sample-based models. In the extended Gonfiantini & Zuppi model, a representative system-wide value of the initial 14C content is derived from the 14C and δ13C data of DIC and can differ from that used in single-sample-based models. Therefore, the extended Gonfiantini & Zuppi model usually avoids the effect of modern water components which might retain ‘bomb’ pulse signatures. The geochemical mass-balance approach constructs an adjustment model that accounts for all the geochemical reactions known to occur along an aquifer flow path (Plummer et al., 1983; Wigley et al., 1978; Plummer et al., 1994; Plummer and Glynn, 2013), and includes, in addition to DIC, dissolved organic carbon (DOC) and methane (CH4). If sufficient chemical, mineralogical and isotopic data are available, the geochemical mass-balance method can yield the most accurate estimates of the adjusted radiocarbon age.
The main limitation of this approach is that complete information is necessary on chemical, mineralogical and isotopic data and these data are often limited. Failure to recognize the limitations and underlying assumptions on which the various models and approaches are based can result in a wide range of estimates of 14C0 and limit the usefulness of radiocarbon as a dating tool for groundwater. In each of the three generalized approaches (single-sample-based models, statistical approach, and geochemical mass-balance approach), successful application depends on scrutiny of the isotopic (14C and 13C) and chemical data to conceptualize the reactions and processes that affect the 14C content of DIC in aquifers. The recently developed graphical analysis method is shown to aid in determining which approach is most appropriate for the isotopic and chemical data from a groundwater system.
Bayesian models based on test statistics for multiple hypothesis testing problems.
Ji, Yuan; Lu, Yiling; Mills, Gordon B
2008-04-01
We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
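The Bayesian FDR step mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes that a posterior probability of the null hypothesis is already available for each test and applies one common thresholding rule (reject the most convincing hypotheses while the average posterior null probability among the rejections stays below a target level). The function name and the rule itself are illustrative assumptions, not necessarily the authors' exact procedure.

```python
def bayesian_fdr_reject(null_probs, alpha=0.05):
    """Return the indices of rejected hypotheses.

    null_probs: posterior probability of the null for each hypothesis
                (assumed to be supplied by some upstream model).
    alpha:      target bound on the mean posterior null probability
                among the rejected hypotheses.
    """
    # Visit hypotheses from the smallest posterior null probability upward.
    order = sorted(range(len(null_probs)), key=lambda i: null_probs[i])
    rejected = []
    running_sum = 0.0
    for i in order:
        # The running mean is non-decreasing, so we can stop at the first violation.
        if (running_sum + null_probs[i]) / (len(rejected) + 1) > alpha:
            break
        running_sum += null_probs[i]
        rejected.append(i)
    return sorted(rejected)
```

Because the hypotheses are visited in increasing order of posterior null probability, the estimated false discovery rate grows monotonically with the size of the rejection set, which is why a single pass suffices.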
Nowcasting sunshine number using logistic modeling
NASA Astrophysics Data System (ADS)
Brabec, Marek; Badescu, Viorel; Paulescu, Marius
2013-04-01
In this paper, we present a formalized approach to statistical modeling of the sunshine number, a binary indicator of whether the Sun is covered by clouds, introduced previously by Badescu (Theor Appl Climatol 72:127-136, 2002). Our statistical approach is based on Markov chains and logistic regression and yields fully specified probability models that are relatively easily identified (and their unknown parameters estimated) from a set of empirical data (observed sunshine number and sunshine stability number series). We discuss the general structure of the model and its advantages, demonstrate its performance on real data and compare its results with those of the classical ARIMA approach as a competitor. Since the model parameters have a clear interpretation, we also illustrate how, e.g., their inter-seasonal stability can be tested. We conclude with an outlook on future developments oriented toward the construction of models allowing for a practically desirable smooth transition between data observed with different frequencies, and with a short discussion of the technical problems that such a goal brings.
Statistical inference for template aging
NASA Astrophysics Data System (ADS)
Schuckers, Michael E.
2006-04-01
A change in classification error rates for a biometric device is often referred to as template aging. Here we offer two methods for determining whether the effect of time is statistically significant. The first of these is the use of a generalized linear model to determine if these error rates change linearly over time. This approach generalizes previous work assessing the impact of covariates using generalized linear models. The second approach uses likelihood ratio test methodology. The focus here is on statistical methods for estimation, not the underlying cause of the change in error rates over time. These methodologies are applied to data from the National Institute of Standards and Technology Biometric Score Set Release 1. The results of these applications are discussed.
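As a rough illustration of the likelihood ratio idea, the following Python sketch tests whether a binomial error rate differs between two collection periods. It is a deliberately simplified two-period version, not the method of the paper; the function names and the example counts are hypothetical.

```python
import math


def _binom_loglik(errors, trials, p):
    """Binomial log-likelihood up to a constant, with 0*log(0) treated as 0."""
    ll = 0.0
    if errors > 0:
        ll += errors * math.log(p)
    if trials - errors > 0:
        ll += (trials - errors) * math.log(1.0 - p)
    return ll


def lrt_two_rates(errors_1, trials_1, errors_2, trials_2):
    """Likelihood ratio test of H0: equal error rates in both periods.

    Returns (statistic, p_value); the statistic is compared with a
    chi-square distribution with 1 degree of freedom.
    """
    p1 = errors_1 / trials_1
    p2 = errors_2 / trials_2
    p0 = (errors_1 + errors_2) / (trials_1 + trials_2)  # pooled rate under H0
    ll_alt = _binom_loglik(errors_1, trials_1, p1) + _binom_loglik(errors_2, trials_2, p2)
    ll_null = _binom_loglik(errors_1, trials_1, p0) + _binom_loglik(errors_2, trials_2, p0)
    stat = max(2.0 * (ll_alt - ll_null), 0.0)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, `lrt_two_rates(10, 1000, 40, 1000)` compares a 1% error rate in the first period with a 4% rate in the second; a small p-value would suggest that the error rate has changed over time.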
Analysis of the dependence of extreme rainfalls
NASA Astrophysics Data System (ADS)
Padoan, Simone; Ancey, Christophe; Parlange, Marc
2010-05-01
The aim of spatial analysis is to quantitatively describe the behavior of environmental phenomena such as precipitation levels, wind speed or daily temperatures. A number of generic approaches to spatial modeling have been developed [1], but these are not necessarily ideal for handling extremal aspects given their focus on mean process levels. The areal modelling of the extremes of a natural process observed at points in space is important in environmental statistics; for example, understanding extremal spatial rainfall is crucial in flood protection. In light of recent concerns over climate change, the use of robust mathematical and statistical methods for such analyses has grown in importance. Multivariate extreme value models and the class of max-stable processes [2] have a similar asymptotic motivation to the univariate Generalized Extreme Value (GEV) distribution, but they provide a general approach to modeling extreme processes incorporating temporal or spatial dependence. Statistical methods for max-stable processes and data analyses of practical problems are discussed by [3] and [4]. This work illustrates methods for the statistical modelling of spatial extremes and gives examples of their use by means of a real extremal data analysis of Switzerland precipitation levels. [1] Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York. [2] de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer, USA. [3] Padoan, S. A., Ribatet, M. and Sisson, S. A. (2009). Likelihood-Based Inference for Max-Stable Processes. Journal of the American Statistical Association, Theory & Methods. In press. [4] Davison, A. C. and Gholamrezaee, M. (2009). Geostatistics of extremes. Journal of the Royal Statistical Society, Series B. To appear.
New Methodology for Estimating Fuel Economy by Vehicle Class
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, Shih-Miao; Dabbs, Kathryn; Hwang, Ho-Ling
2011-01-01
Office of Highway Policy Information to develop a new methodology to generate annual estimates of average fuel efficiency and number of motor vehicles registered by vehicle class for Table VM-1 of the Highway Statistics annual publication. This paper describes the new methodology developed under this effort and compares the results of the existing manual method and the new systematic approach. The methodology developed under this study takes a two-step approach. First, the preliminary fuel efficiency rates are estimated based on vehicle stock models for different classes of vehicles. Then, a reconciliation model is used to adjust the initial fuel consumption rates from the vehicle stock models and match the VMT information for each vehicle class and the reported total fuel consumption. This reconciliation model utilizes a systematic approach that produces documentable and reproducible results. The basic framework utilizes a mathematical programming formulation to minimize the deviations between the fuel economy estimates published in the previous year's Highway Statistics and the results from the vehicle stock models, subject to the constraint that fuel consumptions for different vehicle classes must sum to the total fuel consumption estimate published in Table MF-21 of the current year Highway Statistics. The results generated from this new approach provide a smoother time series for the fuel economies by vehicle class. It also utilizes the most up-to-date and best available data with sound econometric models to generate MPG estimates by vehicle class.
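The reconciliation step can be pictured with a small Python sketch. It assumes the simplest possible formulation, a weighted least-squares adjustment of per-class fuel consumption estimates subject to a single sum-to-total constraint, which has a closed-form solution. The abstract does not give the actual objective or constraints, so the function name, the weights and the example numbers are all hypothetical.

```python
def reconcile(estimates, total, weights=None):
    """Adjust per-class estimates so that they sum to `total`.

    Minimises sum(w_i * (x_i - estimate_i)**2) subject to sum(x_i) == total.
    The Lagrangian solution spreads the gap in proportion to 1 / w_i, so
    classes with a larger weight are moved less.
    """
    if weights is None:
        weights = [1.0] * len(estimates)
    inverse_weights = [1.0 / w for w in weights]
    gap = total - sum(estimates)
    scale = gap / sum(inverse_weights)
    return [e + scale * iw for e, iw in zip(estimates, inverse_weights)]
```

With equal weights the gap is split evenly, so `reconcile([10.0, 20.0, 30.0], 66.0)` raises each class by about 2. A real implementation would likely add non-negativity bounds and work on fuel economy rather than consumption, which would call for a general-purpose solver rather than a closed form.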
2009-02-01
range of modal analysis and the high frequency region of statistical energy analysis , is referred to as the mid-frequency range. The corresponding...frequency range of modal analysis and the high frequency region of statistical energy analysis , is referred to as the mid-frequency range. The...predictions. The averaging process is consistent with the averaging done in statistical energy analysis for stochastic systems. The FEM will always
NASA Astrophysics Data System (ADS)
Most, S.; Nowak, W.; Bijeljic, B.
2014-12-01
Transport processes in porous media are frequently simulated as particle movement. This process can be formulated as a stochastic process of particle position increments. At the pore scale, the geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Recent experimental data suggest that we have not yet reached the end of the need to generalize, because particle increments show statistical dependency beyond linear correlation and over many time steps. The goal of this work is to better understand the validity regions of commonly made assumptions. We are investigating after what transport distances we can observe the following: (1) A statistical dependence between increments that can be modelled as an order-k Markov process boils down to order 1. This would be the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks would start. (2) A bivariate statistical dependence simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW). (3) Statistical dependence is completely absent (validity of classical PTRW/CTRW). The approach is to derive a statistical model for pore-scale transport from a powerful experimental data set via copula analysis. The model is formulated as a non-Gaussian, mutually dependent Markov process of higher order, which allows us to investigate the validity ranges of simpler models.
Brain tissues volume measurements from 2D MRI using parametric approach
NASA Astrophysics Data System (ADS)
L'vov, A. A.; Toropova, O. A.; Litovka, Yu. V.
2018-04-01
The purpose of the paper is to propose a fully automated method for assessing the volume of structures within the human brain. Our statistical approach uses the maximum interdependency principle in the decision-making process regarding measurement consistency and unequal observations. Outliers are detected using the maximum normalized residual test. We propose a statistical model which utilizes knowledge of tissue distribution in the human brain and applies partial data restoration to improve precision. The proposed approach is computationally efficient and independent of the segmentation algorithm used in the application.
Humidity-corrected Arrhenius equation: The reference condition approach.
Naveršnik, Klemen; Jurečič, Rok
2016-03-16
Accelerated and stress stability data are often used to predict the shelf life of pharmaceuticals. Temperature, combined with humidity, accelerates chemical decomposition, and the Arrhenius equation is used to extrapolate accelerated stability results to long-term stability. Statistical estimation of the humidity-corrected Arrhenius equation is not straightforward due to its non-linearity. A two-stage nonlinear fitting approach is used in practice, followed by a prediction stage. We developed a single-stage statistical procedure, called the reference condition approach, which has better statistical properties (less collinearity, direct estimation of uncertainty, narrower prediction interval) and is significantly easier to use, compared to the existing approaches. Our statistical model was populated with data from a 35-day stress stability study on a laboratory batch of vitamin tablets and required a mere 30 laboratory assay determinations. The stability prediction agreed well with the actual 24-month long-term stability of the product. The approach has high potential to assist product formulation, specification setting and stability statements. Copyright © 2016 Elsevier B.V. All rights reserved.
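To make the idea concrete, here is a minimal Python sketch of the humidity-corrected Arrhenius equation in its commonly used form, ln k = ln A - Ea/(R T) + B * RH, next to a version re-centred on a reference condition. The re-centred parametrisation is an assumption about what a "reference condition" formulation might look like; the abstract does not state the authors' exact equation, and all parameter values and names here are hypothetical.

```python
R_GAS = 8.314462618  # universal gas constant, J/(mol K)


def ln_k_classic(temp_k, rh, ln_a, ea, b):
    """Commonly used humidity-corrected Arrhenius form:
    ln k = ln A - Ea / (R T) + B * RH."""
    return ln_a - ea / (R_GAS * temp_k) + b * rh


def ln_k_reference(temp_k, rh, ln_k_ref, ea, b, temp_ref_k=298.15, rh_ref=60.0):
    """The same model re-centred on an assumed reference condition
    (temp_ref_k, rh_ref), so that ln_k_ref is the log rate at that condition."""
    return (ln_k_ref
            - ea / R_GAS * (1.0 / temp_k - 1.0 / temp_ref_k)
            + b * (rh - rh_ref))
```

The two functions describe the same surface when ln A = ln k_ref + Ea/(R T_ref) - B * RH_ref. Centring on a condition of interest means that the fitted intercept is directly the log rate at that condition, which is one plausible reason such a parametrisation would reduce collinearity between the estimated parameters.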
Wagner, Tyler; Irwin, Brian J.; Bence, James R.; Hayes, Daniel B.
2016-01-01
Monitoring to detect temporal trends in biological and habitat indices is a critical component of fisheries management. Thus, it is important that management objectives are linked to monitoring objectives. This linkage requires a definition of what constitutes a management-relevant “temporal trend.” It is also important to develop expectations for the amount of time required to detect a trend (i.e., statistical power) and for choosing an appropriate statistical model for analysis. We provide an overview of temporal trends commonly encountered in fisheries management, review published studies that evaluated statistical power of long-term trend detection, and illustrate dynamic linear models in a Bayesian context, as an additional analytical approach focused on shorter term change. We show that monitoring programs generally have low statistical power for detecting linear temporal trends and argue that often management should be focused on different definitions of trends, some of which can be better addressed by alternative analytical approaches.
NASA Astrophysics Data System (ADS)
Clerc, F.; Njiki-Menga, G.-H.; Witschger, O.
2013-04-01
Most of the measurement strategies that are suggested at the international level to assess workplace exposure to nanomaterials rely on devices measuring, in real time, airborne particle concentrations (according to different metrics). Since none of the instruments used to measure aerosols can distinguish a particle of interest from the background aerosol, the statistical analysis of time-resolved data requires special attention. So far, very few approaches have been used for statistical analysis in the literature. These range from simple qualitative analysis of graphs to the implementation of more complex statistical models. To date, there is still no consensus on a particular approach, and the search for an appropriate and robust method continues. In this context, this exploratory study investigates a statistical method to analyse time-resolved data based on a Bayesian probabilistic approach. To investigate and illustrate the use of this statistical method, particle number concentration data were used from a workplace study that investigated the potential for exposure via inhalation from cleanout operations by sandpapering of a reactor producing nanocomposite thin films. In this workplace study, the background issue has been addressed through the near-field and far-field approaches, and several size-integrated and time-resolved devices have been used. The analysis of the results presented here focuses only on data obtained with two handheld condensation particle counters. While one was measuring at the source of the released particles, the other one was measuring in parallel in the far field. The Bayesian probabilistic approach allows a probabilistic modelling of data series, and the observed task is modelled in the form of probability distributions.
The probability distributions derived from time-resolved data obtained at the source can be compared with the probability distributions derived from the time-resolved data obtained in the far field, leading to a quantitative estimation of the airborne particles released at the source when the task is performed. Beyond the results obtained, this exploratory study indicates that the analysis of the results requires specific experience in statistics.
2015-09-30
into acoustic fluctuation calculations. In the Philippine Sea, models of eddies, internal tides, internal waves, and fine structure ( spice ) are...needed, while in the shallow water case a models of the random linear internal waves and spice are lacking. APPROACH The approach to this research is to
Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory
ERIC Educational Resources Information Center
Wells, Craig S.; Bolt, Daniel M.
2008-01-01
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…
A statistical approach to optimizing concrete mixture design.
Ahmad, Shamsad; Alghamdi, Saeid A
2014-01-01
A step-by-step statistical approach is proposed to obtain optimum proportioning of concrete mixtures using the data obtained through a statistically planned experimental program. The utility of the proposed approach for optimizing the design of concrete mixture is illustrated considering a typical case in which trial mixtures were considered according to a full factorial experiment design involving three factors and their three levels (3^3). A total of 27 concrete mixtures with three replicates (81 specimens) were considered by varying the levels of key factors affecting compressive strength of concrete, namely, water/cementitious materials ratio (0.38, 0.43, and 0.48), cementitious materials content (350, 375, and 400 kg/m^3), and fine/total aggregate ratio (0.35, 0.40, and 0.45). The experimental data were utilized to carry out analysis of variance (ANOVA) and to develop a polynomial regression model for compressive strength in terms of the three design factors considered in this study. The developed statistical model was used to show how optimization of concrete mixtures can be carried out with different possible options.
A Statistical Approach to Optimizing Concrete Mixture Design
Alghamdi, Saeid A.
2014-01-01
A step-by-step statistical approach is proposed to obtain optimum proportioning of concrete mixtures using the data obtained through a statistically planned experimental program. The utility of the proposed approach for optimizing the design of concrete mixture is illustrated considering a typical case in which trial mixtures were considered according to a full factorial experiment design involving three factors and their three levels (3^3). A total of 27 concrete mixtures with three replicates (81 specimens) were considered by varying the levels of key factors affecting compressive strength of concrete, namely, water/cementitious materials ratio (0.38, 0.43, and 0.48), cementitious materials content (350, 375, and 400 kg/m^3), and fine/total aggregate ratio (0.35, 0.40, and 0.45). The experimental data were utilized to carry out analysis of variance (ANOVA) and to develop a polynomial regression model for compressive strength in terms of the three design factors considered in this study. The developed statistical model was used to show how optimization of concrete mixtures can be carried out with different possible options. PMID:24688405
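The full factorial layout described in this abstract is easy to enumerate. The Python sketch below uses the factor levels quoted above and simply lists every combination; the variable names are illustrative, and the replicate numbering is an assumption.

```python
from itertools import product

# Factor levels as quoted in the abstract.
W_CM_RATIOS = (0.38, 0.43, 0.48)        # water/cementitious materials ratio
CM_CONTENTS = (350, 375, 400)           # cementitious materials content, kg/m^3
FINE_TOTAL_RATIOS = (0.35, 0.40, 0.45)  # fine/total aggregate ratio
REPLICATES = 3

# Every combination of the three factors at three levels: 3**3 mixtures.
mixtures = list(product(W_CM_RATIOS, CM_CONTENTS, FINE_TOTAL_RATIOS))

# Each mixture is cast REPLICATES times, giving one row per specimen.
specimens = [(mix, rep) for mix in mixtures for rep in range(1, REPLICATES + 1)]
```

Here `len(mixtures)` is 27 and `len(specimens)` is 81, matching the counts in the abstract. Each specimen row would then receive a measured compressive strength before the ANOVA and polynomial regression are fitted.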
Our study assesses the value of both in vitro assay and quantitative structure activity relationship (QSAR) data in predicting in vivo toxicity using numerous statistical models and approaches to process the data. Our models are built on datasets of (i) 586 chemicals for which bo...
ERIC Educational Resources Information Center
Essid, Hedi; Ouellette, Pierre; Vigeant, Stephane
2010-01-01
The objective of this paper is to measure the efficiency of high schools in Tunisia. We use a statistical data envelopment analysis (DEA)-bootstrap approach with quasi-fixed inputs to estimate the precision of our measure. To do so, we developed a statistical model serving as the foundation of the data generation process (DGP). The DGP is…
Discrimination of dynamical system models for biological and chemical processes.
Lorenz, Sönke; Diederichs, Elmar; Telgmann, Regina; Schütte, Christof
2007-06-01
In technical chemistry, systems biology and biotechnology, the construction of predictive models has become an essential step in process design and product optimization. Accurate modelling of the reactions requires detailed knowledge about the processes involved. However, when concerned with the development of new products and production techniques, for example, this knowledge is often not available due to the lack of experimental data. Thus, when one has to work with a selection of proposed models, the main task of early development is to discriminate between these models. In this article, a new statistical approach to model discrimination is described that ranks models with respect to the probability with which they reproduce the given data. The article introduces the new approach, discusses its statistical background, presents numerical techniques for its implementation and illustrates the application to examples from biokinetics.
ERIC Educational Resources Information Center
Miller, John
1994-01-01
Presents an approach to document numbering, document titling, and process measurement which, when used with fundamental techniques of statistical process control, reveals meaningful process-element variation as well as nominal productivity models. (SR)
NASA Astrophysics Data System (ADS)
Crimp, Steven; Jin, Huidong; Kokic, Philip; Bakar, Shuvo; Nicholls, Neville
2018-04-01
Anthropogenic climate change has already been shown to affect the frequency, intensity, spatial extent, duration and seasonality of extreme climate events. Understanding these changes is an important step in determining exposure, vulnerability and focus for adaptation. In an attempt to support adaptation decision-making we have examined statistical modelling techniques to improve the representation of global climate model (GCM) derived projections of minimum temperature extremes (frosts) in Australia. We examine the spatial changes in minimum temperature extreme metrics (e.g. monthly and seasonal frost frequency etc.), for a region exhibiting the strongest station trends in Australia, and compare these changes with minimum temperature extreme metrics derived from 10 GCMs, from the Coupled Model Inter-comparison Project Phase 5 (CMIP 5) datasets, and via statistical downscaling. We compare the observed trends with those derived from the "raw" GCM minimum temperature data as well as examine whether quantile matching (QM) or spatio-temporal (spTimerQM) modelling with Quantile Matching can be used to improve the correlation between observed and simulated extreme minimum temperatures. We demonstrate that the spTimerQM modelling approach provides correlations with observed daily minimum temperatures for the period August to November of 0.22. This represents an almost fourfold improvement over either the "raw" GCM or QM results. The spTimerQM modelling approach also improves correlations with observed monthly frost frequency statistics to 0.84 as opposed to 0.37 and 0.81 for the "raw" GCM and QM results respectively. We apply the spatio-temporal model to examine future extreme minimum temperature projections for the period 2016 to 2048. The spTimerQM modelling results suggest the persistence of current levels of frost risk out to 2030, with the evidence of continuing decadal variation.
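Quantile matching, the simpler of the two bias-correction techniques named here, can be sketched in a few lines of Python. This is a generic empirical version under the assumption that historical model output and historical observations are available as plain samples; it does not reproduce the spatio-temporal spTimerQM model, and the function name and sample values are hypothetical.

```python
from bisect import bisect_right


def quantile_match(value, model_hist, obs_hist):
    """Map a model value onto the observed distribution.

    The value's empirical rank within the historical model sample is
    looked up, and the observation with the same rank (nearest-rank
    quantile) in the historical observed sample is returned.
    """
    model_sorted = sorted(model_hist)
    obs_sorted = sorted(obs_hist)
    # Number of historical model values that are <= value.
    rank = bisect_right(model_sorted, value)
    # Nearest-rank index into the observations, using integer arithmetic
    # for ceil(rank * n_obs / n_model) - 1.
    idx = (rank * len(obs_sorted) + len(model_sorted) - 1) // len(model_sorted) - 1
    idx = min(max(idx, 0), len(obs_sorted) - 1)  # clamp values outside the sample
    return obs_sorted[idx]
```

With `model_hist = [2, 4, 6, 8, 10]` and `obs_hist = [1, 2, 3, 4, 5]`, a model value of 6 sits at the 60th percentile of the model sample and is mapped to the observed value at the same percentile. Values outside the historical model range are clamped to the observed extremes, which is a known weakness of purely empirical mapping when extremes such as frosts are the target.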
Multiple-Point statistics for stochastic modeling of aquifers, where do we stand?
NASA Astrophysics Data System (ADS)
Renard, P.; Julien, S.
2017-12-01
In the last 20 years, multiple-point statistics have been a focus of much research, successes and disappointments. The aim of this geostatistical approach was to integrate geological information into stochastic models of aquifer heterogeneity to better represent the connectivity of high or low permeability structures in the underground. Many different algorithms (ENESIM, SNESIM, SIMPAT, CCSIM, QUILTING, IMPALA, DEESSE, FILTERSIM, HYPPS, etc.) have been and are still proposed. They are all based on the concept of a training data set from which spatial statistics are derived and used in a further step to generate conditional realizations. Some of these algorithms evaluate the statistics of the spatial patterns for every pixel; other techniques consider the statistics at the scale of a patch or a tile. While the method clearly succeeded in enabling modelers to generate realistic models, several issues are still the topic of debate from both a practical and a theoretical point of view, and some issues, such as training data set availability, often hinder the application of the method in practical situations. In this talk, the aim is to present a review of the status of these approaches from both a theoretical and a practical point of view using several examples at different scales (from pore network to regional aquifer).
Harrison, Jay M; Breeze, Matthew L; Harrigan, George G
2011-08-01
Statistical comparisons of compositional data generated on genetically modified (GM) crops and their near-isogenic conventional (non-GM) counterparts typically rely on classical significance testing. This manuscript presents an introduction to Bayesian methods for compositional analysis along with recommendations for model validation. The approach is illustrated using protein and fat data from two herbicide tolerant GM soybeans (MON87708 and MON87708×MON89788) and a conventional comparator grown in the US in 2008 and 2009. Guidelines recommended by the US Food and Drug Administration (FDA) in conducting Bayesian analyses of clinical studies on medical devices were followed. This study is the first Bayesian approach to GM and non-GM compositional comparisons. The evaluation presented here supports a conclusion that a Bayesian approach to analyzing compositional data can provide meaningful and interpretable results. We further describe the importance of method validation and approaches to model checking if Bayesian approaches to compositional data analysis are to be considered viable by scientists involved in GM research and regulation. Copyright © 2011 Elsevier Inc. All rights reserved.
Li, Longbiao
2016-01-01
In this paper, the fatigue life of fiber-reinforced ceramic-matrix composites (CMCs) with different fiber preforms, i.e., unidirectional, cross-ply, 2D (two dimensional), 2.5D and 3D CMCs at room and elevated temperatures in air and oxidative environments, has been predicted using the micromechanics approach. An effective coefficient of the fiber volume fraction along the loading direction (ECFL) was introduced to describe the fiber architecture of preforms. The statistical matrix multicracking model and fracture mechanics interface debonding criterion were used to determine the matrix crack spacing and interface debonded length. Under cyclic fatigue loading, the broken fiber fraction was determined by combining the interface wear model and fiber statistical failure model at room temperature, and the interface/fiber oxidation model, interface wear model and fiber statistical failure model at elevated temperatures, based on the assumption that the fiber strength follows a two-parameter Weibull distribution and the load carried by broken and intact fibers satisfies the Global Load Sharing (GLS) criterion. When the broken fiber fraction approaches the critical value, the composite undergoes fatigue fracture. PMID:28773332
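The two-parameter Weibull assumption for fiber strength can be written down directly. The Python sketch below gives the basic cumulative failure probability, P = 1 - exp(-(stress / sigma_0)^m); the full model in the paper also involves interface wear, oxidation and load sharing, none of which is reproduced here, and the parameter names and values are hypothetical.

```python
import math


def fiber_failure_probability(stress, sigma_0, m):
    """Fraction of fibers expected to have failed at a given fiber stress.

    stress:  stress carried by an intact fiber (same units as sigma_0)
    sigma_0: Weibull scale parameter (characteristic strength)
    m:       Weibull modulus (shape parameter); larger m means less scatter
    """
    if stress <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((stress / sigma_0) ** m))
```

At `stress == sigma_0` the function returns 1 - exp(-1), roughly 63%, whatever the modulus. In a fatigue setting, degradation mechanisms would effectively lower the strength over cycles, so the broken fraction at a fixed applied stress would rise until it reaches the critical value mentioned in the abstract.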
Morales, Daniel R; Flynn, Rob; Zhang, Jianguo; Trucco, Emmanuel; Quint, Jennifer K; Zutis, Kris
2018-05-01
Several models for predicting the risk of death in people with chronic obstructive pulmonary disease (COPD) exist but have not undergone large scale validation in primary care. The objective of this study was to externally validate these models using statistical and machine learning approaches. We used a primary care COPD cohort identified using data from the UK Clinical Practice Research Datalink. Age-standardised mortality rates were calculated for the population by gender, and the discrimination of ADO (age, dyspnoea, airflow obstruction), COTE (COPD-specific comorbidity test), DOSE (dyspnoea, airflow obstruction, smoking, exacerbations) and CODEX (comorbidity, dyspnoea, airflow obstruction, exacerbations) at predicting death over 1-3 years was measured using logistic regression and a support vector machine (SVM) learning method. The age-standardised mortality rate was 32.8 (95%CI 32.5-33.1) and 25.2 (95%CI 25.4-25.7) per 1000 person years for men and women respectively. Complete data were available for 54879 patients to predict 1-year mortality. ADO performed the best (c-statistic of 0.730) compared with DOSE (c-statistic 0.645), COTE (c-statistic 0.655) and CODEX (c-statistic 0.649) at predicting 1-year mortality. Discrimination of ADO and DOSE improved at predicting 1-year mortality when combined with COTE comorbidities (c-statistic 0.780 ADO + COTE; c-statistic 0.727 DOSE + COTE). Discrimination did not change significantly over 1-3 years. Comparable results were observed using SVM. In primary care, ADO appears superior at predicting death in COPD. Performance of ADO and DOSE improved when combined with COTE comorbidities, suggesting that better models may be generated with additional data, facilitated by novel approaches. Copyright © 2018. Published by Elsevier Ltd.
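The c-statistic used in the abstract above to compare the scores is the probability that a randomly chosen patient who died received a higher risk score than a randomly chosen survivor. A minimal sketch of that definition follows; the function name and the toy data are hypothetical and unrelated to the study cohort.

```python
def c_statistic(scores, outcomes):
    """Concordance (c-statistic) for a binary outcome.

    Counts, over all (event, non-event) pairs, how often the event has the
    higher score; ties count as half. `outcomes` uses 1 for an event.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Toy example (made-up scores and outcomes).
toy_scores = [1, 2, 3, 4, 5, 6]
toy_outcomes = [0, 0, 1, 0, 1, 1]
print(round(c_statistic(toy_scores, toy_outcomes), 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. The pairwise loop is quadratic, so a rank-based formula would be preferable for a cohort of tens of thousands of patients.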
Louis R. Iverson; Anantha M. Prasad; Stephen N. Matthews; Matthew P. Peters
2011-01-01
We present an approach to modeling potential climate-driven changes in habitat for tree and bird species in the eastern United States. First, we took an empirical-statistical modeling approach, using randomForest, with species abundance data from national inventories combined with soil, climate, and landscape variables, to build abundance-based habitat models for 134...
Li, Changyang; Wang, Xiuying; Eberl, Stefan; Fulham, Michael; Yin, Yong; Dagan Feng, David
2015-01-01
Automated and general medical image segmentation can be challenging because the foreground and the background may have complicated and overlapping density distributions in medical imaging. Conventional region-based level set algorithms often assume that segments are piecewise constant or piecewise smooth, which is implausible for general medical image segmentation. Furthermore, low contrast and noise make identification of the boundaries between foreground and background difficult for edge-based level set algorithms. Thus, to address these problems, we suggest a supervised variational level set segmentation model to harness the statistical region energy functional with a weighted probability approximation. Our approach models the region density distributions by using the mixture-of-mixtures Gaussian model to better approximate real intensity distributions and distinguish statistical intensity differences between foreground and background. The region-based statistical model in our algorithm can intuitively provide better performance on noisy images. We constructed a weighted probability map on graphs to incorporate spatial indications from user input with a contextual constraint based on the minimization of a contextual graph energy functional. We measured the performance of our approach on ten noisy synthetic images and 58 medical datasets with heterogeneous intensities and ill-defined boundaries and compared our technique to the Chan-Vese region-based level set model, the geodesic active contour model with distance regularization, and the random walker model. Our method consistently achieved the highest Dice similarity coefficient when compared to the other methods.
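The Dice similarity coefficient reported in the abstract above measures overlap between a predicted and a reference segmentation as twice the intersection divided by the sum of the two mask sizes. A minimal sketch on binary masks stored as nested lists is given below; the function name and the tiny masks are hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity: 2 * |A and B| / (|A| + |B|) for two binary masks."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    total = sum(flat_a) + sum(flat_b)
    if total == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total

# Made-up 2x3 masks: three foreground pixels each, two of them shared.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
print(round(dice_coefficient(predicted, reference), 4))
```

The coefficient ranges from 0 (no overlap) to 1 (identical masks).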
A d-statistic for single-case designs that is equivalent to the usual between-groups d-statistic.
Shadish, William R; Hedges, Larry V; Pustejovsky, James E; Boyajian, Jonathan G; Sullivan, Kristynn J; Andrade, Alma; Barrientos, Jeannette L
2014-01-01
We describe a standardised mean difference statistic (d) for single-case designs that is equivalent to the usual d in between-groups experiments. We show how it can be used to summarise treatment effects over cases within a study, to do power analyses in planning new studies and grant proposals, and to meta-analyse effects across studies of the same question. We discuss limitations of this d-statistic, and possible remedies to them. Even so, this d-statistic is better founded statistically than other effect size measures for single-case design, and unlike many general linear model approaches such as multilevel modelling or generalised additive models, it produces a standardised effect size that can be integrated over studies with different outcome measures. SPSS macros for both effect size computation and power analysis are available.
Bennett, Derrick A; Landry, Denise; Little, Julian; Minelli, Cosetta
2017-09-19
Several statistical approaches have been proposed to assess and correct for exposure measurement error. We aimed to provide a critical overview of the most common approaches used in nutritional epidemiology. MEDLINE, EMBASE, BIOSIS and CINAHL were searched for reports published in English up to May 2016 in order to ascertain studies that described methods aimed to quantify and/or correct for measurement error for a continuous exposure in nutritional epidemiology using a calibration study. We identified 126 studies, 43 of which described statistical methods and 83 that applied any of these methods to a real dataset. The statistical approaches in the eligible studies were grouped into: a) approaches to quantify the relationship between different dietary assessment instruments and "true intake", which were mostly based on correlation analysis and the method of triads; b) approaches to adjust point and interval estimates of diet-disease associations for measurement error, mostly based on regression calibration analysis and its extensions. Two approaches (multiple imputation and moment reconstruction) were identified that can deal with differential measurement error. For regression calibration, the most common approach to correct for measurement error used in nutritional epidemiology, it is crucial to ensure that its assumptions and requirements are fully met. Analyses that investigate the impact of departures from the classical measurement error model on regression calibration estimates can be helpful to researchers in interpreting their findings. With regard to the possible use of alternative methods when regression calibration is not appropriate, the choice of method should depend on the measurement error model assumed, the availability of suitable calibration study data and the potential for bias due to violation of the classical measurement error model assumptions. 
On the basis of this review, we provide some practical advice for the use of methods to assess and adjust for measurement error in nutritional epidemiology.
Resolving the Antarctic contribution to sea-level rise: a hierarchical modelling framework.
Zammit-Mangion, Andrew; Rougier, Jonathan; Bamber, Jonathan; Schön, Nana
2014-06-01
Determining the Antarctic contribution to sea-level rise from observational data is a complex problem. The number of physical processes involved (such as ice dynamics and surface climate) exceeds the number of observables, some of which have very poor spatial definition. This has led, in general, to solutions that utilise strong prior assumptions or physically based deterministic models to simplify the problem. Here, we present a new approach for estimating the Antarctic contribution, which only incorporates descriptive aspects of the physically based models in the analysis and in a statistical manner. By combining physical insights with modern spatial statistical modelling techniques, we are able to provide probability distributions on all processes deemed to play a role in both the observed data and the contribution to sea-level rise. Specifically, we use stochastic partial differential equations and their relation to geostatistical fields to capture our physical understanding and employ a Gaussian Markov random field approach for efficient computation. The method, an instantiation of Bayesian hierarchical modelling, naturally incorporates uncertainty in order to reveal credible intervals on all estimated quantities. The estimated sea-level rise contribution using this approach corroborates those found using a statistically independent method. © 2013 The Authors. Environmetrics Published by John Wiley & Sons, Ltd.
Resolving the Antarctic contribution to sea-level rise: a hierarchical modelling framework†
Zammit-Mangion, Andrew; Rougier, Jonathan; Bamber, Jonathan; Schön, Nana
2014-01-01
Determining the Antarctic contribution to sea-level rise from observational data is a complex problem. The number of physical processes involved (such as ice dynamics and surface climate) exceeds the number of observables, some of which have very poor spatial definition. This has led, in general, to solutions that utilise strong prior assumptions or physically based deterministic models to simplify the problem. Here, we present a new approach for estimating the Antarctic contribution, which only incorporates descriptive aspects of the physically based models in the analysis and in a statistical manner. By combining physical insights with modern spatial statistical modelling techniques, we are able to provide probability distributions on all processes deemed to play a role in both the observed data and the contribution to sea-level rise. Specifically, we use stochastic partial differential equations and their relation to geostatistical fields to capture our physical understanding and employ a Gaussian Markov random field approach for efficient computation. The method, an instantiation of Bayesian hierarchical modelling, naturally incorporates uncertainty in order to reveal credible intervals on all estimated quantities. The estimated sea-level rise contribution using this approach corroborates those found using a statistically independent method. © 2013 The Authors. Environmetrics Published by John Wiley & Sons, Ltd. PMID:25505370
Statistical validation of normal tissue complication probability models.
Xu, Cheng-Jian; van der Schaaf, Arjen; Van't Veld, Aart A; Langendijk, Johannes A; Schilstra, Cornelis
2012-09-01
To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use. Copyright © 2012 Elsevier Inc. All rights reserved.
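The permutation-testing idea in the abstract above can be sketched in a simplified form: shuffle the outcome labels many times and ask how often the shuffled data give a performance at least as good as the observed one. The sketch below permutes labels against fixed scores and uses the area under the ROC curve; a fuller procedure would refit the model (for example the LASSO) on every permuted dataset, which is omitted here. All names and the toy data are hypothetical.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(scores, labels, n_permutations=2000, seed=0):
    """Return (observed AUC, permutation p-value) under label shuffling."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)

# Toy data: scores that separate the two classes perfectly.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
observed_auc, p_value = permutation_p_value(toy_scores, toy_labels)
print(observed_auc, p_value)
```

Because shuffling preserves the number of events, the null distribution reflects the same class balance as the original data.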
Kumar, Ramya; Lahann, Joerg
2016-07-06
The performance of polymer interfaces in biology is governed by a wide spectrum of interfacial properties. With the ultimate goal of identifying design parameters for stem cell culture coatings, we developed a statistical model that describes the dependence of brush properties on surface-initiated polymerization (SIP) parameters. Employing a design of experiments (DOE) approach, we identified operating boundaries within which four gel architecture regimes can be realized, including a new regime of associated brushes in thin films. Our statistical model can accurately predict the brush thickness and the degree of intermolecular association of poly[{2-(methacryloyloxy) ethyl} dimethyl-(3-sulfopropyl) ammonium hydroxide] (PMEDSAH), a previously reported synthetic substrate for feeder-free and xeno-free culture of human embryonic stem cells. DOE-based multifunctional predictions offer a powerful quantitative framework for designing polymer interfaces. For example, model predictions can be used to decrease the critical thickness at which the wettability transition occurs by simply increasing the catalyst quantity from 1 to 3 mol %.
Zhao, Xi; Dellandréa, Emmanuel; Chen, Liming; Kakadiaris, Ioannis A
2011-10-01
Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.
Random walk to a nonergodic equilibrium concept
NASA Astrophysics Data System (ADS)
Bel, G.; Barkai, E.
2006-01-01
Random walk models, such as the trap model, continuous time random walks, and comb models, exhibit weak ergodicity breaking, when the average waiting time is infinite. The open question is, what statistical mechanical theory replaces the canonical Boltzmann-Gibbs theory for such systems? In this paper a nonergodic equilibrium concept is investigated, for a continuous time random walk model in a potential field. In particular we show that in the nonergodic phase the distribution of the occupation time of the particle in a finite region of space approaches U- or W-shaped distributions related to the arcsine law. We show that when conditions of detailed balance are applied, these distributions depend on the partition function of the problem, thus establishing a relation between the nonergodic dynamics and canonical statistical mechanics. In the ergodic phase the distribution function of the occupation times approaches a δ function centered on the value predicted based on standard Boltzmann-Gibbs statistics. The relation of our work to single-molecule experiments is briefly discussed.
Chattoraj, Sayantan; Bhugra, Chandan; Li, Zheng Jane; Sun, Changquan Calvin
2014-12-01
The nonisothermal crystallization kinetics of amorphous materials is routinely analyzed by statistically fitting the crystallization data to kinetic models. In this work, we systematically evaluate how the model-dependent crystallization kinetics is impacted by variations in the heating rate and the selection of the kinetic model, two key factors that can lead to significant differences in the crystallization activation energy (Ea) of an amorphous material. Using amorphous felodipine, we show that the Ea decreases with increase in the heating rate, irrespective of the kinetic model evaluated in this work. The model that best describes the crystallization phenomenon cannot be identified readily through the statistical fitting approach because several kinetic models yield comparable R². Here, we propose an alternate paired model-fitting model-free (PMFMF) approach for identifying the most suitable kinetic model, where Ea obtained from model-dependent kinetics is compared with those obtained from model-free kinetics. The most suitable kinetic model is identified as the one that yields Ea values comparable with the model-free kinetics. Through this PMFMF approach, nucleation and growth is identified as the main mechanism that controls the crystallization kinetics of felodipine. Using this PMFMF approach, we further demonstrate that crystallization mechanism from amorphous phase varies with heating rate. © 2014 Wiley Periodicals, Inc. and the American Pharmacists Association.
ACCELERATED FAILURE TIME MODELS PROVIDE A USEFUL STATISTICAL FRAMEWORK FOR AGING RESEARCH
Swindell, William R.
2009-01-01
Survivorship experiments play a central role in aging research and are performed to evaluate whether interventions alter the rate of aging and increase lifespan. The accelerated failure time (AFT) model is seldom used to analyze survivorship data, but offers a potentially useful statistical approach that is based upon the survival curve rather than the hazard function. In this study, AFT models were used to analyze data from 16 survivorship experiments that evaluated the effects of one or more genetic manipulations on mouse lifespan. Most genetic manipulations were found to have a multiplicative effect on survivorship that is independent of age and well-characterized by the AFT model “deceleration factor”. AFT model deceleration factors also provided a more intuitive measure of treatment effect than the hazard ratio, and were robust to departures from modeling assumptions. Age-dependent treatment effects, when present, were investigated using quantile regression modeling. These results provide an informative and quantitative summary of survivorship data associated with currently known long-lived mouse models. In addition, from the standpoint of aging research, these statistical approaches have appealing properties and provide valuable tools for the analysis of survivorship data. PMID:19007875
Accelerated failure time models provide a useful statistical framework for aging research.
Swindell, William R
2009-03-01
Survivorship experiments play a central role in aging research and are performed to evaluate whether interventions alter the rate of aging and increase lifespan. The accelerated failure time (AFT) model is seldom used to analyze survivorship data, but offers a potentially useful statistical approach that is based upon the survival curve rather than the hazard function. In this study, AFT models were used to analyze data from 16 survivorship experiments that evaluated the effects of one or more genetic manipulations on mouse lifespan. Most genetic manipulations were found to have a multiplicative effect on survivorship that is independent of age and well-characterized by the AFT model "deceleration factor". AFT model deceleration factors also provided a more intuitive measure of treatment effect than the hazard ratio, and were robust to departures from modeling assumptions. Age-dependent treatment effects, when present, were investigated using quantile regression modeling. These results provide an informative and quantitative summary of survivorship data associated with currently known long-lived mouse models. In addition, from the standpoint of aging research, these statistical approaches have appealing properties and provide valuable tools for the analysis of survivorship data.
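The contrast drawn in the abstract above between the survival curve and the hazard function can be made concrete with the standard two-group AFT relation. The notation is assumed here rather than taken from the abstract: $S_0$ and $S_1$ are the control and treated survival functions, $h_0$ and $h_1$ the corresponding hazards, $t_p$ the age by which a fraction $p$ of the cohort has died, and $\lambda > 0$ a time-scaling factor playing the role of the "deceleration factor".

```latex
% Two-group accelerated failure time relation (assumed notation).
\begin{align}
  S_1(t) &= S_0\!\left(\frac{t}{\lambda}\right), \qquad \lambda > 0, \\
  t_p^{(1)} &= \lambda \, t_p^{(0)} \quad \text{for every quantile } p, \\
  h_1(t) &= \frac{1}{\lambda}\, h_0\!\left(\frac{t}{\lambda}\right).
\end{align}
```

Under this relation a single factor $\lambda$ stretches every survival quantile by the same proportion, which is one way to read the "multiplicative effect on survivorship that is independent of age"; a value $\lambda > 1$ corresponds to longer lifespan in the treated group.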
Guisan, Antoine; Edwards, T.C.; Hastie, T.
2002-01-01
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling. © 2002 Elsevier Science B.V. All rights reserved.
ERIC Educational Resources Information Center
Hayashi, Atsuhiro
Both the Rule Space Method (RSM) and the Neural Network Model (NNM) are techniques of statistical pattern recognition and classification approaches developed for applications from different fields. RSM was developed in the domain of educational statistics. It started from the use of an incidence matrix Q that characterizes the underlying cognitive…
A statistical approach to estimate O3 uptake of ponderosa pine in a mediterranean climate
N.E. Grulke; H.K. Preisler; C.C. Fan; W.A. Retzlaff
2002-01-01
In highly polluted sites, stomatal behavior is sluggish with respect to light, vapor pressure deficit, and internal CO2 concentration (Ci) and poorly described by existing models. Statistical models were developed to estimate stomatal conductance (gs) of 40-year-old ponderosa pine at three sites differing in pollutant exposure for the purpose of...
A hierarchical fire frequency model to simulate temporal patterns of fire regimes in LANDIS
Jian Yang; Hong S. He; Eric J. Gustafson
2004-01-01
Fire disturbance has important ecological effects in many forest landscapes. Existing statistically based approaches can be used to examine the effects of a fire regime on forest landscape dynamics. Most examples of statistically based fire models divide a fire occurrence into two stages--fire ignition and fire initiation. However, the exponential and Weibull fire-...
NASA Astrophysics Data System (ADS)
Ghotbi, Saba; Sotoudeheian, Saeed; Arhami, Mohammad
2016-09-01
Satellite remote sensing products of AOD from MODIS, along with appropriate meteorological parameters, were used to develop statistical models and estimate ground-level PM10. Most previous studies obtained meteorological data from synoptic weather stations, which have a rather sparse spatial distribution, and used them along with the 10 km AOD product to develop statistical models applicable to PM variations at the regional scale (resolution of ≥10 km). In the current study, meteorological parameters were simulated at 3 km resolution using the WRF model and used along with the rather new 3 km AOD product (launched in 2014). The resulting PM statistical models were assessed for a polluted and highly variable urban area, Tehran, Iran. Despite the critical particulate pollution problem, very few PM studies have been conducted in this area. Direct PM-AOD associations were rather poor, owing to factors such as variations in particle optical properties and the bright-background problem for satellite data, as the study area is located in the semi-arid Middle East. A linear mixed effect (LME) statistical approach was used, and three types of statistical models were examined: a single-variable LME model (using AOD as the independent variable) and multiple-variable LME models using meteorological data from two sources, the WRF model and synoptic stations. Meteorological simulations were performed using a multiscale approach and a physics configuration appropriate for the studied region, and the results showed rather good agreement with recordings of the synoptic stations. The single-variable LME model was able to explain about 61%-73% of daily PM10 variations, reflecting a rather acceptable performance. Model performance improved when multivariable LME was used with meteorological data as auxiliary variables, particularly with the fine-resolution outputs from WRF (R2 = 0.73-0.81). 
In addition, rather fine resolution for PM estimates was mapped for the studied city, and resulting concentration maps were consistent with PM recordings at the existing stations.
Mechanics and statistics of the worm-like chain
NASA Astrophysics Data System (ADS)
Marantan, Andrew; Mahadevan, L.
2018-02-01
The worm-like chain model is a simple continuum model for the statistical mechanics of a flexible polymer subject to an external force. We offer a tutorial introduction to it using three approaches. First, we use a mesoscopic view, treating a long polymer (in two dimensions) as though it were made of many groups of correlated links or "clinks," allowing us to calculate its average extension as a function of the external force via scaling arguments. We then provide a standard statistical mechanics approach, obtaining the average extension by two different means: the equipartition theorem and the partition function. Finally, we work in a probabilistic framework, taking advantage of the Gaussian properties of the chain in the large-force limit to improve upon the previous calculations of the average extension.
2017-01-01
Producing predictions of the probabilistic risks of operating materials for given lengths of time at stated operating conditions requires the assimilation of existing deterministic creep life prediction models (that only predict the average failure time) with statistical models that capture the random component of creep. To date, these approaches have rarely been combined to achieve this objective. The first half of this paper therefore provides a summary review of some statistical models to help bridge the gap between these two approaches. The second half of the paper illustrates one possible assimilation using 1Cr1Mo-0.25V steel. The Wilshire equation for creep life prediction is integrated into a discrete hazard based statistical model; the former was chosen because of its novelty and proven capability in accurately predicting average failure times, and the latter because of its flexibility in modelling the failure time distribution. Using this model it was found that, for example, if this material had been in operation for around 15 years at 823 K and 130 MPa, the chance of failure in the next year is around 35%. However, if this material had been in operation for around 25 years, the chance of failure in the next year rises dramatically to around 80%. PMID:29039773
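The "chance of failure in the next year" quoted in the abstract above is a conditional probability: the probability of failing in the coming interval given survival up to now. A minimal sketch of that calculation follows. It uses a Weibull survival function purely as a stand-in; it does not implement the Wilshire equation or the paper's discrete hazard model, and the scale and shape values are hypothetical.

```python
import math

def weibull_survival(t, scale, shape):
    """Stand-in survival function S(t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))

def conditional_failure_probability(t, dt, survival):
    """P(fail in (t, t + dt] | survived to t) = (S(t) - S(t + dt)) / S(t)."""
    s_now = survival(t)
    if s_now <= 0.0:
        raise ValueError("survival probability at t must be positive")
    return (s_now - survival(t + dt)) / s_now

# Hypothetical parameters (assumed, not fitted to any creep data).
scale_years = 20.0
shape = 4.0
survival = lambda t: weibull_survival(t, scale_years, shape)
for years_in_service in (5.0, 15.0, 25.0):
    p = conditional_failure_probability(years_in_service, 1.0, survival)
    print(f"after {years_in_service:4.0f} years: P(fail next year) = {p:.3f}")
```

With a shape parameter above 1 the conditional failure probability grows with time in service, the same qualitative pattern as the rise from about 35% to about 80% described in the abstract.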
A review of statistical updating methods for clinical prediction models.
Su, Ting-Li; Jaki, Thomas; Hickey, Graeme L; Buchan, Iain; Sperrin, Matthew
2018-01-01
A clinical prediction model is a tool for predicting healthcare outcomes, usually within a specific population and context. A common approach is to develop a new clinical prediction model for each population and context; however, this wastes potentially useful historical information. A better approach is to update or incorporate the existing clinical prediction models already developed for use in similar contexts or populations. In addition, clinical prediction models commonly become miscalibrated over time, and need replacing or updating. In this article, we review a range of approaches for re-using and updating clinical prediction models; these fall into three main categories: simple coefficient updating, combining multiple previous clinical prediction models in a meta-model and dynamic updating of models. We evaluated the performance (discrimination and calibration) of the different strategies using data on mortality following cardiac surgery in the United Kingdom. We found that no single strategy performed sufficiently well to be used to the exclusion of the others. In conclusion, useful tools exist for updating existing clinical prediction models to a new population or context, and these should be implemented rather than developing a new clinical prediction model from scratch, using a breadth of complementary statistical methods.
Cierniak, Robert; Lorent, Anna
2016-09-01
The main aim of this paper is to investigate the conditioning-related properties of our originally formulated statistical model-based iterative approach to the problem of image reconstruction from projections and, in this manner, to demonstrate the superiority of this approach over those recently used by other authors. The reconstruction algorithm based on this conception uses a maximum likelihood estimation with an objective adjusted to the probability distribution of measured signals obtained from an X-ray computed tomography system with parallel beam geometry. The analysis and experimental results presented here show that our analytical approach outperforms the referential algebraic methodology which is explored widely in the literature and exploited in various commercial implementations. Copyright © 2016 Elsevier Ltd. All rights reserved.
Right-Sizing Statistical Models for Longitudinal Data
Wood, Phillip K.; Steinley, Douglas; Jackson, Kristina M.
2015-01-01
Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to “right-size” the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting overly parsimonious models to more complex better fitting alternatives, and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically under-identified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A three-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation/covariation patterns. The orthogonal, free-curve slope-intercept (FCSI) growth model is considered as a general model which includes, as special cases, many models including the Factor Mean model (FM, McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, Hierarchical Linear Models (HLM), Repeated Measures MANOVA, and the Linear Slope Intercept (LinearSI) Growth Model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparison of several candidate parametric growth and chronometric models in a Monte Carlo study. PMID:26237507
Right-sizing statistical models for longitudinal data.
Wood, Phillip K; Steinley, Douglas; Jackson, Kristina M
2015-12-01
Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to "right-size" the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically underidentified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A 3-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation-covariation patterns. The orthogonal free curve slope intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the factor mean (FM) model (McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, hierarchical linear models (HLMs), repeated-measures multivariate analysis of variance (MANOVA), and the linear slope intercept (linearSI) growth model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. (c) 2015 APA, all rights reserved.
Statistical mechanics of simple models of protein folding and design.
Pande, V S; Grosberg, A Y; Tanaka, T
1997-01-01
It is now believed that the primary equilibrium aspects of simple models of protein folding are understood theoretically. However, current theories often resort to rather heavy mathematics to overcome some technical difficulties inherent in the problem or start from a phenomenological model. To this end, we take a new approach in this pedagogical review of the statistical mechanics of protein folding. The benefit of our approach is a drastic mathematical simplification of the theory, without resort to any new approximations or phenomenological prescriptions. Indeed, the results we obtain agree precisely with previous calculations. Because of this simplification, we are able to present here a thorough and self-contained treatment of the problem. Topics discussed include the statistical mechanics of the random energy model (REM), tests of the validity of REM as a model for heteropolymer freezing, freezing transition of random sequences, phase diagram of designed ("minimally frustrated") sequences, and the degree to which errors in the interactions employed in simulations of either folding or design can still lead to correct folding behavior. PMID:9414231
2016-06-01
14 Table 2. Summary of Statistics from GGSS Data ........................................ 35 Table 3. Summary of Statistics from...similar approach are unsurprisingly quite consistent in outcomes within statistical variance. The model is used to estimate the effects of exogenous...of German residents (~82 million), excluding diplomats, foreign military and homeless persons. (German Federal Office of Statistics , 2013, p. 475
A Monte Carlo Approach to Unidimensionality Testing in Polytomous Rasch Models
ERIC Educational Resources Information Center
Christensen, Karl Bang; Kreiner, Svend
2007-01-01
Many statistical tests are designed to test the different assumptions of the Rasch model, but only few are directed at detecting multidimensionality. The Martin-Lof test is an attractive approach, the disadvantage being that its null distribution deviates strongly from the asymptotic chi-square distribution for most realistic sample sizes. A Monte…
A classification procedure for the effective management of changes during the maintenance process
NASA Technical Reports Server (NTRS)
Briand, Lionel C.; Basili, Victor R.
1992-01-01
During software operation, maintainers are often faced with numerous change requests. Given available resources such as effort and calendar time, changes, if approved, have to be planned to fit within budget and schedule constraints. In this paper, we address the issue of assessing the difficulty of a change based on known or predictable data. This paper should be considered as a first step towards the construction of customized economic models for maintainers. In it, we propose a modeling approach, based on regular statistical techniques, that can be used in a variety of software maintenance environments. The approach can be easily automated, and is simple for people with limited statistical experience to use. Moreover, it deals effectively with the uncertainty usually associated with both model inputs and outputs. The modeling approach is validated on a data set provided by NASA/GSFC which shows it was effective in classifying changes with respect to the effort involved in implementing them. Other advantages of the approach are discussed along with additional steps to improve the results.
Mridula, Meenu R; Nair, Ashalatha S; Kumar, K Satheesh
2018-02-01
In this paper, we compared the efficacy of observation based modeling approach using a genetic algorithm with the regular statistical analysis as an alternative methodology in plant research. Preliminary experimental data on in vitro rooting was taken for this study with an aim to understand the effect of charcoal and naphthalene acetic acid (NAA) on successful rooting and also to optimize the two variables for maximum result. Observation-based modelling, as well as traditional approach, could identify NAA as a critical factor in rooting of the plantlets under the experimental conditions employed. Symbolic regression analysis using the software deployed here optimised the treatments studied and was successful in identifying the complex non-linear interaction among the variables, with minimalistic preliminary data. The presence of charcoal in the culture medium has a significant impact on root generation by reducing basal callus mass formation. Such an approach is advantageous for establishing in vitro culture protocols as these models will have significant potential for saving time and expenditure in plant tissue culture laboratories, and it further reduces the need for specialised background.
NASA Astrophysics Data System (ADS)
Yan, Wang-Ji; Ren, Wei-Xin
2018-01-01
This study applies the theoretical findings of circularly-symmetric complex normal ratio distribution Yan and Ren (2016) [1,2] to transmissibility-based modal analysis from a statistical viewpoint. A probabilistic model of transmissibility function in the vicinity of the resonant frequency is formulated in modal domain, while some insightful comments are offered. It theoretically reveals that the statistics of transmissibility function around the resonant frequency is solely dependent on 'noise-to-signal' ratio and mode shapes. As a sequel to the development of the probabilistic model of transmissibility function in modal domain, this study poses the process of modal identification in the context of Bayesian framework by borrowing a novel paradigm. Implementation issues unique to the proposed approach are resolved by Lagrange multiplier approach. Also, this study explores the possibility of applying Bayesian analysis in distinguishing harmonic components and structural ones. The approaches are verified through simulated data and experimentally testing data. The uncertainty behavior due to variation of different factors is also discussed in detail.
An Update on Statistical Boosting in Biomedicine.
Mayr, Andreas; Hofner, Benjamin; Waldmann, Elisabeth; Hepp, Tobias; Meyer, Sebastian; Gefeller, Olaf
2017-01-01
Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.
Hao, Chen; LiJun, Chen; Albright, Thomas P.
2007-01-01
Invasive exotic species pose a growing threat to the economy, public health, and ecological integrity of nations worldwide. Explaining and predicting the spatial distribution of invasive exotic species is of great importance to prevention and early warning efforts. We are investigating the potential distribution of invasive exotic species, the environmental factors that influence these distributions, and the ability to predict them using statistical and information-theoretic approaches. For some species, detailed presence/absence occurrence data are available, allowing the use of a variety of standard statistical techniques. However, for most species, absence data are not available. Presented with the challenge of developing a model based on presence-only information, we developed an improved logistic regression approach using Information Theory and Frequency Statistics to produce a relative suitability map. This paper generated a variety of distributions of ragweed (Ambrosia artemisiifolia L.) from logistic regression models applied to herbarium specimen location data and a suite of GIS layers including climatic, topographic, and land cover information. Our logistic regression model was based on Akaike's Information Criterion (AIC) from a suite of ecologically reasonable predictor variables. Based on the results we provided a new Frequency Statistical method to compartmentalize habitat-suitability in the native range. Finally, we used the model and the compartmentalized criterion developed in native ranges to "project" a potential distribution onto the exotic ranges to build habitat-suitability maps. © Science in China Press 2007.
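The AIC-based model selection mentioned in the abstract above can be illustrated with a minimal sketch. It uses the standard definition AIC = 2k - 2 ln(L); the model labels, log-likelihoods, and parameter counts are hypothetical placeholders and are not taken from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(candidates):
    """Rank candidate models from lowest (best) to highest AIC.

    candidates maps a model label to (log_likelihood, n_params).
    """
    scored = [(label, aic(ll, k)) for label, (ll, k) in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical fitted logistic regressions (all numbers are made up).
candidates = {
    "climate_only": (-120.0, 3),
    "climate_topography": (-112.5, 5),
    "climate_topography_landcover": (-111.9, 8),
}
ranking = rank_by_aic(candidates)
for label, score in ranking:
    print(f"{label}: AIC = {score:.1f}")
```

In this made-up example the largest model has the highest likelihood but is penalized for its extra parameters, so the intermediate model ranks first.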
Differences in Performance Among Test Statistics for Assessing Phylogenomic Model Adequacy.
Duchêne, David A; Duchêne, Sebastian; Ho, Simon Y W
2018-05-18
Statistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few variable informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.
Angeler, David G; Viedma, Olga; Moreno, José M
2009-11-01
Time lag analysis (TLA) is a distance-based approach used to study temporal dynamics of ecological communities by measuring community dissimilarity over increasing time lags. Despite its increased use in recent years, its performance in comparison with other more direct methods (i.e., canonical ordination) has not been evaluated. This study fills this gap using extensive simulations and real data sets from experimental temporary ponds (true zooplankton communities) and landscape studies (landscape categories as pseudo-communities) that differ in community structure and anthropogenic stress history. Modeling time with a principal coordinate of neighborhood matrices (PCNM) approach, the canonical ordination technique (redundancy analysis; RDA) consistently outperformed the other statistical tests (i.e., TLAs, Mantel test, and RDA based on linear time trends) using all real data. In addition, the RDA-PCNM revealed different patterns of temporal change, and the strength of each individual time pattern, in terms of adjusted variance explained, could be evaluated. It also identified species contributions to these patterns of temporal change. This additional information is not provided by distance-based methods. The simulation study revealed better Type I error properties of the canonical ordination techniques compared with the distance-based approaches when no deterministic component of change was imposed on the communities. The simulation also revealed that strong emphasis on uniform deterministic change and low variability at other temporal scales is needed to result in decreased statistical power of the RDA-PCNM approach relative to the other methods. Based on the statistical performance of and information content provided by RDA-PCNM models, this technique serves ecologists as a powerful tool for modeling temporal change of ecological (pseudo-) communities.
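A minimal sketch of the time lag analysis idea described above, under assumptions the abstract does not state: samples are equally spaced in time, dissimilarity is Euclidean distance, and the trend is summarized by an ordinary least-squares slope of distance on the square root of the lag (a common convention for TLA). The function names and the toy data are hypothetical.

```python
import math


def time_lag_pairs(abundances):
    """Return (lag, distance) for every pair of samples.

    abundances: list of samples in time order, each a list of species
    abundances of equal length. Lag is the difference in sample index.
    """
    pairs = []
    for i in range(len(abundances)):
        for j in range(i + 1, len(abundances)):
            dist = math.sqrt(
                sum((a - b) ** 2 for a, b in zip(abundances[i], abundances[j]))
            )
            pairs.append((j - i, dist))
    return pairs


def tla_slope(pairs):
    """Least-squares slope of distance on sqrt(lag).

    Needs at least two distinct lags, i.e. three or more samples.
    """
    xs = [math.sqrt(lag) for lag, _ in pairs]
    ys = [dist for _, dist in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy community drifting steadily in one species (made-up data).
samples = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
pairs = time_lag_pairs(samples)
slope = tla_slope(pairs)
```

A positive slope suggests directional change; a slope near zero suggests fluctuation without a trend. Because the pairwise distances are not independent, a plain regression p-value would not be appropriate here.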
Theoretical approaches to the steady-state statistical physics of interacting dissipative units
NASA Astrophysics Data System (ADS)
Bertin, Eric
2017-02-01
The aim of this review is to provide a concise overview of some of the generic approaches that have been developed to deal with the statistical description of large systems of interacting dissipative ‘units’. The latter notion includes, e.g. inelastic grains, active or self-propelled particles, bubbles in a foam, low-dimensional dynamical systems like driven oscillators, or even spatially extended modes like Fourier modes of the velocity field in a fluid. We first review methods based on the statistical properties of a single unit, starting with elementary mean-field approximations, either static or dynamic, that describe a unit embedded in a ‘self-consistent’ environment. We then discuss how this basic mean-field approach can be extended to account for spatial dependences, in the form of space-dependent mean-field Fokker-Planck equations, for example. We also briefly review the use of kinetic theory in the framework of the Boltzmann equation, which is an appropriate description for dilute systems. We then turn to descriptions in terms of the full N-body distribution, starting from exact solutions of one-dimensional models, using a matrix-product ansatz method when correlations are present. Since exactly solvable models are scarce, we also present some approximation methods which can be used to determine the N-body distribution in a large system of dissipative units. These methods include the Edwards approach for dense granular matter and the approximate treatment of multiparticle Langevin equations with colored noise, which models systems of self-propelled particles. Throughout this review, emphasis is put on methodological aspects of the statistical modeling and on formal similarities between different physical problems, rather than on the specific behavior of a given system.
NASA Astrophysics Data System (ADS)
Xu, Lei; Chen, Nengcheng; Zhang, Xiang
2018-02-01
Drought is an extreme natural disaster that can lead to huge socioeconomic losses. Drought prediction ahead of months is helpful for early drought warning and preparations. In this study, we developed a statistical model, two weighted dynamic models and a statistical-dynamic (hybrid) model for 1-6 month lead drought prediction in China. Specifically, statistical component refers to climate signals weighting by support vector regression (SVR), dynamic components consist of the ensemble mean (EM) and Bayesian model averaging (BMA) of the North American Multi-Model Ensemble (NMME) climatic models, and the hybrid part denotes a combination of statistical and dynamic components by assigning weights based on their historical performances. The results indicate that the statistical and hybrid models show better rainfall predictions than NMME-EM and NMME-BMA models, which have good predictability only in southern China. In the 2011 China winter-spring drought event, the statistical model well predicted the spatial extent and severity of drought nationwide, although the severity was underestimated in the mid-lower reaches of Yangtze River (MLRYR) region. The NMME-EM and NMME-BMA models largely overestimated rainfall in northern and western China in 2011 drought. In the 2013 China summer drought, the NMME-EM model forecasted the drought extent and severity in eastern China well, while the statistical and hybrid models falsely detected negative precipitation anomaly (NPA) in some areas. Model ensembles such as multiple statistical approaches, multiple dynamic models or multiple hybrid models for drought predictions were highlighted. These conclusions may be helpful for drought prediction and early drought warnings in China.
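The abstract above says the hybrid model combines statistical and dynamic components "by assigning weights based on their historical performances" but does not give the weighting rule. The sketch below assumes one simple possibility, weights inversely proportional to each component's historical mean squared error; the function names and numbers are hypothetical.

```python
def inverse_error_weights(historical_mse):
    """Weights proportional to 1 / MSE, normalized to sum to 1.

    This weighting rule is an assumption for illustration only.
    """
    inverse = [1.0 / mse for mse in historical_mse]
    total = sum(inverse)
    return [value / total for value in inverse]


def hybrid_forecast(component_forecasts, weights):
    """Weighted combination of component forecasts for one grid cell."""
    return sum(w * f for w, f in zip(weights, component_forecasts))


# Made-up example: statistical component (MSE 1.0), dynamic component (MSE 3.0).
weights = inverse_error_weights([1.0, 3.0])
combined = hybrid_forecast([10.0, 20.0], weights)
```

With these numbers the better-performing component receives three quarters of the weight, so the combined forecast sits closer to its prediction.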
Zhang, Yiming; Jin, Quan; Wang, Shuting; Ren, Ren
2011-05-01
The mobile behavior of 1481 peptides in ion mobility spectrometry (IMS), which are generated by protease digestion of the Drosophila melanogaster proteome, is modeled and predicted based on two different types of characterization methods, i.e. sequence-based approach and structure-based approach. In this procedure, the sequence-based approach considers both the amino acid composition of a peptide and the local environment profile of each amino acid in the peptide; the structure-based approach is performed with the CODESSA protocol, which regards a peptide as a common organic compound and generates more than 200 statistically significant variables to characterize the whole structure profile of a peptide molecule. Subsequently, the nonlinear support vector machine (SVM) and Gaussian process (GP) as well as linear partial least squares (PLS) regression is employed to correlate the structural parameters of the characterizations with the IMS drift times of these peptides. The obtained quantitative structure-spectrum relationship (QSSR) models are evaluated rigorously and investigated systematically via both one-deep and two-deep cross-validations as well as the rigorous Monte Carlo cross-validation (MCCV). We also give a comprehensive comparison on the resulting statistics arising from the different combinations of variable types with modeling methods and find that the sequence-based approach can give the QSSR models with better fitting ability and predictive power but worse interpretability than the structure-based approach. In addition, though the QSSR modeling using sequence-based approach is not needed for the preparation of the minimization structures of peptides before the modeling, it would be considerably efficient as compared to that using structure-based approach. Copyright © 2011 Elsevier Ltd. All rights reserved.
Stewart, Gavin B.; Altman, Douglas G.; Askie, Lisa M.; Duley, Lelia; Simmonds, Mark C.; Stewart, Lesley A.
2012-01-01
Background Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way. Methods and Findings We included data from 24 randomised controlled trials, evaluating antiplatelet agents, for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using anti-platelets (Relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model. Conclusions For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. 
Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials. PMID:23056232
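The second stage of the "two-stage" approach described above combines per-trial summary measures with standard meta-analytical methods. A minimal sketch, assuming a fixed-effect inverse-variance pooling of log relative risks (the abstract does not say which pooling model the authors used); the inputs are hypothetical.

```python
import math


def pool_fixed_effect(log_rrs, standard_errors):
    """Inverse-variance weighted mean of per-trial log relative risks.

    Returns the pooled log relative risk and its standard error.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


def rr_with_ci(pooled, pooled_se, z=1.96):
    """Back-transform to the relative-risk scale with a 95% interval."""
    return (
        math.exp(pooled),
        math.exp(pooled - z * pooled_se),
        math.exp(pooled + z * pooled_se),
    )


# Made-up stage-one summaries from three trials (log RR, standard error).
log_rrs = [-0.10, -0.15, -0.05]
standard_errors = [0.08, 0.10, 0.06]
pooled, pooled_se = pool_fixed_effect(log_rrs, standard_errors)
rr, lower, upper = rr_with_ci(pooled, pooled_se)
```

A one-stage analysis would instead fit a single model to all participants at once, typically with trial-level terms, which is where the additional statistical support mentioned in the abstract comes in.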
Walsh, Daniel P.; Norton, Andrew S.; Storm, Daniel J.; Van Deelen, Timothy R.; Heisy, Dennis M.
2018-01-01
Implicit and explicit use of expert knowledge to inform ecological analyses is becoming increasingly common because it often represents the sole source of information in many circumstances. Thus, there is a need to develop statistical methods that explicitly incorporate expert knowledge, and can successfully leverage this information while properly accounting for associated uncertainty during analysis. Studies of cause-specific mortality provide an example of implicit use of expert knowledge when causes-of-death are uncertain and assigned based on the observer's knowledge of the most likely cause. To explicitly incorporate this use of expert knowledge and the associated uncertainty, we developed a statistical model for estimating cause-specific mortality using a data augmentation approach within a Bayesian hierarchical framework. Specifically, for each mortality event, we elicited the observer's belief of cause-of-death by having them specify the probability that the death was due to each potential cause. These probabilities were then used as prior predictive values within our framework. This hierarchical framework permitted a simple and rigorous estimation method that was easily modified to include covariate effects and regularizing terms. Although applied to survival analysis, this method can be extended to any event-time analysis with multiple event types, for which there is uncertainty regarding the true outcome. We conducted simulations to determine how our framework compared to traditional approaches that use expert knowledge implicitly and assume that cause-of-death is specified accurately. Simulation results supported the inclusion of observer uncertainty in cause-of-death assignment in modeling of cause-specific mortality to improve model performance and inference. Finally, we applied the statistical model we developed and a traditional method to cause-specific survival data for white-tailed deer, and compared results. 
We demonstrate that model selection results changed between the two approaches, and incorporating observer knowledge in cause-of-death increased the variability associated with parameter estimates when compared to the traditional approach. These differences between the two approaches can impact reported results, and therefore, it is critical to explicitly incorporate expert knowledge in statistical methods to ensure rigorous inference.
Scharfenberger, Christian; Wong, Alexander; Clausi, David A
2015-01-01
We propose a simple yet effective structure-guided statistical textural distinctiveness approach to salient region detection. Our method uses a multilayer approach to analyze the structural and textural characteristics of natural images as important features for salient region detection from a scale point of view. To represent the structural characteristics, we abstract the image using structured image elements and extract rotational-invariant neighborhood-based textural representations to characterize each element by an individual texture pattern. We then learn a set of representative texture atoms for sparse texture modeling and construct a statistical textural distinctiveness matrix to determine the distinctiveness between all representative texture atom pairs in each layer. Finally, we determine saliency maps for each layer based on the occurrence probability of the texture atoms and their respective statistical textural distinctiveness and fuse them to compute a final saliency map. Experimental results using four public data sets and a variety of performance evaluation metrics show that our approach provides promising results when compared with existing salient region detection approaches.
Reduction of chemical reaction models
NASA Technical Reports Server (NTRS)
Frenklach, Michael
1991-01-01
An attempt is made to reconcile the different terminologies pertaining to reduction of chemical reaction models. The approaches considered include global modeling, response modeling, detailed reduction, chemical lumping, and statistical lumping. The advantages and drawbacks of each of these methods are pointed out.
Karim, Mohammad Ehsanul; Platt, Robert W
2017-06-15
Correct specification of the inverse probability weighting (IPW) model is necessary for consistent inference from a marginal structural Cox model (MSCM). In practical applications, researchers are typically unaware of the true specification of the weight model. Nonetheless, IPWs are commonly estimated using parametric models, such as the main-effects logistic regression model. In practice, assumptions underlying such models may not hold and data-adaptive statistical learning methods may provide an alternative. Many candidate statistical learning approaches are available in the literature. However, the optimal approach for a given dataset is impossible to predict. Super learner (SL) has been proposed as a tool for selecting an optimal learner from a set of candidates using cross-validation. In this study, we evaluate the usefulness of a SL in estimating IPW in four different MSCM simulation scenarios, in which we varied the specification of the true weight model (linear and/or additive). Our simulations show that, in the presence of weight model misspecification, with a rich and diverse set of candidate algorithms, SL can generally offer a better alternative to the commonly used statistical learning approaches in terms of MSE as well as the coverage probabilities of the estimated effect in an MSCM. The findings from the simulation studies guided the application of the MSCM in a multiple sclerosis cohort from British Columbia, Canada (1995-2008), to estimate the impact of beta-interferon treatment in delaying disability progression. Copyright © 2017 John Wiley & Sons, Ltd.
Accounting for standard errors of vision-specific latent trait in regression models.
Wong, Wan Ling; Li, Xiang; Li, Jialiang; Wong, Tien Yin; Cheng, Ching-Yu; Lamoureux, Ecosse L
2014-07-11
To demonstrate the effectiveness of Hierarchical Bayesian (HB) approach in a modeling framework for association effects that accounts for SEs of vision-specific latent traits assessed using Rasch analysis. A systematic literature review was conducted in four major ophthalmic journals to evaluate Rasch analysis performed on vision-specific instruments. The HB approach was used to synthesize the Rasch model and multiple linear regression model for the assessment of the association effects related to vision-specific latent traits. The effectiveness of this novel HB one-stage "joint-analysis" approach allows all model parameters to be estimated simultaneously and was compared with the frequently used two-stage "separate-analysis" approach in our simulation study (Rasch analysis followed by traditional statistical analyses without adjustment for SE of latent trait). Sixty-six reviewed articles performed evaluation and validation of vision-specific instruments using Rasch analysis, and 86.4% (n = 57) performed further statistical analyses on the Rasch-scaled data using traditional statistical methods; none took into consideration SEs of the estimated Rasch-scaled scores. The two models on real data differed for effect size estimations and the identification of "independent risk factors." Simulation results showed that our proposed HB one-stage "joint-analysis" approach produces greater accuracy (average of 5-fold decrease in bias) with comparable power and precision in estimation of associations when compared with the frequently used two-stage "separate-analysis" procedure despite accounting for greater uncertainty due to the latent trait. Patient-reported data, using Rasch analysis techniques, do not take into account the SE of latent trait in association analyses. The HB one-stage "joint-analysis" is a better approach, producing accurate effect size estimations and information about the independent association of exposure variables with vision-specific latent traits. 
Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
Categorical data processing for real estate objects valuation using statistical analysis
NASA Astrophysics Data System (ADS)
Parygin, D. S.; Malikov, V. P.; Golubev, A. V.; Sadovnikova, N. P.; Petrova, T. M.; Finogeev, A. G.
2018-05-01
Theoretical and practical approaches to the use of statistical methods for studying various properties of infrastructure objects are analyzed in the paper. Methods of forecasting the value of objects are considered. A method for coding categorical variables describing properties of real estate objects is proposed. The analysis of the results of modeling the price of real estate objects using regression analysis and an algorithm based on a comparative approach is carried out.
Fourtune, Lisa; Prunier, Jérôme G; Paz-Vinas, Ivan; Loot, Géraldine; Veyssière, Charlotte; Blanchet, Simon
2018-04-01
Identifying landscape features that affect functional connectivity among populations is a major challenge in fundamental and applied sciences. Landscape genetics combines landscape and genetic data to address this issue, with the main objective of disentangling direct and indirect relationships among an intricate set of variables. Causal modeling has strong potential to address the complex nature of landscape genetic data sets. However, this statistical approach was not initially developed to address the pairwise distance matrices commonly used in landscape genetics. Here, we aimed to extend the applicability of two causal modeling methods (that is, maximum-likelihood path analysis and the directional separation test) by developing statistical approaches aimed at handling distance matrices and improving functional connectivity inference. Using simulations, we showed that these approaches greatly improved the robustness of the absolute (using a frequentist approach) and relative (using an information-theoretic approach) fits of the tested models. We used an empirical data set combining genetic information on a freshwater fish species (Gobio occitaniae) and detailed landscape descriptors to demonstrate the usefulness of causal modeling to identify functional connectivity in wild populations. Specifically, we demonstrated how direct and indirect relationships involving altitude, temperature, and oxygen concentration influenced within- and between-population genetic diversity of G. occitaniae.
Cost model validation: a technical and cultural approach
NASA Technical Reports Server (NTRS)
Hihn, J.; Rosenberg, L.; Roust, K.; Warfield, K.
2001-01-01
This paper summarizes how JPL's parametric mission cost model (PMCM) has been validated using both formal statistical methods and a variety of peer and management reviews in order to establish organizational acceptance of the cost model estimates.
The Role of Probability in Developing Learners' Models of Simulation Approaches to Inference
ERIC Educational Resources Information Center
Lee, Hollylynne S.; Doerr, Helen M.; Tran, Dung; Lovett, Jennifer N.
2016-01-01
Repeated sampling approaches to inference that rely on simulations have recently gained prominence in statistics education, and probabilistic concepts are at the core of this approach. In this approach, learners need to develop a mapping among the problem situation, a physical enactment, computer representations, and the underlying randomization…
ERIC Educational Resources Information Center
Tay, Louis; Drasgow, Fritz
2012-01-01
Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted χ²/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…
Bayesian models: A statistical primer for ecologists
Hobbs, N. Thompson; Hooten, Mevin B.
2015-01-01
Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods, in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals. This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management. Presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticians. Covers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and more. Deemphasizes computer coding in favor of basic principles. Explains how to write out properly factored statistical expressions representing Bayesian models.
Waites, Anthony B; Mannfolk, Peter; Shaw, Marnie E; Olsrud, Johan; Jackson, Graeme D
2007-02-01
Clinical functional magnetic resonance imaging (fMRI) occasionally fails to detect significant activation, often due to variability in task performance. The present study seeks to test whether a more flexible statistical analysis can better detect activation, by accounting for variance associated with variable compliance to the task over time. Experimental results and simulated data both confirm that even at 80% compliance to the task, such a flexible model outperforms standard statistical analysis when assessed using the extent of activation (experimental data), goodness of fit (experimental data), and area under the receiver operating characteristic curve (simulated data). Furthermore, retrospective examination of 14 clinical fMRI examinations reveals that in patients where the standard statistical approach yields activation, there is a measurable gain in model performance in adopting the flexible statistical model, with little or no penalty in lost sensitivity. This indicates that a flexible model should be considered, particularly for clinical patients who may have difficulty complying fully with the study task.
Wang, Shengqiang; Xiao, Cong; Ishizaka, Joji; Qiu, Zhongfeng; Sun, Deyong; Xu, Qian; Zhu, Yuanli; Huan, Yu; Watanabe, Yuji
2016-10-17
Knowledge of phytoplankton community structures is important to the understanding of various marine biogeochemical processes and ecosystems. Fluorescence excitation spectra (F(λ)) provide great potential for studying phytoplankton communities because their spectral variability depends on changes in the pigment compositions related to distinct phytoplankton groups. Commercial spectrofluorometers have been developed to analyze phytoplankton communities by measuring the field F(λ), but estimations using the default methods are not always accurate because of their strong dependence on norm spectra, which are obtained by culturing pure algae of a given group and are assumed to be constant. In this study, we proposed a novel approach for estimating the chlorophyll a (Chl a) fractions of brown algae, cyanobacteria, green algae and cryptophytes based on a data set collected in the East China Sea (ECS) and the Tsushima Strait (TS), with concurrent measurements of in vivo F(λ) and phytoplankton communities derived from pigments analysis. The new approach blends various statistical features by computing the band ratios and continuum-removed spectra of F(λ) without requiring a priori knowledge of the norm spectra. The model evaluations indicate that our approach yields good estimations of the Chl a fractions, with root-mean-square errors of 0.117, 0.078, 0.072 and 0.060 for brown algae, cyanobacteria, green algae and cryptophytes, respectively. The statistical analysis shows that the models are generally robust to uncertainty in F(λ). We recommend using a site-specific model for more accurate estimations. To develop a site-specific model in the ECS and TS, approximately 26 samples are sufficient for using our approach, but this conclusion needs to be validated in additional regions. Overall, our approach provides a useful technical basis for estimating phytoplankton communities from measurements of F(λ).
Mediation Analysis with Survival Outcomes: Accelerated Failure Time vs. Proportional Hazards Models.
Gelfand, Lois A; MacKinnon, David P; DeRubeis, Robert J; Baraldi, Amanda N
2016-01-01
Survival time is an important type of outcome variable in treatment research. Currently, limited guidance is available regarding performing mediation analyses with survival outcomes, which generally do not have normally distributed errors, and contain unobserved (censored) events. We present considerations for choosing an approach, using a comparison of semi-parametric proportional hazards (PH) and fully parametric accelerated failure time (AFT) approaches for illustration. We compare PH and AFT models and procedures in their integration into mediation models and review their ability to produce coefficients that estimate causal effects. Using simulation studies modeling Weibull-distributed survival times, we compare statistical properties of mediation analyses incorporating PH and AFT approaches (employing SAS procedures PHREG and LIFEREG, respectively) under varied data conditions, some including censoring. A simulated data set illustrates the findings. AFT models integrate more easily than PH models into mediation models. Furthermore, mediation analyses incorporating LIFEREG produce coefficients that can estimate causal effects, and demonstrate superior statistical properties. Censoring introduces bias in the coefficient estimate representing the treatment effect on outcome: underestimation in LIFEREG, and overestimation in PHREG. With LIFEREG, this bias can be addressed using an alternative estimate obtained from combining other coefficients, whereas this is not possible with PHREG. When Weibull assumptions are not violated, there are compelling advantages to using LIFEREG over PHREG for mediation analyses involving survival-time outcomes. Irrespective of the procedures used, the interpretation of coefficients, effects of censoring on coefficient estimates, and statistical properties should be taken into account when reporting results.
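For readers comparing the two parameterizations, the link between AFT and PH coefficients under the Weibull assumption can be written out explicitly. This is the standard textbook identity, not a result quoted from the abstract; the symbols (\mu, \beta, \sigma for the AFT intercept, coefficients and scale, \gamma for the PH coefficients) are notation assumed here for illustration.

```latex
% AFT form with a standard (minimum) extreme-value error \varepsilon:
\log T = \mu + x^{\top}\beta + \sigma\,\varepsilon
% Implied Weibull survival function:
S(t \mid x) = \exp\!\left\{-\left(t\,e^{-\mu - x^{\top}\beta}\right)^{1/\sigma}\right\}
% Implied hazard, which has proportional-hazards form:
h(t \mid x) = \underbrace{\tfrac{1}{\sigma}\,t^{1/\sigma - 1}\,e^{-\mu/\sigma}}_{h_0(t)}\;
              \exp\!\left(x^{\top}\gamma\right),
\qquad \gamma = -\,\beta/\sigma
```

Under this identity a positive AFT coefficient (longer survival) corresponds to a negative log hazard ratio, which is one reason coefficients from the two procedures cannot be compared without rescaling.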
NASA Technical Reports Server (NTRS)
Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.
1992-01-01
An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with engineering analysis to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in engineering analyses of failure phenomena, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which engineering analysis models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. Conventional engineering analysis models currently employed for design of failure prediction are used in this methodology. The PFA methodology is described and examples of its application are presented. Conventional approaches to failure risk evaluation for spaceflight systems are discussed, and the rationale for the approach taken in the PFA methodology is presented. The statistical methods, engineering models, and computer software used in fatigue failure mode applications are thoroughly documented.
An adaptive state of charge estimation approach for lithium-ion series-connected battery system
NASA Astrophysics Data System (ADS)
Peng, Simin; Zhu, Xuelai; Xing, Yinjiao; Shi, Hongbing; Cai, Xu; Pecht, Michael
2018-07-01
Due to the incorrect or unknown noise statistics of a battery system and its cell-to-cell variations, state of charge (SOC) estimation of a lithium-ion series-connected battery system is usually inaccurate or even divergent using model-based methods, such as the extended Kalman filter (EKF) and unscented Kalman filter (UKF). To resolve this problem, an adaptive unscented Kalman filter (AUKF) based on a noise statistics estimator and a model parameter regulator is developed to accurately estimate the SOC of a series-connected battery system. An equivalent circuit model is first built based on the model parameter regulator that illustrates the influence of cell-to-cell variation on the battery system. A noise statistics estimator is then used to adaptively obtain the estimated noise statistics for the AUKF when its prior noise statistics are not accurate or exactly Gaussian. The accuracy and effectiveness of the SOC estimation method are validated by comparing the developed AUKF and UKF when the model and measurement noise statistics, respectively, are inaccurate. Compared with the UKF and EKF, the developed method shows the highest SOC estimation accuracy.
Modelling short time series in metabolomics: a functional data analysis approach.
Montana, Giovanni; Berk, Maurice; Ebbels, Tim
2011-01-01
Metabolomics is the study of the complement of small molecule metabolites in cells, biofluids and tissues. Many metabolomic experiments are designed to compare changes observed over time under two or more experimental conditions (e.g. a control and drug-treated group), thus producing time course data. Models from traditional time series analysis are often unsuitable because, by design, only very few time points are available and there are a high number of missing values. We propose a functional data analysis approach for modelling short time series arising in metabolomic studies which overcomes these obstacles. Our model assumes that each observed time series is a smooth random curve, and we propose a statistical approach for inferring this curve from repeated measurements taken on the experimental units. A test statistic for detecting differences between temporal profiles associated with two experimental conditions is then presented. The methodology has been applied to NMR spectroscopy data collected in a pre-clinical toxicology study.
Wang, Tao; Ho, Gloria; Ye, Kenny; Strickler, Howard; Elston, Robert C.
2008-01-01
Genetic association studies achieve an unprecedented level of resolution in mapping disease genes by genotyping dense SNPs in a gene region. Meanwhile, these studies require new powerful statistical tools that can optimally handle a large amount of information provided by genotype data. A question that arises is how to model interactions between two genes. Simply modeling all possible interactions between the SNPs in two gene regions is not desirable because a greatly increased number of degrees of freedom can be involved in the test statistic. We introduce an approach to reduce the genotype dimension in modeling interactions. The genotype compression of this approach is built upon the information on both the trait and the cross-locus gametic disequilibrium between SNPs in two interacting genes, in such a way as to parsimoniously model the interactions without loss of useful information in the process of dimension reduction. As a result, it improves power to detect association in the presence of gene-gene interactions. This approach can be similarly applied for modeling gene-environment interactions. We compare this method with other approaches: the corresponding test without modeling any interaction, that based on a saturated interaction model, that based on principal component analysis, and that based on Tukey’s 1-df model. Our simulations suggest that this new approach has superior power to that of the other methods. In an application to endometrial cancer case-control data from the Women’s Health Initiative (WHI), this approach detected AKT1 and AKT2 as being significantly associated with endometrial cancer susceptibility by taking into account their interactions with BMI. PMID:18615621
Wang, Tao; Ho, Gloria; Ye, Kenny; Strickler, Howard; Elston, Robert C
2009-01-01
Genetic association studies achieve an unprecedented level of resolution in mapping disease genes by genotyping dense single nucleotide polymorphisms (SNPs) in a gene region. Meanwhile, these studies require new powerful statistical tools that can optimally handle a large amount of information provided by genotype data. A question that arises is how to model interactions between two genes. Simply modeling all possible interactions between the SNPs in two gene regions is not desirable because a greatly increased number of degrees of freedom can be involved in the test statistic. We introduce an approach to reduce the genotype dimension in modeling interactions. The genotype compression of this approach is built upon the information on both the trait and the cross-locus gametic disequilibrium between SNPs in two interacting genes, in such a way as to parsimoniously model the interactions without loss of useful information in the process of dimension reduction. As a result, it improves power to detect association in the presence of gene-gene interactions. This approach can be similarly applied for modeling gene-environment interactions. We compare this method with other approaches: the corresponding test without modeling any interaction, that based on a saturated interaction model, that based on principal component analysis, and that based on Tukey's one-degree-of-freedom model. Our simulations suggest that this new approach has superior power to that of the other methods. In an application to endometrial cancer case-control data from the Women's Health Initiative, this approach detected AKT1 and AKT2 as being significantly associated with endometrial cancer susceptibility by taking into account their interactions with body mass index.
More powerful haplotype sharing by accounting for the mode of inheritance.
Ziegler, Andreas; Ewhida, Adel; Brendel, Michael; Kleensang, André
2009-04-01
The concept of haplotype sharing (HS) has received considerable attention recently, and several haplotype association methods have been proposed. Here, we extend the work of Beckmann and colleagues [2005 Hum. Hered. 59:67-78] who derived an HS statistic (BHS) as a special case of Mantel's space-time clustering approach. The Mantel-type HS statistic correlates genetic similarity with phenotypic similarity across pairs of individuals. While phenotypic similarity is measured as the mean-corrected cross product of phenotypes, we propose to incorporate information on the underlying genetic model in the measurement of the genetic similarity. Specifically, for the recessive and dominant modes of inheritance we suggest the use of the minimum and maximum of shared length of haplotypes around a marker locus for pairs of individuals. If the underlying genetic model is unknown, we propose a model-free HS Mantel statistic using the max-test approach. We compare our novel HS statistics to BHS using simulated case-control data and illustrate their use by re-analyzing data from a candidate region of chromosome 18q from the Rheumatoid Arthritis (RA) Consortium. We demonstrate that our approach is point-wise valid and superior to BHS. In the re-analysis of the RA data, we identified three regions with point-wise P-values < 0.005 containing six known genes (PMIP1, MC4R, PIGN, KIAA1468, TNFRSF11A and ZCCHC2) which might be worth follow-up.
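The Mantel-type construction described in this abstract can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the unnormalized form of the statistic, and the idea of passing a list of shared haplotype lengths per pair are all hypothetical choices made here.

```python
from itertools import combinations


def pair_similarity(shared_lengths, mode):
    """Collapse the shared haplotype lengths of one pair of individuals.

    Assumption: `shared_lengths` holds the shared lengths around the marker
    for the haplotype pairings of the two individuals. Following the
    abstract, the minimum is used for a recessive model and the maximum
    for a dominant model.
    """
    if mode == "recessive":
        return min(shared_lengths)
    if mode == "dominant":
        return max(shared_lengths)
    raise ValueError("mode must be 'recessive' or 'dominant'")


def mantel_statistic(genetic_similarity, phenotypes):
    """Unnormalized Mantel-type statistic over all pairs i < j.

    Sums genetic similarity times the mean-corrected cross product of the
    two phenotypes.
    """
    n = len(phenotypes)
    mean = sum(phenotypes) / n
    total = 0.0
    for i, j in combinations(range(n), 2):
        cross = (phenotypes[i] - mean) * (phenotypes[j] - mean)
        total += genetic_similarity[i][j] * cross
    return total
```

In practice the significance of such a statistic would usually be judged against a permutation or asymptotic reference distribution, which this sketch leaves out.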
Fritscher, Karl; Schuler, Benedikt; Link, Thomas; Eckstein, Felix; Suhm, Norbert; Hänni, Markus; Hengg, Clemens; Schubert, Rainer
2008-01-01
Fractures of the proximal femur are one of the principal causes of mortality among elderly persons. Traditional methods for determining femoral fracture risk rely on measurements of bone mineral density (BMD). However, BMD alone is not sufficient to predict bone failure load for an individual patient and additional parameters have to be determined for this purpose. In this work an approach that uses statistical models of appearance to identify relevant regions and parameters for the prediction of biomechanical properties of the proximal femur will be presented. By using Support Vector Regression, the proposed model-based approach is capable of predicting two different biomechanical parameters accurately and fully automatically in two different testing scenarios.
NASA Astrophysics Data System (ADS)
Mfumu Kihumba, Antoine; Ndembo Longo, Jean; Vanclooster, Marnik
2016-03-01
A multivariate statistical modelling approach was applied to explain the anthropogenic pressure of nitrate pollution on the Kinshasa groundwater body (Democratic Republic of Congo). Multiple regression and regression tree models were compared and used to identify major environmental factors that control the groundwater nitrate concentration in this region. The analyses were made in terms of physical attributes related to the topography, land use, geology and hydrogeology in the capture zone of different groundwater sampling stations. For the nitrate data, groundwater datasets from two different surveys were used. The statistical models identified the topography, the residential area, the service land (cemetery), and the surface-water land-use classes as major factors explaining nitrate occurrence in the groundwater. Also, groundwater nitrate pollution depends not on one single factor but on the combined influence of factors representing nitrogen loading sources and aquifer susceptibility characteristics. The groundwater nitrate pressure was better predicted with the regression tree model than with the multiple regression model. Furthermore, the results elucidated the sensitivity of the model performance towards the method of delineation of the capture zones. For pollution modelling at the monitoring points, therefore, it is better to identify capture-zone shapes based on a conceptual hydrogeological model rather than to adopt arbitrary circular capture zones.
Bayesian Statistics and Uncertainty Quantification for Safety Boundary Analysis in Complex Systems
NASA Technical Reports Server (NTRS)
He, Yuning; Davies, Misty Dawn
2014-01-01
The analysis of a safety-critical system often requires detailed knowledge of safe regions and their high-dimensional non-linear boundaries. We present a statistical approach to iteratively detect and characterize the boundaries, which are provided as parameterized shape candidates. Using methods from uncertainty quantification and active learning, we incrementally construct a statistical model from only a few simulation runs and obtain statistically sound estimates of the shape parameters for safety boundaries.
Toward statistical modeling of saccadic eye-movement and visual saliency.
Sun, Xiaoshuai; Yao, Hongxun; Ji, Rongrong; Liu, Xian-Ming
2014-11-01
In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. These observations inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGC using projection pursuit, and generates eye movements by selecting the location with maximum SGC response. Besides simulating human saccadic behavior, we also demonstrated superior effectiveness and robustness over state-of-the-art methods by carrying out dense experiments on synthetic patterns and human eye fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences and the effects of scale and blur, are explored in this paper. Based on extensive qualitative and quantitative experimental results, we show the promising potential of statistical approaches for human behavior research.
Condensate statistics and thermodynamics of weakly interacting Bose gas: Recursion relation approach
NASA Astrophysics Data System (ADS)
Dorfman, K. E.; Kim, M.; Svidzinsky, A. A.
2011-03-01
We study condensate statistics and thermodynamics of a weakly interacting Bose gas with a fixed total number N of particles in a cubic box. We find the exact recursion relation for the canonical ensemble partition function. Using this relation, we calculate the distribution function of condensate particles for N=200. We also calculate the distribution function based on multinomial expansion of the characteristic function. Similar to the ideal gas, both approaches give exact statistical moments for all temperatures in the framework of the Bogoliubov model. We compare them with the results of the unconstrained canonical ensemble quasiparticle formalism and the hybrid master equation approach. The present recursion relation can be used for any external potential and boundary conditions. We investigate the temperature dependence of the first few statistical moments of condensate fluctuations as well as thermodynamic potentials and heat capacity analytically and numerically in the whole temperature range.
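The abstract does not reproduce the recursion itself. As a rough illustration of the general idea only, the sketch below implements the well-known recursion for the canonical partition function of an ideal (non-interacting) Bose gas, Z_N = (1/N) * sum_{k=1..N} Z_1(k*beta) * Z_{N-k} with Z_0 = 1. It is not the interacting-gas relation derived in the paper, and the function name and the explicit list of single-particle levels are assumptions made here.

```python
import math


def canonical_partition_functions(levels, beta, n_max):
    """Return [Z_0, Z_1, ..., Z_n_max] for ideal bosons.

    `levels` is a list of single-particle energies (hypothetical input);
    `beta` is the inverse temperature in matching units.
    """
    # Single-particle partition function evaluated at k * beta.
    z1 = [0.0] + [
        sum(math.exp(-k * beta * energy) for energy in levels)
        for k in range(1, n_max + 1)
    ]
    z = [1.0]  # Z_0 = 1 by convention
    for n in range(1, n_max + 1):
        z.append(sum(z1[k] * z[n - k] for k in range(1, n + 1)) / n)
    return z
```

A quick sanity check is the infinite-temperature limit (beta = 0) with two levels, where Z_N simply counts the N + 1 ways of distributing N bosons over two states. For large N or low temperature the terms span many orders of magnitude, so a production version would likely work with logarithms.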
Statistical modelling of networked human-automation performance using working memory capacity.
Ahmed, Nisar; de Visser, Ewart; Shaw, Tyler; Mohamed-Ameen, Amira; Campbell, Mark; Parasuraman, Raja
2014-01-01
This study examines the challenging problem of modelling the interaction between individual attentional limitations and decision-making performance in networked human-automation system tasks. Analysis of real experimental data from a task involving networked supervision of multiple unmanned aerial vehicles by human participants shows that both task load and network message quality affect performance, but that these effects are modulated by individual differences in working memory (WM) capacity. These insights were used to assess three statistical approaches for modelling and making predictions with real experimental networked supervisory performance data: classical linear regression, non-parametric Gaussian processes and probabilistic Bayesian networks. It is shown that each of these approaches can help designers of networked human-automated systems cope with various uncertainties in order to accommodate future users by linking expected operating conditions and performance from real experimental data to observable cognitive traits like WM capacity. Practitioner Summary: Working memory (WM) capacity helps account for inter-individual variability in operator performance in networked unmanned aerial vehicle supervisory tasks. This is useful for reliable performance prediction near experimental conditions via linear models; robust statistical prediction beyond experimental conditions via Gaussian process models and probabilistic inference about unknown task conditions/WM capacities via Bayesian network models.
Information Entropy Production of Maximum Entropy Markov Chains from Spike Trains
NASA Astrophysics Data System (ADS)
Cofré, Rodrigo; Maldonado, Cesar
2018-01-01
We consider the maximum entropy Markov chain inference approach to characterize the collective statistics of neuronal spike trains, focusing on the statistical properties of the inferred model. We review large deviations techniques useful in this context to describe properties of accuracy and convergence in terms of sampling size. We use these results to study the statistical fluctuation of correlations, distinguishability and irreversibility of maximum entropy Markov chains. We illustrate these applications using simple examples where the large deviation rate function is explicitly obtained for maximum entropy models of relevance in this field.
Spatial Statistical Data Fusion (SSDF)
NASA Technical Reports Server (NTRS)
Braverman, Amy J.; Nguyen, Hai M.; Cressie, Noel
2013-01-01
As remote sensing for scientific purposes has transitioned from an experimental technology to an operational one, the selection of instruments has become more coordinated, so that the scientific community can exploit complementary measurements. However, technological and scientific heterogeneity across devices means that the statistical characteristics of the data they collect are different. The challenge addressed here is how to combine heterogeneous remote sensing data sets in a way that yields optimal statistical estimates of the underlying geophysical field, and provides rigorous uncertainty measures for those estimates. Different remote sensing data sets may have different spatial resolutions, different measurement error biases and variances, and other disparate characteristics. A state-of-the-art spatial statistical model was used to relate the true, but not directly observed, geophysical field to noisy, spatial aggregates observed by remote sensing instruments. The spatial covariances of the true field and the covariances of the true field with the observations were modeled. The observations are spatial averages of the true field values, over pixels, with different measurement noise superimposed. A kriging framework is used to infer optimal (minimum mean squared error and unbiased) estimates of the true field at point locations from pixel-level, noisy observations. A key feature of the spatial statistical model is the spatial mixed effects model that underlies it. The approach models the spatial covariance function of the underlying field using linear combinations of basis functions of fixed size. Approaches based on kriging require the inversion of very large spatial covariance matrices, and this is usually done by making simplifying assumptions about spatial covariance structure that simply do not hold for geophysical variables. In contrast, this method does not require these assumptions, and is also computationally much faster.
This method is fundamentally different than other approaches to data fusion for remote sensing data because it is inferential rather than merely descriptive. All approaches combine data in a way that minimizes some specified loss function. Most of these are more or less ad hoc criteria based on what looks good to the eye, or some criteria that relate only to the data at hand.
Validating the simulation of large-scale parallel applications using statistical characteristics
Zhang, Deli; Wilke, Jeremiah; Hendry, Gilbert; ...
2016-03-01
Simulation is a widely adopted method to analyze and predict the performance of large-scale parallel applications. Validating the hardware model is highly important for complex simulations with a large number of parameters. Common practice involves calculating the percent error between the projected and the real execution time of a benchmark program. However, in a high-dimensional parameter space, this coarse-grained approach often suffers from parameter insensitivity, which may not be known a priori. Moreover, the traditional approach cannot be applied to the validation of software models, such as application skeletons used in online simulations. In this work, we present a methodology and a toolset for validating both hardware and software models by quantitatively comparing fine-grained statistical characteristics obtained from execution traces. Although statistical information has been used in tasks like performance optimization, this is the first attempt to apply it to simulation validation. Lastly, our experimental results show that the proposed evaluation approach offers significant improvement in fidelity when compared to evaluation using total execution time, and the proposed metrics serve as reliable criteria that progress toward automating the simulation tuning process.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Fangyan; Zhang, Song; Chung Wong, Pak
Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, the size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.
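The three sampling categories named in this abstract can be illustrated with a small standard-library sketch. The specific variants below (uniform node sampling with the induced subgraph, uniform edge sampling, and a simple random walk as the traversal-based method) and all function names are assumptions chosen for illustration; the abstract does not say which concrete algorithms were benchmarked.

```python
import random


def node_sampling(edges, rate, rng):
    """Keep a uniform random fraction of nodes and the edges among them."""
    nodes = sorted({v for edge in edges for v in edge})
    kept = set(rng.sample(nodes, max(1, round(rate * len(nodes)))))
    return [(u, v) for u, v in edges if u in kept and v in kept]


def edge_sampling(edges, rate, rng):
    """Keep a uniform random fraction of edges."""
    return rng.sample(edges, max(1, round(rate * len(edges))))


def random_walk_sampling(edges, rate, rng):
    """Walk the graph until a target fraction of nodes has been visited."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    target = max(1, round(rate * len(adjacency)))
    current = rng.choice(sorted(adjacency))
    visited, sampled, steps = {current}, set(), 0
    # The step cap guards against walks trapped in a small component.
    while len(visited) < target and steps < 100 * len(edges):
        nxt = rng.choice(adjacency[current])
        sampled.add((min(current, nxt), max(current, nxt)))
        visited.add(nxt)
        current = nxt
        steps += 1
    return sorted(sampled)
```

Passing an explicit `random.Random(seed)` instance keeps the samples reproducible, which matters when sampled graphs are compared across rates or graph models.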
Statistical analysis of weigh-in-motion data for bridge design in Vermont.
DOT National Transportation Integrated Search
2014-10-01
This study investigates the suitability of the HL-93 live load model recommended by AASHTO LRFD Specifications for its use in the analysis and design of bridges in Vermont. The method of approach consists in performing a statistical analysis of w...
Alternative Statistical Frameworks for Student Growth Percentile Estimation
ERIC Educational Resources Information Center
Lockwood, J. R.; Castellano, Katherine E.
2015-01-01
This article suggests two alternative statistical approaches for estimating student growth percentiles (SGP). The first is to estimate percentile ranks of current test scores conditional on past test scores directly, by modeling the conditional cumulative distribution functions, rather than indirectly through quantile regressions. This would…
Novick, Steven; Shen, Yan; Yang, Harry; Peterson, John; LeBlond, Dave; Altan, Stan
2015-01-01
Dissolution (or in vitro release) studies constitute an important aspect of pharmaceutical drug development. One important use of such studies is for justifying a biowaiver for post-approval changes which requires establishing equivalence between the new and old product. We propose a statistically rigorous modeling approach for this purpose based on the estimation of what we refer to as the F2 parameter, an extension of the commonly used f2 statistic. A Bayesian test procedure is proposed in relation to a set of composite hypotheses that capture the similarity requirement on the absolute mean differences between test and reference dissolution profiles. Several examples are provided to illustrate the application. Results of our simulation study comparing the performance of f2 and the proposed method show that our Bayesian approach is comparable to or in many cases superior to the f2 statistic as a decision rule. Further useful extensions of the method, such as the use of continuous-time dissolution modeling, are considered.
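For readers unfamiliar with the conventional f2 statistic that the abstract takes as its starting point, here is a minimal sketch of its usual definition, f2 = 50 log10(100 / sqrt(1 + mean squared difference)), computed over matched time points. The profile values are made up for illustration, and the code does not implement the authors' F2 parameter or their Bayesian test.

```python
import math


def f2_similarity(reference, test):
    """Conventional f2 similarity factor for two dissolution profiles.

    Both inputs are percent-dissolved values at the same time points.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(reference)
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))


# Hypothetical profiles (percent dissolved at four time points).
reference_profile = [20.0, 45.0, 70.0, 90.0]
test_profile = [18.0, 41.0, 66.0, 88.0]
f2_value = f2_similarity(reference_profile, test_profile)
print(f"f2 = {f2_value:.1f}")
```

Identical profiles give f2 = 100, and the value falls as the mean squared difference grows; a cutoff of 50 is the commonly quoted similarity threshold.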
Computationally efficient statistical differential equation modeling using homogenization
Hooten, Mevin B.; Garlick, Martha J.; Powell, James A.
2013-01-01
Statistical models using partial differential equations (PDEs) to describe dynamically evolving natural systems have appeared in the scientific literature with some regularity in recent years. Often such studies seek to characterize the dynamics of temporal or spatio-temporal phenomena such as invasive species, consumer-resource interactions, community evolution, and resource selection. Specifically, in the spatial setting, data are often available at varying spatial and temporal scales. Additionally, the necessary numerical integration of a PDE may be computationally infeasible over the spatial support of interest. We present an approach to impose computationally advantageous changes of support in statistical implementations of PDE models and demonstrate its utility through simulation using a form of PDE known as “ecological diffusion.” We also apply a statistical ecological diffusion model to a data set involving the spread of mountain pine beetle (Dendroctonus ponderosae) in Idaho, USA.
Attempting to physically explain space-time correlation of extremes
NASA Astrophysics Data System (ADS)
Bernardara, Pietro; Gailhard, Joel
2010-05-01
Spatial and temporal clustering of hydro-meteorological extreme events is supported by scientific evidence. Moreover, the statistical parameters characterizing their local frequencies of occurrence show clear spatial patterns. Thus, in order to robustly assess the hydro-meteorological hazard, statistical models need to be able to take into account spatial and temporal dependencies. Statistical models that consider long-term correlation for quantifying and qualifying temporal and spatial dependencies are available, such as the multifractal approach. Furthermore, the development of regional frequency analysis techniques allows the frequency of occurrence of extreme events to be estimated while taking into account spatial patterns in the behaviour of the extreme quantiles. However, in order to understand the origin of spatio-temporal clustering, an attempt should be made to find a physical explanation. Here, some statistical evidence of spatio-temporal correlation and of spatial patterns of extreme behaviour is given on a large database of more than 400 rainfall and discharge series in France. In particular, the spatial distribution of multifractal and Generalized Pareto distribution parameters shows evident correlation patterns in the behaviour of the frequency of occurrence of extremes. It is then shown that the identification of atmospheric circulation patterns (weather types) can physically explain the temporal clustering of extreme rainfall events (seasonality) and the spatial pattern of the frequency of occurrence. Moreover, by coupling this information with the hydrological modelling of a watershed (as in the Schadex approach), an explanation of the spatio-temporal distribution of extreme discharge can also be provided. We finally show that a hydro-meteorological approach (such as the Schadex approach) can explain and take into account space and time dependencies of hydro-meteorological extreme events.
NASA Astrophysics Data System (ADS)
Zack, J. W.
2015-12-01
Predictions from Numerical Weather Prediction (NWP) models are the foundation for wind power forecasts for day-ahead and longer forecast horizons. The NWP models directly produce three-dimensional wind forecasts on their respective computational grids. These can be interpolated to the location and time of interest. However, these direct predictions typically contain significant systematic errors ("biases"). This is due to a variety of factors including the limited space-time resolution of the NWP models and shortcomings in the model's representation of physical processes. It has become common practice to attempt to improve the raw NWP forecasts by statistically adjusting them through a procedure that is widely known as Model Output Statistics (MOS). The challenge is to identify complex patterns of systematic errors and then use this knowledge to adjust the NWP predictions. The MOS-based improvements are the basis for much of the value added by commercial wind power forecast providers. There are an enormous number of statistical approaches that can be used to generate the MOS adjustments to the raw NWP forecasts. In order to obtain insight into the potential value of some of the newer and more sophisticated statistical techniques often referred to as "machine learning methods" a MOS-method comparison experiment has been performed for wind power generation facilities in 6 wind resource areas of California. The underlying NWP models that provided the raw forecasts were the two primary operational models of the US National Weather Service: the GFS and NAM models. The focus was on 1- and 2-day ahead forecasts of the hourly wind-based generation. 
The statistical methods evaluated included: (1) screening multiple linear regression, which served as a baseline method, (2) artificial neural networks, (3) a decision-tree approach called random forests, (4) gradient boosted regression based upon a decision-tree algorithm, (5) support vector regression and (6) analog ensemble, which is a case-matching scheme. The presentation will provide (1) an overview of each method and the experimental design, (2) performance comparisons based on standard metrics such as bias, MAE and RMSE, (3) a summary of the performance characteristics of each approach and (4) a preview of further experiments to be conducted.
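To make the MOS idea and the three metrics concrete, the following sketch fits a one-predictor linear correction on a training period and scores raw versus adjusted forecasts on a held-out period. It is a simplified stand-in for the regression baseline, not the experiment's actual procedure: the data are synthetic, and the bias structure, noise level, and 150/50 split are assumptions chosen only for illustration.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y ~ intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scores(forecast, truth):
    """Bias (mean error), mean absolute error and root mean square error."""
    errors = [f - t for f, t in zip(forecast, truth)]
    n = len(errors)
    return {
        "bias": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


# Synthetic data (assumption): the raw forecast has a systematic error.
rng = random.Random(0)
observed = [rng.uniform(2.0, 12.0) for _ in range(200)]
raw = [3.0 + 0.8 * o + rng.gauss(0.0, 0.5) for o in observed]

# Fit the correction on the first 150 cases, evaluate on the last 50.
slope, intercept = fit_line(raw[:150], observed[:150])
adjusted = [intercept + slope * f for f in raw[150:]]

raw_scores = scores(raw[150:], observed[150:])
mos_scores = scores(adjusted, observed[150:])
print("raw:", raw_scores)
print("MOS:", mos_scores)
```

The more sophisticated methods in the list replace the straight-line fit with a more flexible model, while the evaluation on held-out cases stays the same.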
Zhang, Hao; Niu, Yanxiong; Lu, Jiazhen; Zhang, He
2016-11-20
Angular velocity information is a requisite for a spacecraft guidance, navigation, and control system. In this paper, an approach for angular velocity estimation based merely on star vector measurement with an improved current statistical model Kalman filter is proposed. High-precision angular velocity estimation can be achieved under dynamic conditions. The amount of calculation is also reduced compared to a Kalman filter. Different trajectories are simulated to test this approach, and experiments with real starry sky observation are implemented for further confirmation. The estimation accuracy is proved to be better than 10^-4 rad/s under various conditions. Both the simulation and the experiment demonstrate that the described approach is effective and shows an excellent performance under both static and dynamic conditions.
PyEvolve: a toolkit for statistical modelling of molecular evolution.
Butterfield, Andrew; Vedagiri, Vivek; Lang, Edward; Lawrence, Cath; Wakefield, Matthew J; Isaev, Alexander; Huttley, Gavin A
2004-01-05
Examining the distribution of variation has proven an extremely profitable technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences - ignoring the biological significance of sequence differences. A suite of sophisticated likelihood based statistical models from the field of molecular evolution provides the basis for extracting the information from the full distribution of sequence variation. The number of different problems to which phylogeny-based maximum likelihood calculations can be applied is extensive. Available software packages that can perform likelihood calculations suffer from a lack of flexibility and scalability, or employ error-prone approaches to model parameterisation. Here we describe the implementation of PyEvolve, a toolkit for the application of existing, and development of new, statistical methods for molecular evolution. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide model of substitution that includes a parameter for mutation of methylated CpG's, which required 8 lines of standard Python code to define. Benchmarking was performed using either a dinucleotide or codon substitution model applied to an alignment of BRCA1 sequences from 20 mammals, or a 10 species subset. Up to five-fold parallel performance gains over serial were recorded. Compared to leading alternative software, PyEvolve exhibited significantly better real world performance for parameter rich models with a large data set, reducing the time required for optimisation from approximately 10 days to approximately 6 hours. PyEvolve provides flexible functionality that can be used either for statistical modelling of molecular evolution, or the development of new methods in the field. 
The toolkit can be used interactively or by writing and executing scripts. The toolkit uses efficient processes for specifying the parameterisation of statistical models, and implements numerous optimisations that make highly parameter rich likelihood functions solvable within hours on multi-cpu hardware. PyEvolve can be readily adapted in response to changing computational demands and hardware configurations to maximise performance. PyEvolve is released under the GPL and can be downloaded from http://cbis.anu.edu.au/software.
Assimilating the Future for Better Forecasts and Earlier Warnings
NASA Astrophysics Data System (ADS)
Du, H.; Wheatcroft, E.; Smith, L. A.
2016-12-01
Multi-model ensembles have become popular tools to account for some of the uncertainty due to model inadequacy in weather and climate simulation-based predictions. The current multi-model forecasts focus on combining single model ensemble forecasts by means of statistical post-processing. Assuming each model is developed independently or with different primary target variables, each is likely to contain different dynamical strengths and weaknesses. Using statistical post-processing, such information is only carried by the simulations under a single model ensemble: no advantage is taken to influence simulations under the other models. A novel methodology, named Multi-model Cross Pollination in Time, is proposed for multi-model ensemble schemes with the aim of integrating the dynamical information regarding the future from each individual model operationally. The proposed approach generates model states in time by applying data assimilation scheme(s) to yield truly "multi-model trajectories". It is demonstrated to outperform traditional statistical post-processing in the 40-dimensional Lorenz96 flow. Data assimilation approaches are originally designed to improve state estimation from the past to the current time. The aim of this talk is to introduce a framework that uses data assimilation to improve model forecasts at future times (not to argue for any one particular data assimilation scheme). An illustration of applying data assimilation "in the future" to provide early warning of future high-impact events is also presented.
Huttary, Rudolf; Goubergrits, Leonid; Schütte, Christof; Bernhard, Stefan
2017-08-01
It has not yet been possible to obtain modeling approaches suitable for covering a wide range of real world scenarios in cardiovascular physiology because many of the system parameters are uncertain or even unknown. Natural variability and statistical variation of cardiovascular system parameters in healthy and diseased conditions are characteristic features for understanding cardiovascular diseases in more detail. This paper presents SISCA, a novel software framework for cardiovascular system modeling and its MATLAB implementation. The framework defines a multi-model statistical ensemble approach for dimension reduced, multi-compartment models and focuses on statistical variation, system identification and patient-specific simulation based on clinical data. We also discuss a data-driven modeling scenario as a use case example. The regarded dataset originated from routine clinical examinations and comprised typical pre and post surgery clinical data from a patient diagnosed with coarctation of aorta. We conducted patient and disease specific pre/post surgery modeling by adapting a validated nominal multi-compartment model with respect to structure and parametrization using metadata and MRI geometry. In both models, the simulation reproduced measured pressures and flows fairly well with respect to stenosis and stent treatment and by pre-treatment cross stenosis phase shift of the pulse wave. However, with post-treatment data showing unrealistic phase shifts and other more obvious inconsistencies within the dataset, the methods and results we present suggest that conditioning and uncertainty management of routine clinical data sets needs significantly more attention to obtain reasonable results in patient-specific cardiovascular modeling. Copyright © 2017 Elsevier Ltd. All rights reserved.
Addressing the statistical mechanics of planet orbits in the solar system
NASA Astrophysics Data System (ADS)
Mogavero, Federico
2017-10-01
The chaotic nature of planet dynamics in the solar system suggests the relevance of a statistical approach to planetary orbits. In such a statistical description, the time-dependent position and velocity of the planets are replaced by the probability density function (PDF) of their orbital elements. It is natural to set up this kind of approach in the framework of statistical mechanics. In the present paper, I focus on the collisionless excitation of eccentricities and inclinations via gravitational interactions in a planetary system. The future planet trajectories in the solar system constitute the prototype of this kind of dynamics. I thus address the statistical mechanics of the solar system planet orbits and try to reproduce the PDFs numerically constructed by Laskar (2008, Icarus, 196, 1). I show that the microcanonical ensemble of the Laplace-Lagrange theory accurately reproduces the statistics of the giant planet orbits. To model the inner planets I then investigate the ansatz of equiprobability in the phase space constrained by the secular integrals of motion. The eccentricity and inclination PDFs of Earth and Venus are reproduced with no free parameters. Within the limitations of a stationary model, the predictions also show a reasonable agreement with Mars PDFs and that of Mercury inclination. The eccentricity of Mercury demands in contrast a deeper analysis. I finally revisit the random walk approach of Laskar to the time dependence of the inner planet PDFs. Such a statistical theory could be combined with direct numerical simulations of planet trajectories in the context of planet formation, which is likely to be a chaotic process.
Improved Statistics for Genome-Wide Interaction Analysis
Ueki, Masao; Cordell, Heather J.
2012-01-01
Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. 
We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result. PMID:22496670
Variable system: An alternative approach for the analysis of mediated moderation.
Kwan, Joyce Lok Yin; Chan, Wai
2018-06-01
Mediated moderation (meMO) occurs when the moderation effect of the moderator (W) on the relationship between the independent variable (X) and the dependent variable (Y) is transmitted through a mediator (M). To examine this process empirically, 2 different model specifications (Type I meMO and Type II meMO) have been proposed in the literature. However, both specifications are found to be problematic, either conceptually or statistically. For example, it can be shown that each type of meMO model is statistically equivalent to a particular form of moderated mediation (moME), another process that examines the condition when the indirect effect from X to Y through M varies as a function of W. Consequently, it is difficult for one to differentiate these 2 processes mathematically. This study therefore has 2 objectives. First, we attempt to differentiate moME and meMO by proposing an alternative specification for meMO. Conceptually, this alternative specification is intuitively meaningful and interpretable, and, statistically, it offers meMO a unique representation that is no longer identical to its moME counterpart. Second, using structural equation modeling, we propose an integrated approach for the analysis of meMO as well as for other general types of conditional path models. VS, a computer software program that implements the proposed approach, has been developed to facilitate the analysis of conditional path models for applied researchers. Real examples are considered to illustrate how the proposed approach works in practice and to compare its performance against the traditional methods. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
A flexible, interpretable framework for assessing sensitivity to unmeasured confounding.
Dorie, Vincent; Harada, Masataka; Carnegie, Nicole Bohme; Hill, Jennifer
2016-09-10
When estimating causal effects, unmeasured confounding and model misspecification are both potential sources of bias. We propose a method to simultaneously address both issues in the form of a semi-parametric sensitivity analysis. In particular, our approach incorporates Bayesian Additive Regression Trees into a two-parameter sensitivity analysis strategy that assesses sensitivity of posterior distributions of treatment effects to choices of sensitivity parameters. This results in an easily interpretable framework for testing for the impact of an unmeasured confounder that also limits the number of modeling assumptions. We evaluate our approach in a large-scale simulation setting and with high blood pressure data taken from the Third National Health and Nutrition Examination Survey. The model is implemented as open-source software, integrated into the treatSens package for the R statistical programming language. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Gridded Calibration of Ensemble Wind Vector Forecasts Using Ensemble Model Output Statistics
NASA Astrophysics Data System (ADS)
Lazarus, S. M.; Holman, B. P.; Splitt, M. E.
2017-12-01
A computationally efficient method is developed that performs gridded post processing of ensemble wind vector forecasts. An expansive set of idealized WRF model simulations is generated to provide physically consistent high resolution winds over a coastal domain characterized by an intricate land / water mask. Ensemble model output statistics (EMOS) is used to calibrate the ensemble wind vector forecasts at observation locations. The local EMOS predictive parameters (mean and variance) are then spread throughout the grid utilizing flow-dependent statistical relationships extracted from the downscaled WRF winds. Using data withdrawal and 28 east central Florida stations, the method is applied to one year of 24 h wind forecasts from the Global Ensemble Forecast System (GEFS). Compared to the raw GEFS, the approach improves both the deterministic and probabilistic forecast skill. Analysis of multivariate rank histograms indicates the post processed forecasts are calibrated. Two downscaling case studies are presented, a quiescent easterly flow event and a frontal passage. Strengths and weaknesses of the approach are presented and discussed.
The development of ensemble theory. A new glimpse at the history of statistical mechanics
NASA Astrophysics Data System (ADS)
Inaba, Hajime
2015-12-01
This paper investigates the history of statistical mechanics from the viewpoint of the development of the ensemble theory from 1871 to 1902. In 1871, Ludwig Boltzmann introduced a prototype model of an ensemble that represents a polyatomic gas. In 1879, James Clerk Maxwell defined an ensemble as copies of systems of the same energy. Inspired by H.W. Watson, he called his approach "statistical". Boltzmann and Maxwell regarded the ensemble theory as a much more general approach than the kinetic theory. In the 1880s, influenced by Hermann von Helmholtz, Boltzmann made use of ensembles to establish thermodynamic relations. In Elementary Principles in Statistical Mechanics of 1902, Josiah Willard Gibbs tried to get his ensemble theory to mirror thermodynamics, including thermodynamic operations in its scope. Thermodynamics played the role of a "blind guide". His theory of ensembles can be characterized as more mathematically oriented than Einstein's theory proposed in the same year. Mechanical, empirical, and statistical approaches to foundations of statistical mechanics are presented. Although it was formulated in classical terms, the ensemble theory provided an infrastructure still valuable in quantum statistics because of its generality.
NASA Astrophysics Data System (ADS)
Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.
2018-04-01
Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter halos. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the "accurate" regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard ΛCDM + halo model against the clustering of SDSS DR7 galaxies. Specifically, we use the projected correlation function, group multiplicity function and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir halos) matches the clustering of low luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the "standard" halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.
Attitude determination using an adaptive multiple model filtering Scheme
NASA Technical Reports Server (NTRS)
Lam, Quang; Ray, Surendra N.
1995-01-01
Attitude determination has long been a topic of active research and is likely to remain of lasting interest to spacecraft system designers. Its role is to provide a reference for controls such as pointing the directional antennas or solar panels, stabilizing the spacecraft, or maneuvering the spacecraft to a new orbit. The Least Square Estimation (LSE) technique was utilized to provide attitude determination for the Nimbus 6 and G. Despite its poor performance in terms of estimation accuracy, LSE was considered an effective and practical approach to meet the urgent needs and requirements of the 1970s. One reason for the poor performance of the LSE scheme is the lack of dynamic filtering or 'compensation'. In other words, the scheme is based entirely on the measurements, and no attempt was made to model the dynamic equations of motion of the spacecraft. We propose an adaptive filtering approach which employs a bank of Kalman filters to perform robust attitude estimation. The proposed approach, whose architecture is depicted, is essentially based on the latest proof on the interactive multiple model design framework to handle the unknown system noise characteristics or statistics. The concept fundamentally employs a bank of Kalman filters, or submodels. Instead of using fixed values for the system noise statistics for each submodel (per operating condition), as the traditional multiple model approach does, we use an on-line dynamic system noise identifier to 'identify' the system noise level (statistics) and update the filter noise statistics using 'live' information from the sensor model. The advanced noise identifier, whose architecture is also shown, is implemented using an advanced system identifier.
To ensure the robust performance of the proposed advanced system identifier, it is further reinforced by a learning system, implemented (in the outer loop) using neural networks, to identify other unknown quantities such as spacecraft dynamics parameters, gyro biases, dynamic disturbances, or environment variations.
Attitude determination using an adaptive multiple model filtering Scheme
NASA Astrophysics Data System (ADS)
Lam, Quang; Ray, Surendra N.
1995-05-01
Attitude determination has long been a topic of active research and is likely to remain of lasting interest to spacecraft system designers. Its role is to provide a reference for controls such as pointing the directional antennas or solar panels, stabilizing the spacecraft, or maneuvering the spacecraft to a new orbit. The Least Square Estimation (LSE) technique was utilized to provide attitude determination for the Nimbus 6 and G. Despite its poor performance in terms of estimation accuracy, LSE was considered an effective and practical approach to meet the urgent needs and requirements of the 1970s. One reason for the poor performance of the LSE scheme is the lack of dynamic filtering or 'compensation'. In other words, the scheme is based entirely on the measurements, and no attempt was made to model the dynamic equations of motion of the spacecraft. We propose an adaptive filtering approach which employs a bank of Kalman filters to perform robust attitude estimation. The proposed approach, whose architecture is depicted, is essentially based on the latest proof on the interactive multiple model design framework to handle the unknown system noise characteristics or statistics. The concept fundamentally employs a bank of Kalman filters, or submodels. Instead of using fixed values for the system noise statistics for each submodel (per operating condition), as the traditional multiple model approach does, we use an on-line dynamic system noise identifier to 'identify' the system noise level (statistics) and update the filter noise statistics using 'live' information from the sensor model. The advanced noise identifier, whose architecture is also shown, is implemented using an advanced system identifier.
To ensure the robust performance of the proposed advanced system identifier, it is further reinforced by a learning system, implemented (in the outer loop) using neural networks, to identify other unknown quantities such as spacecraft dynamics parameters, gyro biases, dynamic disturbances, or environment variations.
Prediction of the dollar to the ruble rate. A system-theoretic approach
NASA Astrophysics Data System (ADS)
Borodachev, Sergey M.
2017-07-01
A simple state-space model of dollar rate formation is proposed, based on changes in oil prices and some mechanisms of money transfer between monetary and stock markets. Predictions from an input-output model and from the state-space model are compared. It is concluded that, with proper use of statistical data (via a Kalman filter), the second approach provides more adequate predictions of the dollar rate.
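The abstract does not give the model equations, so the following is only a generic sketch of how a Kalman filter tracks a scalar state driven by an external input. The random-walk-plus-input state equation, the parameter values, and the short input and observation series are all assumptions made for illustration; the input `u` stands in loosely for an oil-price change and is not the author's specification.

```python
def kalman_step(x, p, u, y, b=1.0, q=0.01, r=0.25):
    """One predict/update cycle of a scalar Kalman filter.

    Assumed model (hypothetical):
        state:        x_t = x_{t-1} + b * u_t + w_t,  Var(w_t) = q
        observation:  y_t = x_t + v_t,                Var(v_t) = r
    """
    # Predict: propagate the state estimate and its variance.
    x_pred = x + b * u
    p_pred = p + q
    # Update: blend the prediction with the new observation.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (y - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


# Hypothetical input changes and noisy observations of the state.
inputs = [0.0, -0.5, -0.5, 0.2, 0.3]
observations = [60.1, 59.4, 59.2, 59.3, 59.8]

estimate, variance = 60.0, 1.0
for u, y in zip(inputs, observations):
    estimate, variance = kalman_step(estimate, variance, u, y)
print(f"filtered state: {estimate:.2f}, variance: {variance:.3f}")
```

The gain weights each observation by the relative size of the prediction variance and the measurement variance, which is the sense in which the filter makes "proper use" of noisy data.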
Butler Ellis, M Clare; Kennedy, Marc C; Kuster, Christian J; Alanis, Rafael; Tuck, Clive R
2018-05-28
The BREAM (Bystander and Resident Exposure Assessment Model) (Kennedy et al. in BREAM: A probabilistic bystander and resident exposure assessment model of spray drift from an agricultural boom sprayer. Comput Electron Agric 2012;88:63-71) for bystander and resident exposure to spray drift from boom sprayers has recently been incorporated into the European Food Safety Authority (EFSA) guidance for determining non-dietary exposures of humans to plant protection products. The component of BREAM, which relates airborne spray concentrations to bystander and resident dermal exposure, has been reviewed to identify whether it is possible to improve this and its description of variability captured in the model. Two approaches have been explored: a more rigorous statistical analysis of the empirical data and a semi-mechanistic model based on established studies combined with new data obtained in a wind tunnel. A statistical comparison between field data and model outputs was used to determine which approach gave the better prediction of exposures. The semi-mechanistic approach gave the better prediction of experimental data and resulted in a reduction in the proposed regulatory values for the 75th and 95th percentiles of the exposure distribution.
A Hybrid Multi-Scale Model of Crystal Plasticity for Handling Stress Concentrations
Sun, Shang; Ramazani, Ali; Sundararaghavan, Veera
2017-09-04
Microstructural effects become important at regions of stress concentrators such as notches, cracks and contact surfaces. A multiscale model is presented that efficiently captures microstructural details at such critical regions. The approach is based on a multiresolution mesh that includes an explicit microstructure representation at critical regions where stresses are localized. At regions farther away from the stress concentration, a reduced order model that statistically captures the effect of the microstructure is employed. The statistical model is based on a finite element representation of the orientation distribution function (ODF). As an illustrative example, we have applied the multiscaling method to compute the stress intensity factor K_I around the crack tip in a wedge-opening load specimen. The approach is verified with an analytical solution within linear elasticity approximation and is then extended to allow modeling of microstructural effects on crack tip plasticity.
Power analysis on the time effect for the longitudinal Rasch model.
Feddag, M L; Blanchin, M; Hardouin, J B; Sebille, V
2014-01-01
Statistics literature in the social, behavioral, and biomedical sciences typically stresses the importance of power analysis. Patient Reported Outcomes (PRO) such as quality of life and other perceived health measures (pain, fatigue, stress, ...) are increasingly used as important health outcomes in clinical trials or in epidemiological studies. They cannot be directly observed or measured like other clinical or biological data, and they are often collected through questionnaires with binary or polytomous items. The Rasch model is a well-known model in item response theory (IRT) for binary data. The article proposes an approach to evaluate the statistical power of the time effect for the longitudinal Rasch model with two time points. The performance of this method is compared to that obtained by a simulation study. Finally, the proposed approach is illustrated on one subscale of the SF-36 questionnaire.
NASA Astrophysics Data System (ADS)
Ashe, E.; Kopp, R. E.; Khan, N.; Horton, B.; Engelhart, S. E.
2016-12-01
Sea level varies over both space and time. Prior to the instrumental period, the sea-level record depends upon geological reconstructions that contain vertical and temporal uncertainty. Spatio-temporal statistical models enable the interpretation of relative sea level (RSL) and rates of change as well as the reconstruction of the entire sea-level field from such noisy data. Hierarchical models explicitly distinguish between a process level, which characterizes the spatio-temporal field, and a data level, by which sparse proxy data and their noise are recorded. A hyperparameter level depicts prior expectations about the structure of variability in the spatio-temporal field. Spatio-temporal hierarchical models are amenable to several analysis approaches, with tradeoffs regarding computational efficiency and comprehensiveness of uncertainty characterization. A fully-Bayesian hierarchical model (BHM), which places prior probability distributions upon the hyperparameters, is more computationally intensive than an empirical hierarchical model (EHM), which uses point estimates of hyperparameters derived from the data [1]. Here, we assess the sensitivity of posterior estimates of RSL and rates to different statistical approaches by varying prior assumptions about the spatial and temporal structure of sea-level variability and applying multiple analytical approaches to Holocene sea-level proxies along the Atlantic coast of North America and the Caribbean [2]. References: 1. Cressie N, Wikle CK (2011) Statistics for spatio-temporal data (John Wiley & Sons). 2. Khan N et al. (2016). Quaternary Science Reviews (in revision).
Federal and state agencies responsible for protecting water quality rely mainly on statistically-based methods to assess and manage risks to the nation's streams, lakes and estuaries. Although statistical approaches provide valuable information on current trends in water quality...
A BAYESIAN STATISTICAL APPROACH FOR THE EVALUATION OF CMAQ
Bayesian statistical methods are used to evaluate Community Multiscale Air Quality (CMAQ) model simulations of sulfate aerosol over a section of the eastern US for 4-week periods in summer and winter 2001. The observed data come from two U.S. Environmental Protection Agency data ...
Using Multilevel Modeling in Language Assessment Research: A Conceptual Introduction
ERIC Educational Resources Information Center
Barkaoui, Khaled
2013-01-01
This article critiques traditional single-level statistical approaches (e.g., multiple regression analysis) to examining relationships between language test scores and variables in the assessment setting. It highlights the conceptual, methodological, and statistical problems associated with these techniques in dealing with multilevel or nested…
Teaching MBA Statistics Online: A Pedagogically Sound Process Approach
ERIC Educational Resources Information Center
Grandzol, John R.
2004-01-01
Delivering MBA statistics in the online environment presents significant challenges to educators and students alike because of varying student preparedness levels, complexity of content, difficulty in assessing learning outcomes, and faculty availability and technological expertise. In this article, the author suggests a process model that…
Estimating individual benefits of medical or behavioral treatments in severely ill patients.
Diaz, Francisco J
2017-01-01
There is a need for statistical methods appropriate for the analysis of clinical trials from a personalized-medicine viewpoint as opposed to the common statistical practice that simply examines average treatment effects. This article proposes an approach to quantifying, reporting and analyzing individual benefits of medical or behavioral treatments to severely ill patients with chronic conditions, using data from clinical trials. The approach is a new development of a published framework for measuring the severity of a chronic disease and the benefits treatments provide to individuals, which utilizes regression models with random coefficients. Here, a patient is considered to be severely ill if the patient's basal severity is close to one. This allows the derivation of a very flexible family of probability distributions of individual benefits that depend on treatment duration and the covariates included in the regression model. Our approach may enrich the statistical analysis of clinical trials of severely ill patients because it allows investigating the probability distribution of individual benefits in the patient population and the variables that influence it, and we can also measure the benefits achieved in specific patients including new patients. We illustrate our approach using data from a clinical trial of the anti-depressant imipramine.
NASA Astrophysics Data System (ADS)
Riccio, A.; Giunta, G.; Galmarini, S.
2007-04-01
In this paper we present an approach for the statistical analysis of multi-model ensemble results. The models considered here are operational long-range transport and dispersion models, also used for the real-time simulation of pollutant dispersion or the accidental release of radioactive nuclides. We first introduce the theoretical basis (rooted in Bayes' theorem) and then apply this approach to the analysis of model results obtained during the ETEX-1 exercise. We recover some interesting results, supporting the heuristic approach called "median model", originally introduced in Galmarini et al. (2004a, b). This approach also provides a way to systematically reduce (and quantify) model uncertainties, thus supporting the decision-making process and/or regulatory-purpose activities in a very effective manner.
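The "median model" mentioned above can be pictured with a very small sketch. It assumes that each ensemble member supplies one predicted value per receptor (or per grid cell and time step) and that the median model is simply the member-wise median at each position. The function name and the data layout are illustrative assumptions, not the authors' code.

```python
from statistics import median


def median_model(ensemble):
    """Return the position-wise median across ensemble members.

    `ensemble` is a list of equally long lists, one per model, where
    position k in every list refers to the same receptor or grid cell.
    """
    if not ensemble:
        raise ValueError("ensemble must contain at least one model")
    length = len(ensemble[0])
    if any(len(member) != length for member in ensemble):
        raise ValueError("all ensemble members must have the same length")
    # zip(*ensemble) groups the values that all models give for one position.
    return [median(values) for values in zip(*ensemble)]
```

Because the median ignores how far the extreme members lie from the centre, a single outlying model does not move the combined estimate, which is one intuitive reason such a heuristic can be robust.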
NASA Astrophysics Data System (ADS)
Riccio, A.; Giunta, G.; Galmarini, S.
2007-12-01
In this paper we present an approach for the statistical analysis of multi-model ensemble results. The models considered here are operational long-range transport and dispersion models, also used for the real-time simulation of pollutant dispersion or the accidental release of radioactive nuclides. We first introduce the theoretical basis (rooted in Bayes' theorem) and then apply this approach to the analysis of model results obtained during the ETEX-1 exercise. We recover some interesting results, supporting the heuristic approach called "median model", originally introduced in Galmarini et al. (2004a, b). This approach also provides a way to systematically reduce (and quantify) model uncertainties, thus supporting the decision-making process and/or regulatory-purpose activities in a very effective manner.
Identifiability of PBPK Models with Applications to ...
Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss different types of identifiability that occur in PBPK models and give reasons why they occur. We particularly focus on how the mathematical structure of a PBPK model and lack of appropriate data can lead to statistical models in which it is impossible to estimate at least some parameters precisely. Methods are reviewed which can determine whether a purely linear PBPK model is globally identifiable. We propose a theorem which determines when identifiability at a set of finite and specific values of the mathematical PBPK model (global discrete identifiability) implies identifiability of the statistical model. However, we are unable to establish conditions that imply global discrete identifiability, and conclude that the only safe approach to analysis of PBPK models involves Bayesian analysis with truncated priors. Finally, computational issues regarding posterior simulations of PBPK models are discussed. The methodology is very general and can be applied to numerous PBPK models which can be expressed as linear time-invariant systems. A real data set of a PBPK model for exposure to dimethyl arsinic acid (DMA(V)) is presented to illustrate the proposed methodology.
Personalized Modeling for Prediction with Decision-Path Models
Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.
2015-01-01
Deriving predictive models in medicine typically relies on a population approach where a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model, called a decision-path model, that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods to that of Classification And Regression Tree (CART), a population decision tree, in predicting seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better on area under the ROC curve (AUC) and Brier skill score compared to CART. The personalized approach of learning decision-path models is a new approach for predictive modeling that can perform better than a population approach. PMID:26098570
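The Brier skill score used as an evaluation measure above can be sketched as follows. The abstract does not say which reference forecast the authors used; this sketch assumes the common choice of a constant forecast equal to the observed base rate, so that a score of 0 means "no better than the base rate" and 1 means perfect probabilities. Function names are illustrative.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


def brier_skill_score(probabilities, outcomes):
    """Skill relative to an assumed reference: always predicting the base rate."""
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0.0:
        # All outcomes are identical, so the reference is already perfect.
        raise ValueError("skill score is undefined when all outcomes are equal")
    return 1.0 - brier_score(probabilities, outcomes) / reference
```

Unlike AUC, which depends only on how cases are ranked, this score also penalizes poorly calibrated probabilities, which is why the two measures are often reported together.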
Frank R., III Thompson
2009-01-01
Habitat models are widely used in bird conservation planning to assess current habitat or populations and to evaluate management alternatives. These models include species-habitat matrix or database models, habitat suitability models, and statistical models that predict abundance. While extremely useful, these approaches have some limitations.
Network Polymers Formed Under Nonideal Conditions.
1986-12-01
the system or the limited ability of the statistical model to account for stochastic correlations. The viscosity of the reacting system was measured as...based on competing reactions (ring, chain) and employs equilibrium chain statistics. The work thus far has been limited to single cycle growth on an...polymerizations, because a large number of differential equations must be solved. The Markovian approach (sometimes referred to as the statistical or
Teaching "Instant Experience" with Graphical Model Validation Techniques
ERIC Educational Resources Information Center
Ekstrøm, Claus Thorn
2014-01-01
Graphical model validation techniques for linear normal models are often used to check the assumptions underlying a statistical model. We describe an approach to provide "instant experience" in looking at a graphical model validation plot, so it becomes easier to validate if any of the underlying assumptions are violated.
Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study
Gascuel, Olivier
2017-01-01
Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
Multi-site precipitation downscaling using a stochastic weather generator
NASA Astrophysics Data System (ADS)
Chen, Jie; Chen, Hua; Guo, Shenglian
2018-03-01
Statistical downscaling is an efficient way to solve the spatiotemporal mismatch between climate model outputs and the data requirements of hydrological models. However, the most commonly used downscaling methods only produce climate change scenarios for a specific site or a watershed average, which cannot drive distributed hydrological models to study the spatial variability of climate change impacts. By coupling a single-site downscaling method and a multi-site weather generator, this study proposes a multi-site downscaling approach for hydrological climate change impact studies. Multi-site downscaling is done in two stages. The first stage involves spatially downscaling climate model-simulated monthly precipitation from grid scale to a specific site using a quantile mapping method, and the second stage involves the temporal disaggregation of monthly precipitation to daily values by adjusting the parameters of a multi-site weather generator. The inter-station correlation is specifically considered using a distribution-free approach along with an iterative algorithm. The performance of the downscaling approach is illustrated using a 10-station watershed as an example. The precipitation time series derived from the National Centers for Environmental Prediction (NCEP) reanalysis dataset is used as the climate model simulation. The precipitation time series of each station is divided into 30 odd years for calibration and 29 even years for validation. Several metrics, including the frequencies of wet and dry spells and statistics of the daily, monthly and annual precipitation, are used as criteria to evaluate the multi-site downscaling approach. The results show that the frequencies of wet and dry spells are well reproduced for all stations. In addition, the multi-site downscaling approach performs well with respect to reproducing precipitation statistics, especially at monthly and annual timescales.
The remaining biases mainly result from the non-stationarity of NCEP precipitation. Overall, the proposed approach is efficient for generating multi-site climate change scenarios that can be used to investigate the spatial variability of climate change impacts on hydrology.
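The first stage described above relies on quantile mapping. The abstract does not specify the variant, so the sketch below shows a generic empirical version under stated assumptions: quantiles of the model-simulated and observed calibration samples are paired at common probabilities, and a new model value is mapped by linear interpolation between those pairs, clamped at both ends. All function names and the default of 101 quantiles are illustrative choices.

```python
from bisect import bisect_left


def quantile(sorted_values, prob):
    """Linearly interpolated quantile of an ascending sample, 0 <= prob <= 1."""
    position = prob * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    frac = position - low
    return sorted_values[low] * (1.0 - frac) + sorted_values[high] * frac


def build_transfer(model_calib, obs_calib, n_quantiles=101):
    """Pair model and observed quantiles at the same probabilities."""
    model_sorted, obs_sorted = sorted(model_calib), sorted(obs_calib)
    probs = [k / (n_quantiles - 1) for k in range(n_quantiles)]
    model_q = [quantile(model_sorted, p) for p in probs]
    obs_q = [quantile(obs_sorted, p) for p in probs]
    return model_q, obs_q


def quantile_map(value, model_q, obs_q):
    """Map one model value onto the observed distribution."""
    if value <= model_q[0]:
        return obs_q[0]
    if value >= model_q[-1]:
        return obs_q[-1]
    high = bisect_left(model_q, value)  # first index with model_q[high] >= value
    low = high - 1
    frac = (value - model_q[low]) / (model_q[high] - model_q[low])
    return obs_q[low] + frac * (obs_q[high] - obs_q[low])
```

In use, `build_transfer` would be fitted once on the calibration years and `quantile_map` applied to each simulated monthly total from the validation years. Clamping at the ends is a simplification; values outside the calibration range are often handled by some form of extrapolation instead.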
NASA Astrophysics Data System (ADS)
Calderon, Christopher P.; Weiss, Lucien E.; Moerner, W. E.
2014-05-01
Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x, y (or x, y, z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium).
NASA Astrophysics Data System (ADS)
Canli, Ekrem; Thiebes, Benni; Petschko, Helene; Glade, Thomas
2015-04-01
By now there is a broad consensus that, due to human-induced global change, the frequency and magnitude of heavy precipitation events are expected to increase in certain parts of the world. Given that rainfall is the most common triggering agent for landslide initiation, increased landslide activity can also be expected there. Landslide occurrence is a globally widespread phenomenon that clearly needs to be addressed. The well-known problems in modelling landslide susceptibility and hazard lead to uncertain predictions. These include the lack of a universally applicable modelling solution for adequately assessing landslide susceptibility (which can be seen as the relative indication of the spatial probability of landslide initiation). Generally speaking, there are three major approaches for performing landslide susceptibility analysis: heuristic, statistical and deterministic models, all with different assumptions, distinctive data requirements and differently interpretable outcomes. Still, detailed comparisons of resulting landslide susceptibility maps are rare. In this presentation, the susceptibility modelling outputs of a deterministic model (Stability INdex MAPping - SINMAP) and a statistical modelling approach (generalized additive model - GAM) are compared. SINMAP is an infinite slope stability model which requires parameterization of soil mechanical parameters. Modelling with the generalized additive model, which represents a non-linear extension of a generalized linear model, requires a high quality landslide inventory that serves as the dependent variable in the statistical approach. Both methods rely on topographical data derived from the DTM. The comparison has been carried out in a study area located in the district of Waidhofen/Ybbs in Lower Austria. For the whole district (ca. 132 km²), 1063 landslides have been mapped and partially used within the analysis and the validation of the model outputs.
The respective susceptibility maps have been reclassified to contain three susceptibility classes each. The comparison of the susceptibility maps was performed on a grid cell basis. A match of the maps was observed for grid cells located in the same susceptibility class. In contrast, a mismatch or deviation was observed for locations with different assigned susceptibility classes (up to two classes' difference). Although the modelling approaches differ significantly, more than 70% of the pixels reveal a match in the same susceptibility class. A mismatch by two classes' difference occurred in less than 2% of all pixels. Although the result looks promising and strengthens the confidence in the susceptibility zonation for this area, some of the general drawbacks related to the respective approaches still have to be addressed in further detail. Future work is heading towards an integration of probabilistic aspects into deterministic modelling.
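The cell-by-cell comparison described above can be sketched in a few lines. The example assumes two rasters of identical shape whose cells hold an integer susceptibility class (for instance 1 to 3) or `None` for no data; it reports the share of cells at each absolute class difference, so that 0 corresponds to a match and 2 to a two-class mismatch. The function name and the data layout are assumptions made for illustration.

```python
from collections import Counter


def class_difference_shares(map_a, map_b):
    """Share of valid cells by absolute class difference between two grids."""
    counts = Counter()
    total = 0
    for row_a, row_b in zip(map_a, map_b):
        for class_a, class_b in zip(row_a, row_b):
            if class_a is None or class_b is None:
                continue  # skip cells without data in either map
            counts[abs(class_a - class_b)] += 1
            total += 1
    # Keys are class differences (0 = match); values are fractions of cells.
    return {diff: n / total for diff, n in sorted(counts.items())}
```

Applied to the two reclassified maps, the entry for difference 0 would correspond to the reported share of matching pixels and the entry for difference 2 to the share of two-class mismatches.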
Microscopic saw mark analysis: an empirical approach.
Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles
2015-01-01
Microscopic saw mark analysis is a well published and generally accepted qualitative analytical method. However, little research has focused on identifying and mitigating potential sources of error associated with the method. The presented study proposes the use of classification trees and random forest classifiers as an optimal, statistically sound approach to mitigate the potential for error of variability and outcome error in microscopic saw mark analysis. The statistical model was applied to 58 experimental saw marks created with four types of saws. The saw marks were made in fresh human femurs obtained through anatomical gift and were analyzed using a Keyence digital microscope. The statistical approach weighed the variables based on discriminatory value and produced decision trees with an associated outcome error rate of 8.62-17.82%. © 2014 American Academy of Forensic Sciences.
Statistical models of lunar rocks and regolith
NASA Technical Reports Server (NTRS)
Marcus, A. H.
1973-01-01
The mathematical, statistical, and computational approaches used in the investigation of the interrelationship of lunar fragmental material, regolith, lunar rocks, and lunar craters are described. The first two phases of the work explored the sensitivity of the production model of fragmental material to mathematical assumptions, and then completed earlier studies on the survival of lunar surface rocks with respect to competing processes. The third phase combined earlier work into a detailed statistical analysis and probabilistic model of regolith formation by lithologically distinct layers, interpreted as modified crater ejecta blankets. The fourth phase of the work dealt with problems encountered in combining the results of the entire project into a comprehensive, multipurpose computer simulation model for the craters and regolith. Highlights of each phase of research are given.
NASA Astrophysics Data System (ADS)
Kumar, Rakesh; Li, Zheng; Levin, Deborah A.
2011-05-01
In this work, we propose a new heat accommodation model to simulate freely expanding homogeneous condensation flows of gaseous carbon dioxide using a new approach, the statistical Bhatnagar-Gross-Krook method. The motivation for the present work comes from the earlier work of Li et al. [J. Phys. Chem. 114, 5276 (2010)] in which condensation models were proposed and used in the direct simulation Monte Carlo method to simulate the flow of carbon dioxide from supersonic expansions of small nozzles into near-vacuum conditions. Simulations conducted for stagnation pressures of one and three bar were compared with the measurements of gas and cluster number densities, cluster size, and carbon dioxide rotational temperature obtained by Ramos et al. [Phys. Rev. A 72, 3204 (2005)]. Due to the high computational cost of the direct simulation Monte Carlo method, comparison between simulations and data could only be performed for these stagnation pressures, with good agreement obtained beyond the condensation onset point, in the farfield. As the stagnation pressure increases, the degree of condensation also increases; therefore, to improve the modeling of condensation onset, one must be able to simulate higher stagnation pressures. In simulations of an expanding flow of argon through a nozzle, Kumar et al. [AIAA J. 48, 1531 (2010)] found that the statistical Bhatnagar-Gross-Krook method provides the same accuracy as the direct simulation Monte Carlo method, but at one half of the computational cost. In this work, the statistical Bhatnagar-Gross-Krook method was modified to account for internal degrees of freedom for multi-species polyatomic gases. With the computational approach in hand, we developed and tested a new heat accommodation model for a polyatomic system to properly account for the heat release of condensation. We then developed condensation models in the framework of the statistical Bhatnagar-Gross-Krook method.
Simulations were found to agree well with the experiment for all stagnation pressure cases (1-5 bar), validating the accuracy of the Bhatnagar-Gross-Krook based condensation model in capturing the physics of condensation.
Infinitely divisible cascades to model the statistics of natural images.
Chainais, Pierre
2007-12-01
We propose to model the statistics of natural images using the large class of stochastic processes called Infinitely Divisible Cascades (IDC). IDC were first introduced in one dimension to provide multifractal time series to model the so-called intermittency phenomenon in hydrodynamical turbulence. We have extended the definition of scalar infinitely divisible cascades from 1 to N dimensions and commented on the relevance of such a model in fully developed turbulence in [1]. In this article, we focus on the particular two-dimensional case. IDC appear as good candidates to model the statistics of natural images. They share most of their usual properties and appear to be consistent with several independent theoretical and experimental approaches in the literature. We point out the interest of IDC for applications to procedural texture synthesis.
John, Majnu; Lencz, Todd; Malhotra, Anil K; Correll, Christoph U; Zhang, Jian-Ping
2018-06-01
Meta-analysis of genetic association studies is being increasingly used to assess phenotypic differences between genotype groups. When the underlying genetic model is assumed to be dominant or recessive, assessing the phenotype differences based on summary statistics, reported for individual studies in a meta-analysis, is a valid strategy. However, when the genetic model is additive, a similar strategy based on summary statistics will lead to biased results. This fact about the additive model is one of the things that we establish in this paper, using simulations. The main goal of this paper is to present an alternate strategy for the additive model based on simulating data for the individual studies. We show that the alternate strategy is far superior to the strategy based on summary statistics.
NASA Technical Reports Server (NTRS)
Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.
1992-01-01
An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with analytical modeling of failure phenomena to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in analytical modeling, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which analytical models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. State-of-the-art analytical models currently employed for design, failure prediction, or performance analysis are used in this methodology. The rationale for the statistical approach taken in the PFA methodology is discussed, the PFA methodology is described, and examples of its application to structural failure modes are presented. The engineering models and computer software used in fatigue crack growth and fatigue crack initiation applications are thoroughly documented.
Wang, Ming; Long, Qi
2016-09-01
Prediction models for disease risk and prognosis play an important role in biomedical research, and evaluating their predictive accuracy in the presence of censored data is of substantial interest. The standard concordance (c) statistic has been extended to provide a summary measure of predictive accuracy for survival models. Motivated by a prostate cancer study, we address several issues associated with evaluating survival prediction models based on c-statistic with a focus on estimators using the technique of inverse probability of censoring weighting (IPCW). Compared to the existing work, we provide complete results on the asymptotic properties of the IPCW estimators under the assumption of coarsening at random (CAR), and propose a sensitivity analysis under the mechanism of noncoarsening at random (NCAR). In addition, we extend the IPCW approach as well as the sensitivity analysis to high-dimensional settings. The predictive accuracy of prediction models for cancer recurrence after prostatectomy is assessed by applying the proposed approaches. We find that the estimated predictive accuracy for the models in consideration is sensitive to NCAR assumption, and thus identify the best predictive model. Finally, we further evaluate the performance of the proposed methods in both settings of low-dimensional and high-dimensional data under CAR and NCAR through simulations. © 2016, The International Biometric Society.
Hofman, Abe D.; Visser, Ingmar; Jansen, Brenda R. J.; van der Maas, Han L. J.
2015-01-01
We propose and test three statistical models for the analysis of children’s responses to the balance scale task, a seminal task to study proportional reasoning. We use a latent class modelling approach to formulate a rule-based latent class model (RB LCM) following from a rule-based perspective on proportional reasoning and a new statistical model, the Weighted Sum Model, following from an information-integration approach. Moreover, a hybrid LCM using item covariates is proposed, combining aspects of both a rule-based and information-integration perspective. These models are applied to two different datasets, a standard paper-and-pencil test dataset (N = 779), and a dataset collected within an online learning environment that included direct feedback, time-pressure, and a reward system (N = 808). For the paper-and-pencil dataset the RB LCM resulted in the best fit, whereas for the online dataset the hybrid LCM provided the best fit. The standard paper-and-pencil dataset yielded more evidence for distinct solution rules than the online data set in which quantitative item characteristics are more prominent in determining responses. These results shed new light on the discussion on sequential rule-based and information-integration perspectives of cognitive development. PMID:26505905
LATENT SPACE MODELS FOR MULTIVIEW NETWORK DATA
Salter-Townshend, Michael; McCormick, Tyler H.
2018-01-01
Social relationships consist of interactions along multiple dimensions. In social networks, this means that individuals form multiple types of relationships with the same person (e.g., an individual will not trust all of his/her acquaintances). Statistical models for these data require understanding two related types of dependence structure: (i) structure within each relationship type, or network view, and (ii) the association between views. In this paper, we propose a statistical framework that parsimoniously represents dependence between relationship types while also maintaining enough flexibility to allow individuals to serve different roles in different relationship types. Our approach builds on work on latent space models for networks [see, e.g., J. Amer. Statist. Assoc. 97 (2002) 1090–1098]. These models represent the propensity for two individuals to form edges as conditionally independent given the distance between the individuals in an unobserved social space. Our work departs from previous work in this area by representing dependence structure between network views through a multivariate Bernoulli likelihood, providing a representation of between-view association. This approach infers correlations between views not explained by the latent space model. Using our method, we explore 6 multiview network structures across 75 villages in rural southern Karnataka, India [Banerjee et al. (2013)]. PMID:29721127
LATENT SPACE MODELS FOR MULTIVIEW NETWORK DATA.
Salter-Townshend, Michael; McCormick, Tyler H
2017-09-01
Social relationships consist of interactions along multiple dimensions. In social networks, this means that individuals form multiple types of relationships with the same person (e.g., an individual will not trust all of his/her acquaintances). Statistical models for these data require understanding two related types of dependence structure: (i) structure within each relationship type, or network view, and (ii) the association between views. In this paper, we propose a statistical framework that parsimoniously represents dependence between relationship types while also maintaining enough flexibility to allow individuals to serve different roles in different relationship types. Our approach builds on work on latent space models for networks [see, e.g., J. Amer. Statist. Assoc. 97 (2002) 1090-1098]. These models represent the propensity for two individuals to form edges as conditionally independent given the distance between the individuals in an unobserved social space. Our work departs from previous work in this area by representing dependence structure between network views through a multivariate Bernoulli likelihood, providing a representation of between-view association. This approach infers correlations between views not explained by the latent space model. Using our method, we explore 6 multiview network structures across 75 villages in rural southern Karnataka, India [Banerjee et al. (2013)].
2017-01-01
Statistical learning has been studied in a variety of different tasks, including word segmentation, object identification, category learning, artificial grammar learning and serial reaction time tasks (e.g. Saffran et al. 1996 Science 274, 1926–1928; Orban et al. 2008 Proceedings of the National Academy of Sciences 105, 2745–2750; Thiessen & Yee 2010 Child Development 81, 1287–1303; Saffran 2002 Journal of Memory and Language 47, 172–196; Misyak & Christiansen 2012 Language Learning 62, 302–331). The difference among these tasks raises questions about whether they all depend on the same kinds of underlying processes and computations, or whether they are tapping into different underlying mechanisms. Prior theoretical approaches to statistical learning have often tried to explain or model learning in a single task. However, in many cases these approaches appear inadequate to explain performance in multiple tasks. For example, explaining word segmentation via the computation of sequential statistics (such as transitional probability) provides little insight into the nature of sensitivity to regularities among simultaneously presented features. In this article, we will present a formal computational approach that we believe is a good candidate to provide a unifying framework to explore and explain learning in a wide variety of statistical learning tasks. This framework suggests that statistical learning arises from a set of processes that are inherent in memory systems, including activation, interference, integration of information and forgetting (e.g. Perruchet & Vinter 1998 Journal of Memory and Language 39, 246–263; Thiessen et al. 2013 Psychological Bulletin 139, 792–814). 
From this perspective, statistical learning does not involve explicit computation of statistics, but rather the extraction of elements of the input into memory traces, and subsequent integration across those memory traces that emphasize consistent information (Thiessen and Pavlik 2013 Cognitive Science 37, 310–343). This article is part of the themed issue ‘New frontiers for statistical learning in the cognitive sciences'. PMID:27872374
Thiessen, Erik D
2017-01-05
Statistical learning has been studied in a variety of different tasks, including word segmentation, object identification, category learning, artificial grammar learning and serial reaction time tasks (e.g. Saffran et al. 1996 Science 274, 1926-1928; Orban et al. 2008 Proceedings of the National Academy of Sciences 105, 2745-2750; Thiessen & Yee 2010 Child Development 81, 1287-1303; Saffran 2002 Journal of Memory and Language 47, 172-196; Misyak & Christiansen 2012 Language Learning 62, 302-331). The difference among these tasks raises questions about whether they all depend on the same kinds of underlying processes and computations, or whether they are tapping into different underlying mechanisms. Prior theoretical approaches to statistical learning have often tried to explain or model learning in a single task. However, in many cases these approaches appear inadequate to explain performance in multiple tasks. For example, explaining word segmentation via the computation of sequential statistics (such as transitional probability) provides little insight into the nature of sensitivity to regularities among simultaneously presented features. In this article, we will present a formal computational approach that we believe is a good candidate to provide a unifying framework to explore and explain learning in a wide variety of statistical learning tasks. This framework suggests that statistical learning arises from a set of processes that are inherent in memory systems, including activation, interference, integration of information and forgetting (e.g. Perruchet & Vinter 1998 Journal of Memory and Language 39, 246-263; Thiessen et al. 2013 Psychological Bulletin 139, 792-814).
From this perspective, statistical learning does not involve explicit computation of statistics, but rather the extraction of elements of the input into memory traces, and subsequent integration across those memory traces that emphasize consistent information (Thiessen and Pavlik 2013 Cognitive Science 37, 310-343). This article is part of the themed issue 'New frontiers for statistical learning in the cognitive sciences'. © 2016 The Author(s).
Modelling 1-minute directional observations of the global irradiance.
NASA Astrophysics Data System (ADS)
Thejll, Peter; Pagh Nielsen, Kristian; Andersen, Elsa; Furbo, Simon
2016-04-01
Direct and diffuse irradiances from the sky have been collected at 1-minute intervals for about a year from the experimental station at the Technical University of Denmark for the IEA project "Solar Resource Assessment and Forecasting". These data were gathered by pyrheliometers tracking the Sun, as well as with apertured pyranometers gathering 1/8th and 1/16th of the light from the sky in 45 degree azimuthal ranges pointed around the compass. The data are gathered in order to develop detailed models of the potentially available solar energy and its variations at high temporal resolution in order to gain a more detailed understanding of the solar resource. This is important for a better understanding of the sub-grid scale cloud variation that cannot be resolved with climate and weather models. It is also important for optimizing the operation of active solar energy systems such as photovoltaic plants and thermal solar collector arrays, and for passive solar energy and lighting to buildings. We present regression-based modelling of the observed data, and focus, here, on the statistical properties of the model fits. Using models based on the one hand on what is found in the literature and on physical expectations, and on the other hand on purely statistical models, we find solutions that can explain up to 90% of the variance in global radiation. The models leaning on physical insights include terms for the direct solar radiation, a term for the circum-solar radiation, a diffuse term and a term for the horizon brightening/darkening. The purely statistical model is found using data- and formula-validation approaches picking model expressions from a general catalogue of possible formulae. The method allows nesting of expressions, and the results found are dependent on and heavily constrained by the cross-validation carried out on statistically independent testing and training data-sets.
Slightly better fits -- in terms of variance explained -- are found using the purely statistical fitting/searching approach. We describe the methods applied, results found, and discuss the different potentials of the physics- and statistics-only based model-searches.
How well can we predict forage species occurrence and abundance?
USDA-ARS's Scientific Manuscript database
As part of a larger effort focused on forage species production and management, we have been developing a statistical modeling approach to predict the probability of species occurrence and the abundance for Orchard Grass over the Northeast region of the United States using two selected statistical m...
A Bifactor Approach to Model Multifaceted Constructs in Statistical Mediation Analysis
ERIC Educational Resources Information Center
Gonzalez, Oscar; MacKinnon, David P.
2018-01-01
Statistical mediation analysis allows researchers to identify the most important mediating constructs in the causal process studied. Identifying specific mediators is especially relevant when the hypothesized mediating construct consists of multiple related facets. The general definition of the construct and its facets might relate differently to…
Three Strategies for the Critical Use of Statistical Methods in Psychological Research
ERIC Educational Resources Information Center
Campitelli, Guillermo; Macbeth, Guillermo; Ospina, Raydonal; Marmolejo-Ramos, Fernando
2017-01-01
We present three strategies to replace the null hypothesis statistical significance testing approach in psychological research: (1) visual representation of cognitive processes and predictions, (2) visual representation of data distributions and choice of the appropriate distribution for analysis, and (3) model comparison. The three strategies…
Introduction to this special issue on statistics for wildfire processes
Marcia Gumpertz
2009-01-01
This special issue on statistics for wildfire processes brings together foresters, wildfire ecologists, statisticians, mathematicians, and economists. All of these disciplines bring different interests, approaches and expertise to the modeling of wildfire processes. It is not necessarily easy, however, to communicate across disciplines or follow the developments in a...
A Three-Step Approach To Model Tree Mortality in the State of Georgia
Qingmin Meng; Chris J. Cieszewski; Roger C. Lowe; Michal Zasada
2005-01-01
Tree mortality is one of the most complex phenomena of forest growth and yield. Many types of factors affect tree mortality, which is considered difficult to predict. This study presents a new systematic approach to simulate tree mortality based on the integration of statistical models and geographical information systems. This method begins with variable preselection...
Continuous-time discrete-space models for animal movement
Hanks, Ephraim M.; Hooten, Mevin B.; Alldredge, Mat W.
2015-01-01
The processes influencing animal movement and resource selection are complex and varied. Past efforts to model behavioral changes over time used Bayesian statistical models with variable parameter space, such as reversible-jump Markov chain Monte Carlo approaches, which are computationally demanding and inaccessible to many practitioners. We present a continuous-time discrete-space (CTDS) model of animal movement that can be fit using standard generalized linear modeling (GLM) methods. This CTDS approach allows for the joint modeling of location-based as well as directional drivers of movement. Changing behavior over time is modeled using a varying-coefficient framework which maintains the computational simplicity of a GLM approach, and variable selection is accomplished using a group lasso penalty. We apply our approach to a study of two mountain lions (Puma concolor) in Colorado, USA.
A computational visual saliency model based on statistics and machine learning.
Lin, Ru-Je; Lin, Wei-Song
2014-08-01
Identifying the type of stimuli that attracts human visual attention has been an appealing topic for scientists for many years. In particular, marking the salient regions in images is useful for both psychologists and many computer vision applications. In this paper, we propose a computational approach for producing saliency maps using statistics and machine learning methods. Based on four assumptions, three properties (Feature-Prior, Position-Prior, and Feature-Distribution) can be derived and combined by a simple intersection operation to obtain a saliency map. These properties are implemented by a similarity computation, support vector regression (SVR) technique, statistical analysis of training samples, and information theory using low-level features. This technique is able to learn the preferences of human visual behavior while simultaneously considering feature uniqueness. Experimental results show that our approach performs better in predicting human visual attention regions than 12 other models in two test databases. © 2014 ARVO.
Multiple point statistical simulation using uncertain (soft) conditional data
NASA Astrophysics Data System (ADS)
Hansen, Thomas Mejer; Vu, Le Thanh; Mosegaard, Klaus; Cordua, Knud Skou
2018-05-01
Geostatistical simulation methods have been used to quantify spatial variability of reservoir models since the 80s. In the last two decades, state of the art simulation methods have changed from being based on covariance-based 2-point statistics to multiple-point statistics (MPS), which allow simulation of more realistic Earth-structures. In addition, increasing amounts of geo-information (geophysical, geological, etc.) from multiple sources are being collected. This poses the problem of integrating these different sources of information, such that decisions related to reservoir models can be made on as informed a basis as possible. In principle, though difficult in practice, this can be achieved using computationally expensive Monte Carlo methods. Here we investigate the use of sequential simulation based MPS simulation methods conditional to uncertain (soft) data, as a computationally efficient alternative. First, it is demonstrated that current implementations of sequential simulation based on MPS (e.g. SNESIM, ENESIM and Direct Sampling) do not account properly for uncertain conditional information, due to a combination of using only co-located information, and a random simulation path. Then, we suggest two approaches that better account for the available uncertain information. The first makes use of a preferential simulation path, where more informed model parameters are visited preferentially to less informed ones. The second approach involves using non co-located uncertain information. For different types of available data, these approaches are demonstrated to produce simulation results similar to those obtained by the general Monte Carlo based approach. These methods allow MPS simulation to condition properly to uncertain (soft) data, and hence provide a computationally attractive approach for integration of information about a reservoir model.
Rough parameter dependence in climate models and the role of Ruelle-Pollicott resonances.
Chekroun, Mickaël David; Neelin, J David; Kondrashov, Dmitri; McWilliams, James C; Ghil, Michael
2014-02-04
Despite the importance of uncertainties encountered in climate model simulations, the fundamental mechanisms at the origin of sensitive behavior of long-term model statistics remain unclear. Variability of turbulent flows in the atmosphere and oceans exhibits recurrent large-scale patterns. These patterns, while evolving irregularly in time, manifest characteristic frequencies across a large range of time scales, from intraseasonal through interdecadal. Based on modern spectral theory of chaotic and dissipative dynamical systems, the associated low-frequency variability may be formulated in terms of Ruelle-Pollicott (RP) resonances. RP resonances encode information on the nonlinear dynamics of the system, and an approach for estimating them--as filtered through an observable of the system--is proposed. This approach relies on an appropriate Markov representation of the dynamics associated with a given observable. It is shown that, within this representation, the spectral gap--defined as the distance between the subdominant RP resonance and the unit circle--plays a major role in the roughness of parameter dependences. The model statistics are the most sensitive for the smallest spectral gaps; such small gaps turn out to correspond to regimes where the low-frequency variability is more pronounced, whereas autocorrelations decay more slowly. The present approach is applied to analyze the rough parameter dependence encountered in key statistics of an El-Niño-Southern Oscillation model of intermediate complexity. Theoretical arguments, however, strongly suggest that such links between model sensitivity and the decay of correlation properties are not limited to this particular model and could hold much more generally.
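The spectral-gap definition in the abstract above can be illustrated with a toy Markov chain. The sketch below is only an assumption-laden illustration, not the authors' estimation procedure: it uses a hypothetical two-state transition matrix (the function names `two_state_chain` and `spectral_gap` are invented here) and computes the gap as one minus the modulus of the subdominant eigenvalue. For this chain the subdominant eigenvalue is 1 - a - b, and the lag-k autocorrelation of the state decays as (1 - a - b)^k, so a smaller gap goes with slower decay of correlations.

```python
import numpy as np


def two_state_chain(a, b):
    """Row-stochastic transition matrix of a hypothetical two-state chain.

    a: probability of jumping from state 0 to state 1
    b: probability of jumping from state 1 to state 0
    """
    return np.array([[1.0 - a, a],
                     [b, 1.0 - b]])


def spectral_gap(transition_matrix):
    """Distance between the subdominant eigenvalue and the unit circle."""
    moduli = np.sort(np.abs(np.linalg.eigvals(transition_matrix)))[::-1]
    # moduli[0] is 1 for a stochastic matrix; moduli[1] is the subdominant one.
    return 1.0 - moduli[1]


# A "sticky" chain (rare jumps) versus a fast-mixing chain.
slow = two_state_chain(0.05, 0.05)   # subdominant eigenvalue 0.9
fast = two_state_chain(0.40, 0.40)   # subdominant eigenvalue 0.2

print("gap (slow chain):", spectral_gap(slow))
print("gap (fast chain):", spectral_gap(fast))
```

In this toy setting the sticky chain has the smaller gap and the longer memory, which mirrors the qualitative link between small spectral gaps and slowly decaying autocorrelations described in the abstract.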
Rough parameter dependence in climate models and the role of Ruelle-Pollicott resonances
Chekroun, Mickaël David; Neelin, J. David; Kondrashov, Dmitri; McWilliams, James C.; Ghil, Michael
2014-01-01
Despite the importance of uncertainties encountered in climate model simulations, the fundamental mechanisms at the origin of sensitive behavior of long-term model statistics remain unclear. Variability of turbulent flows in the atmosphere and oceans exhibits recurrent large-scale patterns. These patterns, while evolving irregularly in time, manifest characteristic frequencies across a large range of time scales, from intraseasonal through interdecadal. Based on modern spectral theory of chaotic and dissipative dynamical systems, the associated low-frequency variability may be formulated in terms of Ruelle-Pollicott (RP) resonances. RP resonances encode information on the nonlinear dynamics of the system, and an approach for estimating them—as filtered through an observable of the system—is proposed. This approach relies on an appropriate Markov representation of the dynamics associated with a given observable. It is shown that, within this representation, the spectral gap—defined as the distance between the subdominant RP resonance and the unit circle—plays a major role in the roughness of parameter dependences. The model statistics are the most sensitive for the smallest spectral gaps; such small gaps turn out to correspond to regimes where the low-frequency variability is more pronounced, whereas autocorrelations decay more slowly. The present approach is applied to analyze the rough parameter dependence encountered in key statistics of an El-Niño–Southern Oscillation model of intermediate complexity. Theoretical arguments, however, strongly suggest that such links between model sensitivity and the decay of correlation properties are not limited to this particular model and could hold much more generally. PMID:24443553
Mediation Analysis with Survival Outcomes: Accelerated Failure Time vs. Proportional Hazards Models
Gelfand, Lois A.; MacKinnon, David P.; DeRubeis, Robert J.; Baraldi, Amanda N.
2016-01-01
Objective: Survival time is an important type of outcome variable in treatment research. Currently, limited guidance is available regarding performing mediation analyses with survival outcomes, which generally do not have normally distributed errors, and contain unobserved (censored) events. We present considerations for choosing an approach, using a comparison of semi-parametric proportional hazards (PH) and fully parametric accelerated failure time (AFT) approaches for illustration. Method: We compare PH and AFT models and procedures in their integration into mediation models and review their ability to produce coefficients that estimate causal effects. Using simulation studies modeling Weibull-distributed survival times, we compare statistical properties of mediation analyses incorporating PH and AFT approaches (employing SAS procedures PHREG and LIFEREG, respectively) under varied data conditions, some including censoring. A simulated data set illustrates the findings. Results: AFT models integrate more easily than PH models into mediation models. Furthermore, mediation analyses incorporating LIFEREG produce coefficients that can estimate causal effects, and demonstrate superior statistical properties. Censoring introduces bias in the coefficient estimate representing the treatment effect on outcome—underestimation in LIFEREG, and overestimation in PHREG. With LIFEREG, this bias can be addressed using an alternative estimate obtained from combining other coefficients, whereas this is not possible with PHREG. Conclusions: When Weibull assumptions are not violated, there are compelling advantages to using LIFEREG over PHREG for mediation analyses involving survival-time outcomes. Irrespective of the procedures used, the interpretation of coefficients, effects of censoring on coefficient estimates, and statistical properties should be taken into account when reporting results. PMID:27065906
Agur, Zvia; Elishmereni, Moran; Kheifetz, Yuri
2014-01-01
Despite its great promise, personalized oncology still faces many hurdles, and it is increasingly clear that targeted drugs and molecular biomarkers alone yield only modest clinical benefit. One reason is the complex relationships between biomarkers and the patient's response to drugs, obscuring the true weight of the biomarkers in the overall patient's response. This complexity can be disentangled by computational models that integrate the effects of personal biomarkers into a simulator of drug-patient dynamic interactions, for predicting the clinical outcomes. Several computational tools have been developed for personalized oncology, notably evidence-based tools for simulating pharmacokinetics, Bayesian-estimated tools for predicting survival, etc. We describe representative statistical and mathematical tools, and discuss their merits, shortcomings and preliminary clinical validation attesting to their potential. Yet, the individualization power of mathematical models alone, or statistical models alone, is limited. More accurate and versatile personalization tools can be constructed by a new application of the statistical/mathematical nonlinear mixed effects modeling (NLMEM) approach, which until recently has been used only in drug development. Using these advanced tools, clinical data from patient populations can be integrated with mechanistic models of disease and physiology, for generating personal mathematical models. Upon a more substantial validation in the clinic, this approach will hopefully be applied in personalized clinical trials, P-trials, hence aiding the establishment of personalized medicine within the main stream of clinical oncology. © 2014 Wiley Periodicals, Inc.
A new statistical methodology predicting chip failure probability considering electromigration
NASA Astrophysics Data System (ADS)
Sun, Ted
In this thesis, we present a new approach to analyzing chip reliability subject to electromigration (EM); the fundamental causes of EM and the EM phenomena observed in different materials are also presented. This new approach utilizes the statistical nature of EM failure in order to assess overall EM risk. It includes within-die temperature variations from the chip's temperature map extracted by an Electronic Design Automation (EDA) tool to estimate the failure probability of a design. Both the power estimation and thermal analysis are performed in the EDA flow. We first used the traditional EM approach to analyze the design with a single temperature across the entire chip that involves 6 metal and 5 via layers. Next, we used the same traditional approach but with a realistic temperature map. Both the traditional EM analysis approach and the approach coupled with a temperature map are presented, together with a comparison of the results obtained with and without the temperature map. A comparison between these two results confirms that using a temperature map yields a less pessimistic estimation of the chip's EM risk. Finally, we employed the statistical methodology we developed considering a temperature map and different use-condition voltages and frequencies to estimate the overall failure probability of the chip. The statistical model established considers scaling, using the traditional Black equation, and four major conditions. The statistical result comparisons are within our expectations. The results of this statistical analysis confirm that the chip level failure probability is higher i) at higher use-condition frequencies for all use-condition voltages, and ii) when a single temperature instead of a temperature map across the chip is considered. In this thesis, I start with an overall review of current design types, common flows, and the necessary verification and reliability checking steps used in the IC design industry.
Furthermore, the important concepts of "Scripting Automation", which is used to integrate the various EDA tools in this research, are described in detail with several examples, and my completed code is included in the appendix for reference. Hopefully, this structure will give readers a thorough understanding of my research work, from the automation of EDA tools to the statistical data generation, from the nature of EM to the statistical model construction, and the comparisons between the traditional EM analysis and the statistical EM analysis approaches.
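The comparison between a single chip temperature and a temperature map can be sketched numerically. The following is a minimal illustration under stated assumptions, not the thesis's model: it assumes Black's equation for the median time to failure, t50 = A * J^(-n) * exp(Ea / (k * T)), a lognormal failure-time distribution per interconnect segment, independent segments combined in a weakest-link fashion, and entirely hypothetical parameter values, currents and temperatures in arbitrary units.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def black_t50(current_density, temp_k, a=1.0, n=2.0, ea=0.9):
    """Median time to failure from Black's equation (arbitrary units).

    a, n and ea (activation energy in eV) are hypothetical values.
    """
    return a * current_density ** (-n) * math.exp(ea / (BOLTZMANN_EV * temp_k))


def lognormal_cdf(t, t50, sigma):
    """Probability that a segment with median life t50 has failed by time t."""
    z = (math.log(t) - math.log(t50)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chip_failure_probability(t, current_densities, temps_k, sigma=0.5):
    """Weakest-link estimate: the chip fails if any segment fails.

    Segments are assumed independent, which is a simplification.
    """
    survive = 1.0
    for j, temp in zip(current_densities, temps_k):
        survive *= 1.0 - lognormal_cdf(t, black_t50(j, temp), sigma)
    return 1.0 - survive


# Four hypothetical segments carrying the same current density.
currents = [1.0, 1.0, 1.0, 1.0]
temperature_map = [350.0, 360.0, 370.0, 380.0]   # per-segment temperatures (K)
single_temperature = [380.0] * 4                 # worst-case temperature everywhere

t_eval = black_t50(1.0, 380.0)  # evaluate at the median life of the hottest segment
p_map = chip_failure_probability(t_eval, currents, temperature_map)
p_single = chip_failure_probability(t_eval, currents, single_temperature)
print("with temperature map:   ", p_map)
print("with single temperature:", p_single)
```

With these made-up numbers the single worst-case temperature gives the higher failure probability, which is the direction of the effect reported in the abstract; the magnitudes carry no meaning.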
White, H; Racine, J
2001-01-01
We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.
Statistically Based Approach to Broadband Liner Design and Assessment
NASA Technical Reports Server (NTRS)
Jones, Michael G. (Inventor); Nark, Douglas M. (Inventor)
2016-01-01
A broadband liner design optimization includes utilizing in-duct attenuation predictions with a statistical fan source model to obtain optimum impedance spectra over a number of flow conditions for one or more liner locations in a bypass duct. The predicted optimum impedance information is then used with acoustic liner modeling tools to design liners having impedance spectra that most closely match the predicted optimum values. Design selection is based on an acceptance criterion that provides the ability to apply increasing weighting to specific frequencies and/or operating conditions. One or more broadband design approaches are utilized to produce a broadband liner that targets a full range of frequencies and operating conditions.
A Statistical Dynamic Approach to Structural Evolution of Complex Capital Market Systems
NASA Astrophysics Data System (ADS)
Shao, Xiao; Chai, Li H.
As an important part of modern financial systems, the capital market has played a crucial role in diverse social resource allocations and economic exchanges. Going beyond traditional models and theories based on neoclassical economics, and considering capital markets as typical complex open systems, this paper attempts to develop a new approach that overcomes some shortcomings of existing research. By defining the generalized entropy of capital market systems, a theoretical model and a nonlinear dynamic equation for the operation of capital markets are proposed from a statistical dynamic perspective. The US security market from 1995 to 2001 is then simulated and analyzed as a typical case. Some instructive results are discussed and summarized.
NASA Technical Reports Server (NTRS)
Karmali, M. S.; Phatak, A. V.
1982-01-01
Results of a study to investigate, by means of a computer simulation, the performance sensitivity of helicopter IMC DSAL operations as a function of navigation system parameters are presented. A mathematical model generically representing a navigation system is formulated. The scenario simulated consists of a straight-in helicopter approach to landing along a 6 deg glideslope. The deceleration magnitude chosen is 03g. The navigation model parameters are varied and the statistics of the total system errors (TSE) computed. These statistics are used to determine the critical navigation system parameters that affect the performance of the closed-loop navigation, guidance and control system of a UH-1H helicopter.
Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K
2011-10-01
To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In the second approach, one uses regression, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches for survival data. We compare SVM-based survival models based on ranking constraints, based on regression constraints and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significantly different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints over models based only on ranking constraints. This work gives empirical evidence that SVM-based models using regression constraints perform significantly better than SVM-based models based on ranking constraints.
Our experiments show a comparable performance for methods including only regression or both regression and ranking constraints on clinical data. On high dimensional data, the former model performs better. However, this approach does not have a theoretical link with standard statistical models for survival data. This link can be made by means of transformation models when ranking constraints are included. Copyright © 2011 Elsevier B.V. All rights reserved.
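The concordance index mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the convention that a higher prognostic score means longer survival, and the toy data are all assumptions made for illustration.

```python
def concordance_index(times, events, scores):
    """Harrell-style concordance index for right-censored data.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the subject was censored
    scores : prognostic index (assumed here: higher score = longer survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only when subject i had an observed event
            # strictly before subject j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] < scores[j]:
                    concordant += 1.0   # ordering of scores matches ordering of times
                elif scores[i] == scores[j]:
                    concordant += 0.5   # ties in the score count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.1, 0.4, 0.5, 0.9]
c_index = concordance_index(times, events, scores)
```

A value of 1 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking, which is why the index is read as a measure of discriminating ability.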
Models of dyadic social interaction.
Griffin, Dale; Gonzalez, Richard
2003-01-01
We discuss the logic of research designs for dyadic interaction and present statistical models with parameters that are tied to psychologically relevant constructs. Building on Karl Pearson's classic nineteenth-century statistical analysis of within-organism similarity, we describe several approaches to indexing dyadic interdependence and provide graphical methods for visualizing dyadic data. We also describe several statistical and conceptual solutions to the 'levels of analysis' problem in analysing dyadic data. These analytic strategies allow the researcher to examine and measure psychological questions of interdependence and social influence. We provide illustrative data from casually interacting and romantic dyads. PMID:12689382
Statistical tools for transgene copy number estimation based on real-time PCR.
Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal
2007-11-01
As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, real-time PCR-based transgene copy number estimation tends to be ambiguous and subjective, owing to a lack of the proper statistical analysis and data quality control needed to render a reliable estimate of copy number with a prediction value. Despite recent progress in the statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR-based transgene copy number determination. Three experimental designs and four statistical models with integrated data quality control are presented. For the first method, external calibration curves are established for the transgene based on serially diluted templates. The Ct numbers from a control transgenic event and a putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two-group t-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of the transgene was compared with that of the internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with the reference gene without a standard curve, and is instead based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination.
These statistical methods allow real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied to other real-time PCR-based quantification assays, including transfection efficiency analysis and pathogen quantification.
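The external calibration curve in the first design can be sketched as a simple linear regression of Ct on the log10 of template amount. The Python below is an illustrative sketch only: the function name and the dilution data are hypothetical, and the efficiency formula E = 10^(-1/slope) - 1 is the usual qPCR convention rather than something stated in the abstract.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = intercept + slope * log10(quantity).

    Returns (slope, intercept, efficiency), where efficiency uses the
    conventional relation E = 10 ** (-1 / slope) - 1 (E = 1 means perfect doubling).
    """
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, efficiency


# Hypothetical ten-fold dilution series with ideal doubling per cycle:
# each ten-fold dilution delays Ct by 1 / log10(2) cycles (about 3.32).
quantities = [1e5, 1e4, 1e3, 1e2]
ideal_slope = -1.0 / math.log10(2)
ct_values = [40.0 + ideal_slope * math.log10(q) for q in quantities]

slope, intercept, efficiency = fit_standard_curve(quantities, ct_values)
```

With real plate data the residuals of this fit are what a quality-control step would inspect before the Ct values of control and putative events are compared.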
Santos, José António; Galante-Oliveira, Susana; Barroso, Carlos
2011-03-01
The current work presents an innovative statistical approach to model ordinal variables in environmental monitoring studies. An ordinal variable has values that can only be compared as "less", "equal" or "greater" and it is not possible to have information about the size of the difference between two particular values. The example of an ordinal variable in this study is the vas deferens sequence (VDS) used in imposex (superimposition of male sexual characters onto prosobranch females) field assessment programmes for monitoring tributyltin (TBT) pollution. The statistical methodology presented here is the ordered logit regression model. It assumes that the VDS is an ordinal variable whose values correspond to a process of imposex development that can be considered continuous in both biological and statistical senses and can be described by a latent non-observable continuous variable. This model was applied to the case study of Nucella lapillus imposex monitoring surveys conducted on the Portuguese coast between 2003 and 2008 to evaluate the temporal evolution of TBT pollution in this country. In order to produce more reliable conclusions, the proposed model includes covariates that may influence the imposex response besides TBT (e.g. the shell size). The model also provides an analysis of the environmental risk associated with TBT pollution by estimating the probability of the occurrence of females with VDS ≥ 2 in each year, according to OSPAR criteria. We consider that the proposed application of this statistical methodology has great potential in environmental monitoring whenever there is the need to model variables that can only be assessed through an ordinal scale of values.
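To make the ordered logit idea concrete, the Python sketch below shows how category probabilities, and a probability such as P(VDS ≥ 2), follow from a linear predictor and a set of cutpoints on the latent scale. The cutpoints and predictor values are hypothetical and are not taken from the study; the parameterisation P(Y ≤ k) = logistic(cutpoint_k - eta) is one common convention.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(eta, cutpoints):
    """Probabilities of ordinal levels 0..K for linear predictor eta.

    cutpoints must be increasing; with K cutpoints there are K + 1 levels and
    P(Y <= k) = logistic(cutpoints[k] - eta).
    """
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = value
    return probs


def prob_at_least(eta, cutpoints, level):
    """P(Y >= level), e.g. the probability of a VDS stage of 2 or more."""
    return sum(ordered_logit_probs(eta, cutpoints)[level:])


# Hypothetical cutpoints for four ordinal stages (0, 1, 2, 3).
cutpoints = [-1.0, 0.0, 1.0]
p_low = prob_at_least(0.0, cutpoints, 2)    # baseline linear predictor
p_high = prob_at_least(1.5, cutpoints, 2)   # larger predictor, e.g. more exposure
```

In a fitted model, eta would be the sum of covariate effects (year, shell size and so on), so a yearly risk estimate is obtained by evaluating this probability at each year's predictor.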
Stotts, Steven A; Koch, Robert A
2017-08-01
In this paper an approach is presented to estimate the constraint required to apply maximum entropy (ME) for statistical inference with underwater acoustic data from a single track segment. Previous algorithms for estimating the ME constraint require multiple source track segments to determine the constraint. The approach is relevant for addressing model mismatch effects, i.e., inaccuracies in parameter values determined from inversions because the propagation model does not account for all acoustic processes that contribute to the measured data. One effect of model mismatch is that the lowest cost inversion solution may be well outside a relatively well-known parameter value's uncertainty interval (prior), e.g., source speed from track reconstruction or towed source levels. The approach requires, for some particular parameter value, the ME constraint to produce an inferred uncertainty interval that encompasses the prior. Motivating this approach is the hypothesis that the proposed constraint determination procedure would produce a posterior probability density that accounts for the effect of model mismatch on inferred values of other inversion parameters for which the priors might be quite broad. Applications to both measured and simulated data are presented for model mismatch that produces minimum cost solutions either inside or outside some priors.
NASA Technical Reports Server (NTRS)
Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.
1992-01-01
An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with engineering analysis to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in engineering analyses of failure phenomena, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which engineering analysis models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. Conventional engineering analysis models currently employed for design or failure prediction are used in this methodology. The PFA methodology is described and examples of its application are presented. Conventional approaches to failure risk evaluation for spaceflight systems are discussed, and the rationale for the approach taken in the PFA methodology is presented. The statistical methods, engineering models, and computer software used in fatigue failure mode applications are thoroughly documented.
Evaluation of Models of the Reading Process.
ERIC Educational Resources Information Center
Balajthy, Ernest
A variety of reading process models have been proposed and evaluated in reading research. Traditional approaches to model evaluation specify the workings of a system in a simplified fashion to enable organized, systematic study of the system's components. Following are several statistical methods of model evaluation: (1) empirical research on…
Specifying and Refining a Complex Measurement Model.
ERIC Educational Resources Information Center
Levy, Roy; Mislevy, Robert J.
This paper aims to describe a Bayesian approach to modeling and estimating cognitive models both in terms of statistical machinery and actual instrument development. Such a method taps the knowledge of experts to provide initial estimates for the probabilistic relationships among the variables in a multivariate latent variable model and refines…
NASA Astrophysics Data System (ADS)
Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.
2018-07-01
Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter haloes. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the 'accurate' regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard Λ cold dark matter (ΛCDM) + halo model against the clustering of Sloan Digital Sky Survey (SDSS) seventh data release (DR7) galaxies. Specifically, we use the projected correlation function, group multiplicity function, and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir haloes) matches the clustering of low-luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the 'standard' halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.
Cano-Sancho, German; Labrune, Léa; Ploteau, Stéphane; Marchand, Philippe; Le Bizec, Bruno; Antignac, Jean-Philippe
2018-06-01
The gold-standard matrix for measuring the internal levels of persistent organic pollutants (POPs) is adipose tissue; however, in epidemiological studies the use of serum is preferred due to its low cost and higher accessibility. The interpretation of serum biomarkers is tightly related to the understanding of the underlying causal structure relating the POPs, serum lipids and the disease. Considering the extended benefits of using serum biomarkers, we aimed to further examine whether statistical modelling would allow us to improve the use and interpretation of serum biomarkers in the study of endometriosis. Hence, we have conducted a systematic comparison of statistical approaches commonly used to lipid-adjust the circulating biomarkers of POPs based on existing methods, using data from a pilot case-control study focused on severe deep infiltrating endometriosis. The odds ratios (ORs) obtained from unconditional regression for those models with serum biomarkers were further compared to those obtained from adipose tissue. The results of this exploratory study did not support the use of blood biomarkers as proxy estimates of POPs in adipose tissue to implement in risk models for endometriosis with the available statistical approaches to correct for lipids. The current statistical approaches commonly used to lipid-adjust circulating POPs do not fully represent the underlying biological complexity between POPs, lipids and disease (especially those directly or indirectly affecting or affected by lipid metabolism). Hence, further investigations are warranted to improve the use and interpretation of blood biomarkers under complex scenarios of lipid dynamics. Copyright © 2018 Elsevier Ltd. All rights reserved.
Modeling and replicating statistical topology and evidence for CMB nonhomogeneity
Agami, Sarit
2017-01-01
Under the banner of “big data,” the detection and classification of structure in extremely large, high-dimensional, data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is “TDA,” or “topological data analysis,” one of the primary aims of which is providing nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications, it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis—the typical case for big data applications—the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity. PMID:29078301
Probabilistic Graphical Model Representation in Phylogenetics
Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.
2014-01-01
Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559
A global reconstruction of climate-driven subdecadal water storage variability
NASA Astrophysics Data System (ADS)
Humphrey, V.; Gudmundsson, L.; Seneviratne, S. I.
2017-03-01
Since 2002, the Gravity Recovery and Climate Experiment (GRACE) mission has provided unprecedented observations of global mass redistribution caused by hydrological processes. However, there are still few sources on pre-2002 global terrestrial water storage (TWS). Classical approaches to retrieve past TWS rely on either land surface models (LSMs) or basin-scale water balance calculations. Here we propose a new approach which statistically relates anomalies in atmospheric drivers to monthly GRACE anomalies. Gridded subdecadal TWS changes and time-dependent uncertainty intervals are reconstructed for the period 1985-2015. Comparisons with model results demonstrate the performance and robustness of the derived data set, which represents a new and valuable source for studying subdecadal TWS variability, closing the ocean/land water budgets and assessing GRACE uncertainties. At midpoint between GRACE observations and LSM simulations, the statistical approach provides TWS estimates (doi:
Hannigan, Ailish; Bargary, Norma; Kinsella, Anthony; Clarke, Mary
2017-06-14
Although the relationships between duration of untreated psychosis (DUP) and outcomes are often assumed to be linear, few studies have explored the functional form of these relationships. The aim of this study is to demonstrate the potential of recent advances in curve fitting approaches (splines) to explore the form of the relationship between DUP and global assessment of functioning (GAF). Curve fitting approaches were used in models to predict change in GAF at long-term follow-up using DUP for a sample of 83 individuals with schizophrenia. The form of the relationship between DUP and GAF was non-linear. Accounting for non-linearity increased the percentage of variance in GAF explained by the model, resulting in better prediction and understanding of the relationship. The relationship between DUP and outcomes may be complex and model fit may be improved by accounting for the form of the relationship. This should be routinely assessed and new statistical approaches for non-linear relationships exploited, if appropriate. © 2017 John Wiley & Sons Australia, Ltd.
Moment-based metrics for global sensitivity analysis of hydrological systems
NASA Astrophysics Data System (ADS)
Dell'Oca, Aronne; Riva, Monica; Guadagnini, Alberto
2017-12-01
We propose new metrics to assist global sensitivity analysis, GSA, of hydrological and Earth systems. Our approach allows assessing the impact of uncertain parameters on main features of the probability density function, pdf, of a target model output, y. These include the expected value of y, the spread around the mean and the degree of symmetry and tailedness of the pdf of y. Since reliable assessment of higher-order statistical moments can be computationally demanding, we couple our GSA approach with a surrogate model, approximating the full model response at a reduced computational cost. Here, we consider the generalized polynomial chaos expansion (gPCE), other model reduction techniques being fully compatible with our theoretical framework. We demonstrate our approach through three test cases, including an analytical benchmark, a simplified scenario mimicking pumping in a coastal aquifer and a laboratory-scale conservative transport experiment. Our results allow ascertaining which parameters can impact some moments of the model output pdf while being uninfluential to others. We also investigate the error associated with the evaluation of our sensitivity metrics by replacing the original system model through a gPCE. Our results indicate that the construction of a surrogate model with increasing level of accuracy might be required depending on the statistical moment considered in the GSA. The approach is fully compatible with (and can assist the development of) analysis techniques employed in the context of reduction of model complexity, model calibration, design of experiment, uncertainty quantification and risk assessment.
A κ-generalized statistical mechanics approach to income analysis
NASA Astrophysics Data System (ADS)
Clementi, F.; Gallegati, M.; Kaniadakis, G.
2009-02-01
This paper proposes a statistical mechanics approach to the analysis of income distribution and inequality. A new distribution function, having its roots in the framework of κ-generalized statistics, is derived that is particularly suitable for describing the whole spectrum of incomes, from the low-middle income region up to the high income Pareto power-law regime. Analytical expressions for the shape, moments and some other basic statistical properties are given. Furthermore, several well-known econometric tools for measuring inequality, which all exist in a closed form, are considered. A method for parameter estimation is also discussed. The model is shown to fit remarkably well the data on personal income for the United States, and the analysis of inequality performed in terms of its parameters is revealed as very powerful.
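For readers unfamiliar with κ-generalized statistics, the Python sketch below evaluates the κ-exponential and the survival function built from it, in the form commonly given in the κ-generalized literature: exp_κ(x) = (sqrt(1 + κ²x²) + κx)^(1/κ) and P(X > x) = exp_κ(-β x^α). The abstract does not state these formulas, so treat them, and the parameter values used here, as assumptions for illustration.

```python
import math


def exp_kappa(x, kappa):
    """kappa-deformed exponential; reduces to exp(x) as kappa -> 0."""
    if kappa == 0:
        return math.exp(x)
    return (math.sqrt(1.0 + kappa ** 2 * x ** 2) + kappa * x) ** (1.0 / kappa)


def survival(x, alpha, beta, kappa):
    """Complementary CDF P(X > x) of the kappa-generalized distribution, x >= 0."""
    return exp_kappa(-beta * x ** alpha, kappa)


# Hypothetical parameter values, with income x measured in units of the mean.
alpha, beta, kappa = 2.0, 1.0, 0.7
tail_values = [survival(x, alpha, beta, kappa) for x in (0.5, 1.0, 2.0, 5.0, 10.0)]
```

For small x the survival function behaves like a stretched exponential, while for large x it decays roughly as a power law with exponent α/κ, which is how a single curve can span the low-to-middle incomes and the Pareto tail.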
Compounding approach for univariate time series with nonstationary variances
NASA Astrophysics Data System (ADS)
Schäfer, Rudi; Barkhofen, Sonja; Guhr, Thomas; Stöckmann, Hans-Jürgen; Kuhl, Ulrich
2015-12-01
A defining feature of nonstationary systems is the time dependence of their statistical parameters. Measured time series may exhibit Gaussian statistics on short time horizons, due to the central limit theorem. The sample statistics for long time horizons, however, averages over the time-dependent variances. To model the long-term statistical behavior, we compound the local distribution with the distribution of its parameters. Here, we consider two concrete, but diverse, examples of such nonstationary systems: the turbulent air flow of a fan and a time series of foreign exchange rates. Our main focus is to empirically determine the appropriate parameter distribution for the compounding approach. To this end, we extract the relevant time scales by decomposing the time signals into windows and determine the distribution function of the thus obtained local variances.
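The windowing step described in this abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' procedure: the synthetic signal, the window length, and the function name are hypothetical, and the signal is built so that its variance is constant within each window.

```python
import random
import statistics


def local_variances(series, window):
    """Split a series into non-overlapping windows and return each window's sample variance."""
    return [
        statistics.variance(series[start:start + window])
        for start in range(0, len(series) - window + 1, window)
    ]


# Hypothetical nonstationary signal: zero-mean Gaussian noise whose standard
# deviation is redrawn for every block of `window` samples.
rng = random.Random(0)
window = 200
sigmas = [rng.choice([0.5, 1.0, 2.0]) for _ in range(50)]
series = [rng.gauss(0.0, sigma) for sigma in sigmas for _ in range(window)]

variances = local_variances(series, window)
mean_local_variance = statistics.mean(variances)
long_horizon_variance = statistics.pvariance(series)
```

The empirical distribution of `variances` is the quantity one would then fit with a candidate parameter distribution; the long-horizon variance comes out close to the average of the local variances, which is the averaging effect the abstract refers to.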
Incorporating Yearly Derived Winter Wheat Maps Into Winter Wheat Yield Forecasting Model
NASA Technical Reports Server (NTRS)
Skakun, S.; Franch, B.; Roger, J.-C.; Vermote, E.; Becker-Reshef, I.; Justice, C.; Santamaría-Artigas, A.
2016-01-01
Wheat is one of the most important cereal crops in the world. Timely and accurate forecasting of wheat yield and production at global scale is vital in implementing food security policy. Becker-Reshef et al. (2010) developed a generalized empirical model for forecasting winter wheat production using remote sensing data and official statistics. This model was implemented using static wheat maps. In this paper, we analyze the impact of incorporating yearly wheat masks into the forecasting model. We propose a new approach to producing in-season winter wheat maps exploiting satellite data and official statistics on crop area only. Validation on independent data showed that the proposed approach yielded omission errors of 6% to 23% and commission errors of 10% to 16% when mapping winter wheat 2-3 months before harvest. In general, we found a limited impact of using yearly winter wheat masks over a static mask for the study regions.
Hu, Jianhua; Wright, Fred A
2007-03-01
The identification of the genes that are differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t-statistics and examine other commonly used variants. For oligonucleotide arrays with multiple probes per gene, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. Parameter estimates from the model have natural shrinkage properties that guard against inappropriately small variance estimates, and the model is used to obtain a differential expression statistic. A limiting value to the positive false discovery rate (pFDR) for ordinary t-tests provides motivation for our use of the data structure to improve variance estimates. Our approach performs well compared to other proposed approaches in terms of the false discovery rate.
NASA Technical Reports Server (NTRS)
Korram, S.
1977-01-01
The design of general remote sensing-aided methodologies was studied to provide estimates of several important inputs to water yield forecast models. These input parameters are snow area extent, snow water content, and evapotranspiration. The study area is the Feather River Watershed (780,000 hectares), Northern California. The general approach involved a stepwise sequence of identification of the required information, sample design, measurement/estimation, and evaluation of results. All the relevant and available information types needed in the estimation process are being defined. These include Landsat, meteorological satellite, and aircraft imagery, topographic and geologic data, ground truth data, and climatic data from ground stations. A cost-effective multistage sampling approach was employed in quantification of all the required parameters. The physical and statistical models for both snow quantification and evapotranspiration estimation were developed. These models use the information obtained from aerial and ground data through appropriate statistical sampling design.
Grossling, Bernardo F.
1975-01-01
Exploratory drilling is still in incipient or youthful stages in those areas of the world where the bulk of the potential petroleum resources is yet to be discovered. Methods of assessing resources from projections based on historical production and reserve data are limited to mature areas. For most of the world's petroleum-prospective areas, a more speculative situation calls for a critical review of resource-assessment methodology. The language of mathematical statistics is required to define more rigorously the appraisal of petroleum resources. Basically, two approaches have been used to appraise the amounts of undiscovered mineral resources in a geologic province: (1) projection models, which use statistical data on the past outcome of exploration and development in the province; and (2) estimation models of the overall resources of the province, which use certain known parameters of the province together with the outcome of exploration and development in analogous provinces. These two approaches often lead to widely different estimates. Some of the controversy that arises results from a confusion of the probabilistic significance of the quantities yielded by each of the two approaches. Also, inherent limitations of analytic projection models-such as those using the logistic and Gomperts functions --have often been ignored. The resource-assessment problem should be recast in terms that provide for consideration of the probability of existence of the resource and of the probability of discovery of a deposit. Then the two above-mentioned models occupy the two ends of the probability range. The new approach accounts for (1) what can be expected with reasonably high certainty by mere projections of what has been accomplished in the past; (2) the inherent biases of decision-makers and resource estimators; (3) upper bounds that can be set up as goals for exploration; and (4) the uncertainties in geologic conditions in a search for minerals. 
Actual outcomes can then be viewed as phenomena subject to statistical uncertainty and responsive to changes in economic and technologic factors.
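For reference, the two analytic projection curves named in the abstract above are commonly written as follows. The symbols are assumptions introduced here, not taken from the abstract: Q(t) is cumulative discoveries or production at time t, Q_\infty is the asymptotic ultimate amount, and a, b > 0 are fitted constants.

```latex
% Logistic projection curve
Q(t) = \frac{Q_\infty}{1 + a\,e^{-bt}}

% Gompertz projection curve
Q(t) = Q_\infty \exp\!\left(-a\,e^{-bt}\right)
```

Both curves level off at Q_\infty, so a fit to historical data fixes the projected ultimate amount through the chosen functional form, which is one way to read the "inherent limitations" the abstract mentions.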
DOSE-RESPONSE ASSESSMENT FOR DEVELOPMENTAL TOXICITY III. STATISTICAL MODELS
Although quantitative modeling has been central to cancer risk assessment for years, the concept of dose-response modeling for developmental effects is relatively new. The benchmark dose (BMD) approach has been proposed for use with developmental (as well as other noncancer) endpo...
Reaction rates for mesoscopic reaction-diffusion kinetics
Hellander, Stefan; Hellander, Andreas; Petzold, Linda
2015-02-23
The mesoscopic reaction-diffusion master equation (RDME) is a popular modeling framework frequently applied to stochastic reaction-diffusion kinetics in systems biology. The RDME is derived from assumptions about the underlying physical properties of the system, and it may produce unphysical results for models where those assumptions fail. In that case, other more comprehensive models are better suited, such as hard-sphere Brownian dynamics (BD). Although the RDME is a model in its own right, and not inferred from any specific microscale model, it proves useful to attempt to approximate a microscale model by a specific choice of mesoscopic reaction rates. In this paper we derive mesoscopic scale-dependent reaction rates by matching certain statistics of the RDME solution to statistics of the solution of a widely used microscopic BD model: the Smoluchowski model with a Robin boundary condition at the reaction radius of two molecules. We also establish fundamental limits on the range of mesh resolutions for which this approach yields accurate results and show both theoretically and in numerical examples that as we approach the lower fundamental limit, the mesoscopic dynamics approach the microscopic dynamics. Finally, we show that for mesh sizes below the fundamental lower limit, results are less accurate. Thus, the lower limit determines the mesh size for which we obtain the most accurate results.
Reaction rates for mesoscopic reaction-diffusion kinetics
Hellander, Stefan; Hellander, Andreas; Petzold, Linda
2016-01-01
The mesoscopic reaction-diffusion master equation (RDME) is a popular modeling framework frequently applied to stochastic reaction-diffusion kinetics in systems biology. The RDME is derived from assumptions about the underlying physical properties of the system, and it may produce unphysical results for models where those assumptions fail. In that case, other more comprehensive models are better suited, such as hard-sphere Brownian dynamics (BD). Although the RDME is a model in its own right, and not inferred from any specific microscale model, it proves useful to attempt to approximate a microscale model by a specific choice of mesoscopic reaction rates. In this paper we derive mesoscopic scale-dependent reaction rates by matching certain statistics of the RDME solution to statistics of the solution of a widely used microscopic BD model: the Smoluchowski model with a Robin boundary condition at the reaction radius of two molecules. We also establish fundamental limits on the range of mesh resolutions for which this approach yields accurate results and show both theoretically and in numerical examples that as we approach the lower fundamental limit, the mesoscopic dynamics approach the microscopic dynamics. We show that for mesh sizes below the fundamental lower limit, results are less accurate. Thus, the lower limit determines the mesh size for which we obtain the most accurate results. PMID:25768640
Iguchi, Akira; Kumagai, Naoki H; Nakamura, Takashi; Suzuki, Atsushi; Sakai, Kazuhiko; Nojiri, Yukihiro
2014-12-15
In this study, we report the impact of acidification mimicking the pre-industrial, present, and near-future oceans on the calcification of two coral species (Porites australiensis, Isopora palifera), using a precise pCO2 control system that can produce acidified seawater at stable pCO2 values with low variations. In the analyses, we applied Bayesian modeling approaches incorporating the variations of pCO2 and compared the results of our modeling approach with those of a classical statistical one. The results showed the highest calcification rates at the pre-industrial pCO2 level and gradual decreases in calcification at the near-future ocean acidification level, which suggests that ongoing and near-future ocean acidification would negatively impact coral calcification. In addition, it was expected that variations in the carbon chemistry parameters may affect which model of calcification responses to these parameters is inferred as best by the Bayesian modeling approach and by the classical statistical one, even under stable pCO2 values with low variations. Copyright © 2014 Elsevier Ltd. All rights reserved.
Data Assimilation to Extract Soil Moisture Information From SMAP Observations
NASA Technical Reports Server (NTRS)
Kolassa, J.; Reichle, R. H.; Liu, Q.; Alemohammad, S. H.; Gentine, P.
2017-01-01
Statistical techniques permit the retrieval of soil moisture estimates in a model climatology while retaining the spatial and temporal signatures of the satellite observations. As a consequence, they can be used to reduce the need for localized bias correction techniques typically implemented in data assimilation (DA) systems that tend to remove some of the independent information provided by satellite observations. Here, we use a statistical neural network (NN) algorithm to retrieve SMAP (Soil Moisture Active Passive) surface soil moisture estimates in the climatology of the NASA Catchment land surface model. Assimilating these estimates without additional bias correction is found to significantly reduce the model error and increase the temporal correlation against SMAP CalVal in situ observations over the contiguous United States. A comparison with assimilation experiments using traditional bias correction techniques shows that the NN approach better retains the independent information provided by the SMAP observations and thus leads to larger model skill improvements during the assimilation. A comparison with the SMAP Level 4 product shows that the NN approach is able to provide comparable skill improvements and thus represents a viable assimilation approach.
Uncovering Local Trends in Genetic Effects of Multiple Phenotypes via Functional Linear Models.
Vsevolozhskaya, Olga A; Zaykin, Dmitri V; Barondess, David A; Tong, Xiaoren; Jadhav, Sneha; Lu, Qing
2016-04-01
Recent technological advances equipped researchers with capabilities that go beyond traditional genotyping of loci known to be polymorphic in a general population. Genetic sequences of study participants can now be assessed directly. This capability removed technology-driven bias toward scoring predominantly common polymorphisms and let researchers reveal a wealth of rare and sample-specific variants. Although the relative contributions of rare and common polymorphisms to trait variation are being debated, researchers are faced with the need for new statistical tools for simultaneous evaluation of all variants within a region. Several research groups demonstrated flexibility and good statistical power of the functional linear model approach. In this work we extend previous developments to allow inclusion of multiple traits and adjustment for additional covariates. Our functional approach is unique in that it provides a nuanced depiction of effects and interactions for the variables in the model by representing them as curves varying over a genetic region. We demonstrate flexibility and competitive power of our approach by contrasting its performance with commonly used statistical tools and illustrate its potential for discovery and characterization of genetic architecture of complex traits using sequencing data from the Dallas Heart Study. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
NASA Astrophysics Data System (ADS)
Collados-Lara, Antonio-Juan; Pulido-Velazquez, David; Pardo-Iguzquiza, Eulogio
2017-04-01
Assessing impacts of potential future climate change scenarios in precipitation and temperature is essential to design adaptive strategies in water resources systems. The objective of this work is to analyze the possibilities of different statistical downscaling methods to generate future potential scenarios in an Alpine Catchment from historical data and the available climate model simulations performed in the frame of the CORDEX EU project. The initial information employed to define these downscaling approaches is the historical climatic data (taken from the Spain02 project for the period 1971-2000 with a spatial resolution of 12.5 km) and the future series provided by climatic models in the horizon period 2071-2100. We have used information coming from nine climate model simulations (obtained from five different Regional Climate Models (RCM) nested in four different Global Climate Models (GCM)) from the European CORDEX project. In our application we have focused on the Representative Concentration Pathways (RCP) 8.5 emissions scenario, which is the most unfavorable scenario considered in the fifth Assessment Report (AR5) by the Intergovernmental Panel on Climate Change (IPCC). For each RCM we have generated future climate series for the period 2071-2100 by applying two different approaches, bias correction and delta change, and five different transformation techniques (first moment correction, first and second moment correction, regression functions, quantile mapping using distribution derived transformation and quantile mapping using empirical quantiles) for both of them. Ensembles of the obtained series were proposed to obtain more representative potential future climate scenarios to be employed to study potential impacts. 
In this work we propose a non-equifeasible combination of the future series giving more weight to those coming from models (delta change approaches) or combinations of models and techniques that provide a better approximation to the basic and drought statistics of the historical data. A multi-objective analysis using basic statistics (mean, standard deviation and asymmetry coefficient) and drought statistics (duration, magnitude and intensity) has been performed to identify which models are better in terms of goodness of fit to reproduce the historical series. The drought statistics have been obtained from the Standardized Precipitation Index (SPI) series using the Theory of Runs. This analysis allows us to identify the best RCM and the best combination of model and correction technique in the bias-correction method. We have also analyzed the possibilities of using different Stochastic Weather Generators to approximate the basic and drought statistics of the historical series. These analyses have been performed in our case study in a lumped and in a distributed way in order to assess their sensitivity to the spatial scale. The statistics of the future temperature series obtained with different ensemble options are quite homogeneous, but the precipitation shows a higher sensitivity to the adopted method and spatial scale. The global increments in the mean temperature values are 31.79 %, 31.79 %, 31.03 % and 31.74 % for the distributed bias-correction, distributed delta-change, lumped bias-correction and lumped delta-change ensembles respectively and in the precipitation they are -25.48 %, -28.49 %, -26.42 % and -27.35 % respectively. Acknowledgments: This research work has been partially supported by the GESINHIMPADAPT project (CGL2013-48424-C2-2-R) with Spanish MINECO funds. We would also like to thank Spain02 and CORDEX projects for the data provided for this study and the R package qmap.
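The difference between the two approaches named in the abstract above can be sketched with the simplest of the listed transformations, first moment correction. This is a minimal illustration, not the authors' implementation (the abstract credits the R package qmap). The function names, the additive form of the correction, and the sample series are all assumptions made here.

```python
from statistics import mean


def bias_correction_first_moment(obs_hist, model_hist, model_future):
    """Bias correction (assumed additive): remove the historical
    model-minus-observation mean bias from the future model series."""
    bias = mean(model_hist) - mean(obs_hist)
    return [x - bias for x in model_future]


def delta_change_first_moment(obs_hist, model_hist, model_future):
    """Delta change (assumed additive): add the model's future-minus-
    historical mean change to the observed historical series."""
    delta = mean(model_future) - mean(model_hist)
    return [x + delta for x in obs_hist]


# Hypothetical series, for illustration only.
obs_hist = [10.0, 12.0, 11.0, 13.0]
model_hist = [12.0, 14.0, 13.0, 15.0]
model_future = [15.0, 16.0, 17.0, 18.0]

bc = bias_correction_first_moment(obs_hist, model_hist, model_future)
dc = delta_change_first_moment(obs_hist, model_hist, model_future)
print(bc, dc)
```

With an additive first moment correction the two outputs share the same mean but not the same sequence: bias correction keeps the temporal pattern of the model's future run, while delta change keeps the pattern of the observations. A multiplicative factor is a common alternative for precipitation; that variant is not shown here.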
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
NASA Astrophysics Data System (ADS)
Guillen, George; Rainey, Gail; Morin, Michelle
2004-04-01
Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers, statistical and GIS software programmers to closely collaborate to produce a more seamless integration of these technologies and approaches to analyzing data. They are complementary methods that strengthen the overall assessment of spill risks.
Assessment of corneal properties based on statistical modeling of OCT speckle.
Jesus, Danilo A; Iskander, D Robert
2017-01-01
A new approach to assess the properties of the corneal micro-structure in vivo based on the statistical modeling of speckle obtained from Optical Coherence Tomography (OCT) is presented. A number of statistical models were proposed to fit the corneal speckle data obtained from raw OCT images. Short-term changes in corneal properties were studied by inducing corneal swelling whereas age-related changes were observed analyzing data of sixty-five subjects aged between twenty-four and seventy-three years. The Generalized Gamma distribution was shown to be the best model, in terms of Akaike's Information Criterion, to fit the OCT corneal speckle. Its parameters showed statistically significant differences (Kruskal-Wallis, p < 0.001) for short-term and age-related corneal changes. In addition, it was observed that age-related changes influence the corneal biomechanical behaviour when corneal swelling is induced. This study shows that the Generalized Gamma distribution can be used to model corneal speckle in OCT in vivo, providing complementary quantified information where the micro-structure of corneal tissue is of essence.
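The model selection step described above, ranking candidate distributions by Akaike's Information Criterion, can be sketched as follows. This is only an illustration under stated assumptions. The two candidates (exponential and Rayleigh) are stand-ins chosen because their maximum likelihood estimates have closed forms, whereas fitting the Generalized Gamma distribution itself would need numerical optimization. The sample values are hypothetical, not OCT data.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def exponential_loglik(data):
    """Maximized log-likelihood of an exponential model (MLE rate = 1 / mean)."""
    rate = len(data) / sum(data)
    return sum(math.log(rate) - rate * x for x in data)


def rayleigh_loglik(data):
    """Maximized log-likelihood of a Rayleigh model
    (MLE sigma^2 = sum(x^2) / (2 n))."""
    sigma2 = sum(x * x for x in data) / (2 * len(data))
    return sum(math.log(x / sigma2) - x * x / (2 * sigma2) for x in data)


# Hypothetical positive amplitudes, for illustration only.
samples = [0.8, 1.1, 1.3, 0.9, 1.6, 1.2, 1.0, 1.4]

scores = {
    "exponential": aic(exponential_loglik(samples), 1),
    "rayleigh": aic(rayleigh_loglik(samples), 1),
}
best = min(scores, key=scores.get)
print(scores, best)
```

Both stand-in models have one parameter, so here the ranking depends only on the likelihood. When candidates have different numbers of parameters, the 2k term penalizes the more flexible model.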
The use of algorithmic behavioural transfer functions in parametric EO system performance models
NASA Astrophysics Data System (ADS)
Hickman, Duncan L.; Smith, Moira I.
2015-10-01
The use of mathematical models to predict the overall performance of an electro-optic (EO) system is well-established as a methodology and is used widely to support requirements definition and system design and to produce performance predictions. Traditionally these models have been based upon cascades of transfer functions based on established physical theory, such as the calculation of signal levels from radiometry equations, as well as the use of statistical models. However, the performance of an EO system is increasingly being dominated by the on-board processing of the image data and this automated interpretation of image content is complex in nature and presents significant modelling challenges. Models and simulations of EO systems tend to either involve processing of image data as part of a performance simulation (image-flow) or else a series of mathematical functions that attempt to define the overall system characteristics (parametric). The former approach is generally more accurate but statistically and theoretically weak in terms of specific operational scenarios, and is also time consuming. The latter approach is generally faster but is unable to provide accurate predictions of a system's performance under operational conditions. An alternative and novel architecture is presented in this paper which combines the processing speed attributes of parametric models with the accuracy of image-flow representations in a statistically valid framework. An additional dimension needed to create an effective simulation is a robust software design whose architecture reflects the structure of the EO System and its interfaces. As such, the design of the simulator can be viewed as a software prototype of a new EO System or an abstraction of an existing design. This new approach has been used successfully to model a number of complex military systems and has been shown to combine improved performance estimation with speed of computation. 
Within the paper, the approach and architecture are described in detail, and example results based on a practical application are then given to illustrate the performance benefits. Finally, conclusions are drawn and comments given regarding the benefits and uses of the new approach.
Combining Statistics and Physics to Improve Climate Downscaling
NASA Astrophysics Data System (ADS)
Gutmann, E. D.; Eidhammer, T.; Arnold, J.; Nowak, K.; Clark, M. P.
2017-12-01
Getting useful information from climate models is an ongoing problem that has plagued climate science and hydrologic prediction for decades. While it is possible to develop statistical corrections for climate models that mimic current climate almost perfectly, this does not necessarily guarantee that future changes are portrayed correctly. In contrast, convection permitting regional climate models (RCMs) have begun to provide an excellent representation of the regional climate system purely from first principles, providing greater confidence in their change signal. However, the computational cost of such RCMs prohibits the generation of ensembles of simulations or long time periods, thus limiting their applicability for hydrologic applications. Here we discuss a new approach combining statistical corrections with physical relationships for a modest computational cost. We have developed the Intermediate Complexity Atmospheric Research model (ICAR) to provide a climate and weather downscaling option that is based primarily on physics for a fraction of the computational requirements of a traditional regional climate model. ICAR also enables the incorporation of statistical adjustments directly within the model. We demonstrate that applying even simple corrections to precipitation while the model is running can improve the simulation of land atmosphere feedbacks in ICAR. For example, by incorporating statistical corrections earlier in the modeling chain, we permit the model physics to better represent the effect of mountain snowpack on air temperature changes.
Statistical study of air pollutant concentrations via generalized gamma distribution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marani, A.; Lavagnini, I.; Buttazzoni, C.
1986-11-01
This paper deals with modeling observed frequency distributions of air quality data measured in the area of Venice, Italy. The paper discusses the application of the generalized gamma distribution (ggd) which has not been commonly applied to air quality data notwithstanding the fact that it embodies most distribution models used for air quality analyses. The approach yields important simplifications for statistical analyses. A comparison among the ggd and other relevant models (standard gamma, Weibull, lognormal), carried out on daily sulfur dioxide concentrations in the area of Venice underlines the efficiency of ggd models in portraying experimental data.
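The claim that the generalized gamma distribution embodies the other models compared in the abstract above can be made concrete with its density. The parameterization below is the standard three-parameter (Stacy) form, and the symbols a, d and p are an assumption made here, since the abstract does not state which form the authors use.

```latex
f(x; a, d, p) = \frac{p}{a^{d}\,\Gamma(d/p)}\; x^{d-1}
                \exp\!\left[-\left(\frac{x}{a}\right)^{p}\right],
\qquad x > 0,\quad a, d, p > 0

% Special cases:
%   p = 1            : standard gamma with shape d and scale a
%   d = p            : Weibull with shape p and scale a
%   d/p \to \infty   : lognormal, obtained as a limiting case
```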
Smooth extrapolation of unknown anatomy via statistical shape models
NASA Astrophysics Data System (ADS)
Grupp, R. B.; Chiang, H.; Otake, Y.; Murphy, R. J.; Gordon, C. R.; Armand, M.; Taylor, R. H.
2015-03-01
Several methods to perform extrapolation of unknown anatomy were evaluated. The primary application is to enhance surgical procedures that may use partial medical images or medical images of incomplete anatomy. Le Fort-based, face-jaw-teeth transplant is one such procedure. From CT data of 36 skulls and 21 mandibles separate Statistical Shape Models of the anatomical surfaces were created. Using the Statistical Shape Models, incomplete surfaces were projected to obtain complete surface estimates. The surface estimates exhibit non-zero error in regions where the true surface is known; it is desirable to keep the true surface and seamlessly merge the estimated unknown surface. Existing extrapolation techniques produce non-smooth transitions from the true surface to the estimated surface, resulting in additional error and a less aesthetically pleasing result. The three extrapolation techniques evaluated were: copying and pasting of the surface estimate (non-smooth baseline), a feathering between the patient surface and surface estimate, and an estimate generated via a Thin Plate Spline trained from displacements between the surface estimate and corresponding vertices of the known patient surface. Feathering and Thin Plate Spline approaches both yielded smooth transitions. However, feathering corrupted known vertex values. Leave-one-out analyses were conducted, with 5% to 50% of known anatomy removed from the left-out patient and estimated via the proposed approaches. The Thin Plate Spline approach yielded smaller errors than the other two approaches, with an average vertex error improvement of 1.46 mm and 1.38 mm for the skull and mandible respectively, over the baseline approach.
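The feathering technique mentioned above, and the reason it corrupts known vertex values, can be illustrated on a one-dimensional profile. This is a simplified sketch, not the authors' method: the linear weight ramp, the `band` parameter, and the use of a list in place of a surface mesh are all assumptions made here.

```python
def feather(known, estimate, band):
    """Blend a known profile into a full-length estimate.

    `known` covers the first len(known) vertices; `estimate` covers all
    vertices. Within `band` steps of the edge of the known region the two
    are mixed linearly, so known values there are altered.
    """
    n_known = len(known)
    merged = list(estimate)
    for i in range(n_known):
        dist = n_known - 1 - i          # steps from the edge of the known region
        w = min(1.0, dist / band)       # weight of the true surface: 0 at the edge
        merged[i] = w * known[i] + (1.0 - w) * estimate[i]
    return merged


# Hypothetical profile: true surface at height 0, estimate at height 1.
known = [0.0, 0.0, 0.0, 0.0]
estimate = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
merged = feather(known, estimate, band=2)
print(merged)
```

The transition is smooth, but the last two known vertices no longer hold their true value of 0.0, which matches the drawback the abstract reports for feathering.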
Variations on Bayesian Prediction and Inference
2016-05-09
inference 2.2.1 Background There are a number of statistical inference problems that are not generally formulated via a full probability model...problem of inference about an unknown parameter, the Bayesian approach requires a full probability...the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle
A Statistical Approach for the Concurrent Coupling of Molecular Dynamics and Finite Element Methods
NASA Technical Reports Server (NTRS)
Saether, E.; Yamakov, V.; Glaessgen, E.
2007-01-01
Molecular dynamics (MD) methods are opening new opportunities for simulating the fundamental processes of material behavior at the atomistic level. However, increasing the size of the MD domain quickly presents intractable computational demands. A robust approach to surmount this computational limitation has been to unite continuum modeling procedures such as the finite element method (FEM) with MD analyses thereby reducing the region of atomic scale refinement. The challenging problem is to seamlessly connect the two inherently different simulation techniques at their interface. In the present work, a new approach to MD-FEM coupling is developed based on a restatement of the typical boundary value problem used to define a coupled domain. The method uses statistical averaging of the atomistic MD domain to provide displacement interface boundary conditions to the surrounding continuum FEM region, which, in return, generates interface reaction forces applied as piecewise constant traction boundary conditions to the MD domain. The two systems are computationally disconnected and communicate only through a continuous update of their boundary conditions. With the use of statistical averages of the atomistic quantities to couple the two computational schemes, the developed approach is referred to as an embedded statistical coupling method (ESCM) as opposed to a direct coupling method where interface atoms and FEM nodes are individually related. The methodology is inherently applicable to three-dimensional domains, avoids discretization of the continuum model down to atomic scales, and permits arbitrary temperatures to be applied.
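The MD-to-FEM half of the coupling described above, in which statistical averages of atomic quantities supply displacement boundary conditions, can be sketched as a simple group average. This is a schematic under stated assumptions, not the ESCM implementation: scalar displacements, a precomputed atom-to-node assignment, and the function name are all hypothetical.

```python
def interface_displacements(atom_displacements, atom_to_node, n_nodes):
    """Average the displacements of the atoms assigned to each interface
    node, giving one displacement boundary value per FEM node."""
    sums = [0.0] * n_nodes
    counts = [0] * n_nodes
    for disp, node in zip(atom_displacements, atom_to_node):
        sums[node] += disp
        counts[node] += 1
    return [s / c for s, c in zip(sums, counts)]


# Hypothetical data: four atoms, two interface nodes (arbitrary units).
atom_displacements = [1.0, 3.0, 2.0, 4.0]
atom_to_node = [0, 0, 1, 1]
node_bc = interface_displacements(atom_displacements, atom_to_node, n_nodes=2)
print(node_bc)
```

Because each node receives an average rather than the position of a single atom, interface atoms and FEM nodes need not be matched one to one, which is the contrast with direct coupling drawn in the abstract. A practical scheme would presumably also average over time steps; that is omitted here.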
Incorporating GIS and remote sensing for census population disaggregation
NASA Astrophysics Data System (ADS)
Wu, Shuo-Sheng 'Derek'
Census data are the primary source of demographic data for a variety of research and applications. For confidentiality and administrative reasons, census data are usually released to the public by aggregated areal units. In the United States, the smallest census unit is the census block. Due to data aggregation, users of census data may have problems in visualizing population distribution within census blocks and estimating population counts for areas not coinciding with census block boundaries. The main purpose of this study is to develop a methodology for estimating sub-block areal populations and assessing the estimation errors. The City of Austin, Texas was used as a case study area. Based on tax parcel boundaries and parcel attributes derived from ancillary GIS and remote sensing data, detailed urban land use classes were first classified using a per-field approach. After that, statistical models by land use classes were built to infer population density from other predictor variables, including four census demographic statistics (the Hispanic percentage, the married percentage, the unemployment rate, and per capita income) and three physical variables derived from remote sensing images and building footprints vector data (a landscape heterogeneity statistic, a building pattern statistic, and a building volume statistic). In addition to statistical models, deterministic models were proposed to directly infer populations from building volumes and three housing statistics, including the average space per housing unit, the housing unit occupancy rate, and the average household size. After population models were derived or proposed, how well the models predict populations for another set of sample blocks was assessed. The results show that deterministic models were more accurate than statistical models. 
Further, by simulating the base unit for modeling from aggregating blocks, I assessed how well the deterministic models estimate sub-unit-level populations. I also assessed the aggregation effects and the rescaling effects on sub-unit estimates. Lastly, from another set of mixed-land-use sample blocks, a mixed-land-use model was derived and compared with a residential-land-use model. The results of per-field land use classification are satisfactory with a Kappa accuracy statistic of 0.747. Model assessments by land use show that population estimates for multi-family land use areas have higher errors than those for single-family land use areas, and population estimates for mixed land use areas have higher errors than those for residential land use areas. The assessments of sub-unit estimates using a simulation approach indicate that smaller areas show higher estimation errors, estimation errors do not relate to the base unit size, and rescaling improves all levels of sub-unit estimates.
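One plausible reading of the deterministic model described above chains the building volume and the three housing statistics into a single product. The abstract does not give the formula, so the form below, the function name, the units, and the sample numbers are all assumptions made for illustration.

```python
def estimate_population(building_volume, space_per_unit,
                        occupancy_rate, household_size):
    """Assumed deterministic chain:
    volume -> housing units -> occupied units -> people."""
    housing_units = building_volume / space_per_unit
    occupied_units = housing_units * occupancy_rate
    return occupied_units * household_size


# Hypothetical block: 30,000 m^3 of residential building volume,
# 300 m^3 per housing unit, 90 % occupancy, 2.5 persons per household.
population = estimate_population(30000.0, 300.0, 0.9, 2.5)
print(population)
```

Unlike the regression models, this form has no fitted coefficients, so it can be applied to any sub-block area for which a building volume is available.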
The Use of Computer-Assisted Identification of ARIMA Time-Series.
ERIC Educational Resources Information Center
Brown, Roger L.
This study was conducted to determine the effects of using various levels of tutorial statistical software for the tentative identification of nonseasonal ARIMA models, a statistical technique proposed by Box and Jenkins for the interpretation of time-series data. The Box-Jenkins approach is an iterative process encompassing several stages of…
2003-07-01
4, Gnanadesikan, 1977). An entity whose measured features fall into one of the regions is classified accordingly. For the approaches we discuss here... Gnanadesikan, R. 1977. Methods for Statistical Data Analysis of Multivariate Observations. John Wiley & Sons, New York. Hassig, N. L., O’Brien, R. F
DOT National Transportation Integrated Search
1981-10-01
Two statistical procedures have been developed to estimate hourly or daily aircraft counts. These counts can then be transformed into estimates of instantaneous air counts. The first procedure estimates the stable (deterministic) mean level of hourly...
NASA Technical Reports Server (NTRS)
Currit, P. A.
1983-01-01
The Cleanroom software development methodology is designed to take the gamble out of product releases for both suppliers and receivers of the software. The ingredients of this procedure are a life cycle of executable product increments, representative statistical testing, and a standard estimate of the MTTF (Mean Time To Failure) of the product at the time of its release. A statistical approach to software product testing using randomly selected samples of test cases is considered. A statistical model is defined for the certification process which uses the timing data recorded during test. A reasonableness argument for this model is provided that uses previously published data on software product execution. Also included is a derivation of the certification model estimators and a comparison of the proposed least squares technique with the more commonly used maximum likelihood estimators.
A theory of stationarity and asymptotic approach in dissipative systems
NASA Astrophysics Data System (ADS)
Rubel, Michael Thomas
2007-05-01
The approximate dynamics of many physical phenomena, including turbulence, can be represented by dissipative systems of ordinary differential equations. One often turns to numerical integration to solve them. There is an incompatibility, however, between the answers it can produce (i.e., specific solution trajectories) and the questions one might wish to ask (e.g., what behavior would be typical in the laboratory?) To determine its outcome, numerical integration requires more detailed initial conditions than a laboratory could normally provide. In place of initial conditions, experiments stipulate how tests should be carried out: only under statistically stationary conditions, for example, or only during asymptotic approach to a final state. Stipulations such as these, rather than initial conditions, are what determine outcomes in the laboratory. This theoretical study examines whether the points of view can be reconciled: What is the relationship between one's statistical stipulations for how an experiment should be carried out--stationarity or asymptotic approach--and the expected results? How might those results be determined without invoking initial conditions explicitly? To answer these questions, stationarity and asymptotic approach conditions are analyzed in detail. Each condition is treated as a statistical constraint on the system--a restriction on the probability density of states that might be occupied when measurements take place. For stationarity, this reasoning leads to a singular, invariant probability density which is already familiar from dynamical systems theory. For asymptotic approach, it leads to a new, more regular probability density field. 
A conjecture regarding what appears to be a limit relationship between the two densities is presented. By making use of the new probability densities, one can derive output statistics directly, avoiding the need to create or manipulate initial data, and thereby avoiding the conceptual incompatibility mentioned above. This approach also provides a clean way to derive reduced-order models, complete with local and global error estimates, as well as a way to compare existing reduced-order models objectively. The new approach is explored in the context of five separate test problems: a trivial one-dimensional linear system, a damped unforced linear oscillator in two dimensions, the isothermal Rayleigh-Plesset equation, Lorenz's equations, and the Stokes limit of Burgers' equation in one space dimension. In each case, various output statistics are deduced without recourse to initial conditions. Further, reduced-order models are constructed for asymptotic approach of the damped unforced linear oscillator, the isothermal Rayleigh-Plesset system, and Lorenz's equations, and for stationarity of Lorenz's equations.
Reverse engineering systems models of regulation: discovery, prediction and mechanisms.
Ashworth, Justin; Wurtmann, Elisabeth J; Baliga, Nitin S
2012-08-01
Biological systems can now be understood in comprehensive and quantitative detail using systems biology approaches. Putative genome-scale models can be built rapidly based upon biological inventories and strategic system-wide molecular measurements. Current models combine statistical associations, causative abstractions, and known molecular mechanisms to explain and predict quantitative and complex phenotypes. This top-down 'reverse engineering' approach generates useful organism-scale models despite noise and incompleteness in data and knowledge. Here we review and discuss the reverse engineering of biological systems using top-down data-driven approaches, in order to improve discovery, hypothesis generation, and the inference of biological properties. Copyright © 2011 Elsevier Ltd. All rights reserved.
Wang, Guoli; Ebrahimi, Nader
2014-01-01
Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H, such that V ∼ W H. It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data. PMID:25821345
Devarajan, Karthik; Wang, Guoli; Ebrahimi, Nader
2015-04-01
Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H, such that V ∼ W H. It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data.
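To make the idea of monotone multiplicative updates concrete, the following is a minimal pure-Python sketch for the generalized Kullback-Leibler divergence, which corresponds to the Poisson likelihood mentioned in the abstract. It uses the classical Lee-Seung update form, not the Renyi-divergence algorithm of the paper. The names `nmf_kl` and `kl_divergence`, the random initialisation, and the small constant `eps` are assumptions made for this illustration.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH); equals the Poisson
    negative log-likelihood up to a constant that does not depend on WH."""
    total = 0.0
    for v_row, m_row in zip(V, WH):
        for v, m in zip(v_row, m_row):
            if v > 0:
                total += v * math.log(v / (m + eps))
            total += m - v
    return total


def nmf_kl(V, rank, iters=200, seed=0, eps=1e-12):
    """Factor V (n x m, non-negative) as W (n x rank) times H (rank x m).

    Hypothetical helper for illustration: returns W, H and the divergence
    recorded after every iteration.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive start values so that no entry is stuck at zero.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    history = []
    for _ in range(iters):
        # H[k][j] <- H[k][j] * sum_i W[i][k] V[i][j] / (WH)[i][j] / sum_i W[i][k]
        WH = matmul(W, H)
        for k in range(rank):
            col_sum = sum(W[i][k] for i in range(n)) + eps
            for j in range(m):
                num = sum(W[i][k] * V[i][j] / (WH[i][j] + eps) for i in range(n))
                H[k][j] *= num / col_sum
        # W[i][k] <- W[i][k] * sum_j H[k][j] V[i][j] / (WH)[i][j] / sum_j H[k][j]
        WH = matmul(W, H)
        for i in range(n):
            for k in range(rank):
                row_sum = sum(H[k][j] for j in range(m)) + eps
                num = sum(H[k][j] * V[i][j] / (WH[i][j] + eps) for j in range(m))
                W[i][k] *= num / row_sum
        history.append(kl_divergence(V, matmul(W, H), eps))
    return W, H, history
```

Because each update multiplies by a non-negative ratio, W and H stay non-negative without any projection step, and the recorded divergence should not increase from one iteration to the next.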
Bayesian modelling of lung function data from multiple-breath washout tests.
Mahar, Robert K; Carlin, John B; Ranganathan, Sarath; Ponsonby, Anne-Louise; Vuillermin, Peter; Vukcevic, Damjan
2018-05-30
Paediatric respiratory researchers have widely adopted the multiple-breath washout (MBW) test because it allows assessment of lung function in unsedated infants and is well suited to longitudinal studies of lung development and disease. However, a substantial proportion of MBW tests in infants fail current acceptability criteria. We hypothesised that a model-based approach to analysing the data, in place of traditional simple empirical summaries, would enable more efficient use of these tests. We therefore developed a novel statistical model for infant MBW data and applied it to 1197 tests from 432 individuals from a large birth cohort study. We focus on Bayesian estimation of the lung clearance index, the most commonly used summary of lung function from MBW tests. Our results show that the model provides an excellent fit to the data and shed further light on statistical properties of the standard empirical approach. Furthermore, the modelling approach enables the lung clearance index to be estimated by using tests with different degrees of completeness, something not possible with the standard approach. Our model therefore allows previously unused data to be used rather than discarded, as well as routine use of shorter tests without significant loss of precision. Beyond our specific application, our work illustrates a number of important aspects of Bayesian modelling in practice, such as the importance of hierarchical specifications to account for repeated measurements and the value of model checking via posterior predictive distributions. Copyright © 2018 John Wiley & Sons, Ltd.
The Practicality of Statistical Physics Handout Based on KKNI and the Constructivist Approach
NASA Astrophysics Data System (ADS)
Sari, S. Y.; Afrizon, R.
2018-04-01
Statistical physics lectures show that: 1) the performance of lecturers, the social climate, and the students' competence and soft skills needed at work fall in the "adequate" category, 2) students find the statistical physics lectures difficult to follow because the subject is abstract, 3) 40.72% of students need further support in the form of repetition, practice questions and structured tasks, and 4) the depth of the statistical physics material needs to be increased gradually and in a structured way. This indicates that learning materials in accordance with the Indonesian National Qualification Framework, or Kerangka Kualifikasi Nasional Indonesia (KKNI), together with an appropriate learning approach are needed to help lecturers and students in lectures. The authors have designed statistical physics handouts that meet the "very valid" criterion (90.89%) according to expert judgment. In addition, the practicality of the handouts needs to be considered so that they are easy to use, interesting and efficient in lectures. The purpose of this research is to determine the practicality of statistical physics handouts based on the KKNI and a constructivist approach. This research is part of a research and development project using the 4-D model developed by Thiagarajan, and it has reached the development-test part of the Development stage. Data were collected using a questionnaire distributed to lecturers and students and were analyzed with descriptive techniques in the form of percentages. The analysis of the questionnaire shows that the statistical physics handouts meet the "very practical" criterion. The conclusion of this study is that statistical physics handouts based on the KKNI and a constructivist approach are practical for use in lectures.
NASA Astrophysics Data System (ADS)
Ars, Sébastien; Broquet, Grégoire; Yver Kwok, Camille; Roustan, Yelva; Wu, Lin; Arzoumanian, Emmanuel; Bousquet, Philippe
2017-12-01
This study presents a new concept for estimating the pollutant emission rates of a site and its main facilities using a series of atmospheric measurements across the pollutant plumes. This concept combines the tracer release method, local-scale atmospheric transport modelling and a statistical atmospheric inversion approach. The conversion between the controlled emission and the measured atmospheric concentrations of the released tracer across the plume places valuable constraints on the atmospheric transport. This is used to optimise the configuration of the transport model parameters and the model uncertainty statistics in the inversion system. The emission rates of all sources are then inverted to optimise the match between the concentrations simulated with the transport model and the pollutants' measured atmospheric concentrations, accounting for the transport model uncertainty. In principle, by using atmospheric transport modelling, this concept does not strongly rely on the good colocation between the tracer and pollutant sources and can be used to monitor multiple sources within a single site, unlike the classical tracer release technique. The statistical inversion framework and the use of the tracer data for the configuration of the transport and inversion modelling systems should ensure that the transport modelling errors are correctly handled in the source estimation. The potential of this new concept is evaluated with a relatively simple practical implementation based on a Gaussian plume model and a series of inversions of controlled methane point sources using acetylene as a tracer gas. The experimental conditions are chosen so that they are suitable for the use of a Gaussian plume model to simulate the atmospheric transport. 
In these experiments, different configurations of methane and acetylene point source locations are tested to assess the efficiency of the method in comparison to the classic tracer release technique in coping with the distances between the different methane and acetylene sources. The results from these controlled experiments demonstrate that, when the targeted and tracer gases are not well collocated, this new approach provides a better estimate of the emission rates than the tracer release technique. As an example, the relative error between the estimated and actual emission rates is reduced from 32 % with the tracer release technique to 16 % with the combined approach in the case of a tracer located 60 m upwind of a single methane source. Further studies and more complex implementations with more advanced transport models and more advanced optimisations of their configuration will be required to generalise the applicability of the approach and strengthen its robustness.
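The two ingredients named in the abstract, a Gaussian plume model and the classic tracer release ratio, can be sketched in a few lines. This is only an illustration under simplifying assumptions: a steady point source with ground reflection, dispersion parameters `sigma_y` and `sigma_z` supplied by the caller for the receptor's downwind distance, mass concentrations with background already removed, and equally spaced transect samples. The function names are hypothetical, and the statistical inversion step of the paper is not reproduced.

```python
import math


def gaussian_plume(q, u, y, z, h, sigma_y, sigma_z):
    """Steady-state concentration (kg/m^3) from a continuous point source.

    q: emission rate (kg/s); u: mean wind speed (m/s);
    y: crosswind offset from the plume axis (m); z: receptor height (m);
    h: effective source height (m);
    sigma_y, sigma_z: dispersion lengths (m) at the receptor's downwind distance.
    """
    crosswind = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    # Second exponential is the image source that models ground reflection.
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * crosswind * vertical


def tracer_ratio_estimate(q_tracer, pollutant_transect, tracer_transect):
    """Classic tracer release estimate: scale the known tracer rate by the
    ratio of plume-integrated concentrations along the same transect."""
    return q_tracer * sum(pollutant_transect) / sum(tracer_transect)
```

When the two sources are collocated, both plumes share the same shape and the ratio recovers the pollutant rate exactly; when they are not, the plume shapes differ at the transect and the simple ratio becomes biased, which is the situation the combined approach is meant to handle.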
Noninformative prior in the quantum statistical model of pure states
NASA Astrophysics Data System (ADS)
Tanaka, Fuyuhiko
2012-06-01
In the present paper, we consider a suitable definition of a noninformative prior on the quantum statistical model of pure states. While the full pure-states model is invariant under unitary rotation and admits the Haar measure, restricted models, which we often see in quantum channel estimation and quantum process tomography, have less symmetry and no compelling rationale for any choice. We adopt a game-theoretic approach that is applicable to classical Bayesian statistics and yields a noninformative prior for a general class of probability distributions. We define the quantum detection game and show that there exist noninformative priors for a general class of a pure-states model. Theoretically, it gives one of the ways that we represent ignorance on the given quantum system with partial information. Practically, our method proposes a default distribution on the model in order to use the Bayesian technique in the quantum-state tomography with a small sample.
The log-periodic-AR(1)-GARCH(1,1) model for financial crashes
NASA Astrophysics Data System (ADS)
Gazola, L.; Fernandes, C.; Pizzinga, A.; Riera, R.
2008-02-01
This paper responds to recent calls for more rigorous statistical methodology within the econophysics literature. To this end, we consider an econometric approach to investigate the outcomes of the log-periodic model of price movements, which has been widely used to forecast financial crashes. To obtain reliable statistical inference for the unknown parameters, we incorporate an autoregressive dynamic and a conditional heteroskedasticity structure in the error term of the original model, yielding the log-periodic-AR(1)-GARCH(1,1) model. Both the original and the extended models are fitted to financial indices of the U.S. market, namely the S&P500 and NASDAQ. Our analysis reveals two main points: (i) the log-periodic-AR(1)-GARCH(1,1) model has residuals with better statistical properties and (ii) the estimation of the parameter concerning the time of the financial crash is improved.
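The structure described here, a log-periodic trend plus an AR(1) error whose innovations follow a GARCH(1,1) process, can be illustrated with a short simulation. The parameterisation of the log-periodic term below is one common form and is an assumption, as are all function and parameter names; the abstract does not state the exact specification, and no estimation is attempted.

```python
import math
import random


def log_periodic_mean(t, tc, a, b, c, beta, omega, phi):
    """One common log-periodic form for the expected log-price before a
    critical time tc (requires t < tc)."""
    dt = tc - t
    return a + b * dt ** beta * (1 + c * math.cos(omega * math.log(dt) + phi))


def simulate_ar1_garch11(n, rho, alpha0, alpha1, beta1, seed=0):
    """Simulate u_t = rho * u_{t-1} + eta_t with eta_t = sqrt(h_t) * eps_t and
    h_t = alpha0 + alpha1 * eta_{t-1}^2 + beta1 * h_{t-1}.

    Requires alpha1 + beta1 < 1 so that the unconditional variance exists.
    Returns the error series and the conditional variances.
    """
    rng = random.Random(seed)
    h = alpha0 / (1 - alpha1 - beta1)  # start from the unconditional variance
    eta_prev = 0.0
    u_prev = 0.0
    errors, variances = [], []
    for _ in range(n):
        h = alpha0 + alpha1 * eta_prev ** 2 + beta1 * h
        eta = math.sqrt(h) * rng.gauss(0.0, 1.0)
        u = rho * u_prev + eta
        errors.append(u)
        variances.append(h)
        eta_prev, u_prev = eta, u
    return errors, variances
```

A simulated log-price path is then `log_periodic_mean(t, ...) + errors[t]` for each time index `t` below `tc`; with `rho = alpha1 = beta1 = 0` the error term reduces to the white noise of the original model.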
When mechanism matters: Bayesian forecasting using models of ecological diffusion
Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.
2017-01-01
Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.
Mittal, Manish; Harrison, Donald L; Thompson, David M; Miller, Michael J; Farmer, Kevin C; Ng, Yu-Tze
2016-01-01
While the choice of analytical approach affects study results and their interpretation, there is no consensus to guide the choice of statistical approaches to evaluate public health policy change. This study compared and contrasted three statistical estimation procedures in assessing the effect of a U.S. Food and Drug Administration (FDA) suicidality warning, communicated in January 2008 and implemented in May 2009, on antiepileptic drug (AED) prescription claims. Longitudinal designs were utilized to evaluate Oklahoma (U.S. State) Medicaid claim data from January 2006 through December 2009. The study included 9289 continuously eligible individuals with prevalent diagnoses of epilepsy and/or psychiatric disorder. Segmented regression models using three estimation procedures [i.e., generalized linear models (GLM), generalized estimating equations (GEE), and generalized linear mixed models (GLMM)] were used to estimate trends of AED prescription claims across three time periods: before (January 2006-January 2008); during (February 2008-May 2009); and after (June 2009-December 2009) the FDA warning. All three statistical procedures estimated an increasing trend (P < 0.0001) in AED prescription claims before the FDA warning period. No procedure detected a significant change in trend during (GLM: -30.0%, 99% CI: -60.0% to 10.0%; GEE: -20.0%, 99% CI: -70.0% to 30.0%; GLMM: -23.5%, 99% CI: -58.8% to 1.2%) or after (GLM: 50.0%, 99% CI: -70.0% to 160.0%; GEE: 80.0%, 99% CI: -20.0% to 200.0%; GLMM: 47.1%, 99% CI: -41.2% to 135.3%) the FDA warning when compared to the pre-warning period. Although the three procedures provided consistent inferences, the GEE and GLMM approaches accounted appropriately for correlation. Further, marginal models estimated using GEE produced more robust and valid population-level estimations. Copyright © 2016 Elsevier Inc. All rights reserved.
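A segmented regression of this kind rests on a design matrix with level and trend terms for each period. The sketch below builds one possible coding for the three periods in the abstract, using monthly observations numbered 1 to 48 (January 2006 to December 2009). The column names and the choice to code the "during" and "after" terms as separate contrasts against the pre-warning trend are assumptions for illustration; the abstract does not specify the coding, and fitting with GLM, GEE or GLMM would be done in a statistics package.

```python
def segmented_design(n_months=48, during_start=26, after_start=42):
    """Build one row of covariates per month for a three-period segmented
    regression (hypothetical column names).

    Month 1 = January 2006; month 26 = February 2008 (warning communicated);
    month 42 = June 2009 (after implementation).
    """
    rows = []
    for t in range(1, n_months + 1):
        during = 1 if during_start <= t < after_start else 0
        after = 1 if t >= after_start else 0
        rows.append({
            "time": t,                                        # baseline trend
            "level_during": during,                           # level shift, during period
            "trend_during": (t - during_start + 1) * during,  # trend change, during period
            "level_after": after,                             # level shift, after period
            "trend_after": (t - after_start + 1) * after,     # trend change, after period
        })
    return rows
```

With this coding, the coefficients on the "during" and "after" columns are each read relative to the pre-warning segment, which matches the comparison reported in the abstract.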
Adaptation of a Fast Optimal Interpolation Algorithm to the Mapping of Oceanographic Data
NASA Technical Reports Server (NTRS)
Menemenlis, Dimitris; Fieguth, Paul; Wunsch, Carl; Willsky, Alan
1997-01-01
A fast, recently developed, multiscale optimal interpolation algorithm has been adapted to the mapping of hydrographic and other oceanographic data. This algorithm produces solution and error estimates which are consistent with those obtained from exact least squares methods, but at a small fraction of the computational cost. Problems whose solution would be completely impractical using exact least squares, that is, problems with tens or hundreds of thousands of measurements and estimation grid points, can easily be solved on a small workstation using the multiscale algorithm. In contrast to methods previously proposed for solving large least squares problems, our approach provides estimation error statistics while permitting long-range correlations, using all measurements, and permitting arbitrary measurement locations. The multiscale algorithm itself, published elsewhere, is not the focus of this paper. However, the algorithm requires statistical models having a very particular multiscale structure; it is the development of a class of multiscale statistical models, appropriate for oceanographic mapping problems, with which we concern ourselves in this paper. The approach is illustrated by mapping temperature in the northeastern Pacific. The number of hydrographic stations is kept deliberately small to show that multiscale and exact least squares results are comparable. A portion of the data were not used in the analysis; these data serve to test the multiscale estimates. A major advantage of the present approach is the ability to repeat the estimation procedure a large number of times for sensitivity studies, parameter estimation, and model testing. We have made available by anonymous FTP a set of MATLAB-callable routines which implement the multiscale algorithm and the statistical models developed in this paper.
Simplified estimation of age-specific reference intervals for skewed data.
Wright, E M; Royston, P
1997-12-30
Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of a parametric method are that it necessarily produces smooth centile curves, that the entire density is estimated, and that an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.
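The regression idea behind parametric reference intervals can be sketched as follows, under strong simplifying assumptions: the measurement is taken to be normally distributed at each age (no skewness modelling), and both the mean and the standard deviation are modelled as straight lines in age. The standard deviation is estimated by regressing absolute residuals, scaled by sqrt(pi/2), on age, which is one common device and is an assumption here rather than a statement of the authors' exact procedure. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def reference_interval(ages, values, age, coverage=0.95):
    """Return (lower, upper) centiles at a given age, assuming normality."""
    a0, a1 = fit_line(ages, values)
    # For a normal variable E|X - mean| = sd * sqrt(2/pi), so scaling the
    # absolute residuals by sqrt(pi/2) puts them on the sd scale.
    scaled = [abs(v - (a0 + a1 * a)) * math.sqrt(math.pi / 2)
              for a, v in zip(ages, values)]
    s0, s1 = fit_line(ages, scaled)
    mean = a0 + a1 * age
    sd = s0 + s1 * age
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return mean - z * sd, mean + z * sd
```

Because the centiles come from an explicit formula, mean(age) plus or minus z times sd(age), the resulting curves are smooth in age by construction, which is the advantage of the parametric approach noted in the abstract.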
Morris, Jeffrey S
2012-01-01
In recent years, developments in molecular biotechnology have led to the increased promise of detecting and validating biomarkers, or molecular markers that relate to various biological or medical outcomes. Proteomics, the direct study of proteins in biological samples, plays an important role in the biomarker discovery process. These technologies produce complex, high dimensional functional and image data that present many analytical challenges that must be addressed properly for effective comparative proteomics studies that can yield potential biomarkers. Specific challenges include experimental design, preprocessing, feature extraction, and statistical analysis accounting for the inherent multiple testing issues. This paper reviews various computational aspects of comparative proteomic studies, and summarizes contributions that I, along with numerous collaborators, have made. First, there is an overview of comparative proteomics technologies, followed by a discussion of important experimental design and preprocessing issues that must be considered before statistical analysis can be done. Next, the two key approaches to analyzing proteomics data, feature extraction and functional modeling, are described. Feature extraction involves detection and quantification of discrete features like peaks or spots that theoretically correspond to different proteins in the sample. After an overview of the feature extraction approach, specific methods for mass spectrometry (Cromwell) and 2D gel electrophoresis (Pinnacle) are described. The functional modeling approach involves modeling the proteomic data in their entirety as functions or images. A general discussion of the approach is followed by the presentation of a specific method that can be applied, wavelet-based functional mixed models, and its extensions. All methods are illustrated by application to two example proteomic data sets, one from mass spectrometry and one from 2D gel electrophoresis.
While the specific methods presented are applied to two specific proteomic technologies, MALDI-TOF and 2D gel electrophoresis, these methods and the other principles discussed in the paper apply much more broadly to other expression proteomics technologies.
Statistical estimation of femur micro-architecture using optimal shape and density predictors.
Lekadir, Karim; Hazrati-Marangalou, Javad; Hoogendoorn, Corné; Taylor, Zeike; van Rietbergen, Bert; Frangi, Alejandro F
2015-02-26
The personalization of trabecular micro-architecture has been recently shown to be important in patient-specific biomechanical models of the femur. However, high-resolution in vivo imaging of bone micro-architecture using existing modalities is still infeasible in practice due to the associated acquisition times, costs, and X-ray radiation exposure. In this study, we describe a statistical approach for the prediction of the femur micro-architecture based on the more easily extracted subject-specific bone shape and mineral density information. To this end, a training sample of ex vivo micro-CT images is used to learn the existing statistical relationships within the low and high resolution image data. More specifically, optimal bone shape and mineral density features are selected based on their predictive power and used within a partial least square regression model to estimate the unknown trabecular micro-architecture within the anatomical models of new subjects. The experimental results demonstrate the accuracy of the proposed approach, with average errors of 0.07 for both the degree of anisotropy and tensor norms. Copyright © 2015 Elsevier Ltd. All rights reserved.
Statistical Approaches for Spatiotemporal Prediction of Low Flows
NASA Astrophysics Data System (ADS)
Fangmann, A.; Haberlandt, U.
2017-12-01
An adequate assessment of regional climate change impacts on streamflow requires the integration of various sources of information and modeling approaches. This study proposes simple statistical tools for inclusion into model ensembles, which are fast and straightforward in their application, yet able to yield accurate streamflow predictions in time and space. Target variables for all approaches are annual low flow indices derived from a data set of 51 records of average daily discharge for northwestern Germany. The models require input of climatic data in the form of meteorological drought indices, derived from observed daily climatic variables, averaged over the streamflow gauges' catchment areas. Four different modeling approaches are analyzed. All of them are based on multiple linear regression models that estimate low flows as a function of a set of meteorological indices and/or physiographic and climatic catchment descriptors. For the first method, individual regression models are fitted at each station, predicting annual low flow values from a set of annual meteorological indices, which are subsequently regionalized using a set of catchment characteristics. The second method combines temporal and spatial prediction within a single panel data regression model, allowing estimation of annual low flow values from input of both annual meteorological indices and catchment descriptors. The third and fourth methods represent non-stationary low flow frequency analyses and require fitting of regional distribution functions. Method three relies on a spatiotemporal prediction of an index value, method four on the estimation of L-moments that adapt the regional frequency distribution to the at-site conditions. The results show that method two outperforms successive prediction in time and space.
Method three also shows a high performance in the near future period, but since it relies on a stationary distribution, its application for prediction of far future changes may be problematic. Spatiotemporal prediction of L-moments appeared highly uncertain for higher-order moments resulting in unrealistic future low flow values. All in all, the results promote an inclusion of simple statistical methods in climate change impact assessment.
Static and Dynamic Model Update of an Inflatable/Rigidizable Torus Structure
NASA Technical Reports Server (NTRS)
Horta, Lucas G.; Reaves, Mercedes C.
2006-01-01
The present work addresses the development of an experimental and computational procedure for validating finite element models. A torus structure, part of an inflatable/rigidizable Hexapod, is used to demonstrate the approach. Because of fabrication, materials, and geometric uncertainties, a statistical approach combined with optimization is used to modify key model parameters. Static test results are used to update stiffness parameters and dynamic test results are used to update the mass distribution. Updated parameters are computed using gradient and non-gradient based optimization algorithms. Results show significant improvements in model predictions after parameters are updated. Lessons learned in the areas of test procedures, modeling approaches, and uncertainty quantification are presented.
Routine Discovery of Complex Genetic Models using Genetic Algorithms
Moore, Jason H.; Hahn, Lance W.; Ritchie, Marylyn D.; Thornton, Tricia A.; White, Bill C.
2010-01-01
Simulation studies are useful in various disciplines for a number of reasons including the development and evaluation of new computational and statistical methods. This is particularly true in human genetics and genetic epidemiology where new analytical methods are needed for the detection and characterization of disease susceptibility genes whose effects are complex, nonlinear, and partially or solely dependent on the effects of other genes (i.e. epistasis or gene-gene interaction). Despite this need, the development of complex genetic models that can be used to simulate data is not always intuitive. In fact, only a few such models have been published. We have previously developed a genetic algorithm approach to discovering complex genetic models in which two single nucleotide polymorphisms (SNPs) influence disease risk solely through nonlinear interactions. In this paper, we extend this approach for the discovery of high-order epistasis models involving three to five SNPs. We demonstrate that the genetic algorithm is capable of routinely discovering interesting high-order epistasis models in which each SNP influences risk of disease only through interactions with the other SNPs in the model. This study opens the door for routine simulation of complex gene-gene interactions among SNPs for the development and evaluation of new statistical and computational approaches for identifying common, complex multifactorial disease susceptibility genes. PMID:20948983
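The defining property of a purely epistatic model, that each SNP shows no marginal effect on disease risk, can be checked directly from a penetrance table. The sketch below does this for two loci under Hardy-Weinberg proportions and independence between loci. The table `XOR_LIKE` is an illustrative textbook-style example chosen for this sketch, not a model reported in the paper, and the function names are hypothetical; the genetic algorithm search itself is not reproduced.

```python
def genotype_freqs(p):
    """Hardy-Weinberg frequencies of genotypes (AA, Aa, aa) for allele
    frequency p of allele A."""
    q = 1 - p
    return [p * p, 2 * p * q, q * q]


def marginal_penetrances(table, freq_a, freq_b):
    """Average a 3x3 penetrance table over the other locus.

    table[i][j] is the disease probability for genotype i at locus A and
    genotype j at locus B; loci are assumed independent.
    Returns (marginals by locus-A genotype, marginals by locus-B genotype).
    """
    fa, fb = genotype_freqs(freq_a), genotype_freqs(freq_b)
    marg_a = [sum(table[i][j] * fb[j] for j in range(3)) for i in range(3)]
    marg_b = [sum(table[i][j] * fa[i] for i in range(3)) for j in range(3)]
    return marg_a, marg_b


# Illustrative two-locus table: risk is raised only when exactly one of the
# two loci is heterozygous.
XOR_LIKE = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]
```

With allele frequencies of 0.5 at both loci, every genotype at either locus has the same marginal penetrance, so neither SNP is detectable on its own even though the joint table is far from flat. A search procedure such as a genetic algorithm can use the spread of these marginals as a penalty when looking for higher-order tables with the same property.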
Chiang, Austin W T; Liu, Wei-Chung; Charusanti, Pep; Hwang, Ming-Jing
2014-01-15
A major challenge in mathematical modeling of biological systems is to determine how model parameters contribute to systems dynamics. As biological processes are often complex in nature, it is desirable to address this issue using a systematic approach. Here, we propose a simple methodology that first performs an enrichment test to find patterns in the values of globally profiled kinetic parameters with which a model can produce the required system dynamics; this is then followed by a statistical test to elucidate the association between individual parameters and different parts of the system's dynamics. We demonstrate our methodology on a prototype biological system of perfect adaptation dynamics, namely the chemotaxis model for Escherichia coli. Our results agreed well with those derived from experimental data and theoretical studies in the literature. Using this model system, we showed that there are motifs in kinetic parameters and that these motifs are governed by constraints of the specified system dynamics. A systematic approach based on enrichment statistical tests has been developed to elucidate the relationships between model parameters and the roles they play in affecting system dynamics of a prototype biological network. The proposed approach is generally applicable and therefore can find wide use in systems biology modeling research.
A Model Fit Statistic for Generalized Partial Credit Model
ERIC Educational Resources Information Center
Liang, Tie; Wells, Craig S.
2009-01-01
Investigating the fit of a parametric model is an important part of the measurement process when implementing item response theory (IRT), but research examining it is limited. A general nonparametric approach for detecting model misfit, introduced by J. Douglas and A. S. Cohen (2001), has exhibited promising results for the two-parameter logistic…
Developing chemical criteria for wildlife: The benchmark dose versus NOAEL approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Linder, G.
1995-12-31
Wildlife may be exposed to a wide variety of chemicals in their environment, and various strategies for evaluating wildlife risk for these chemicals have been developed. One, a "no-observable-adverse-effects-level" or NOAEL-approach has increasingly been applied to develop chemical criteria for wildlife. In this approach, the NOAEL represents the highest experimental concentration at which there is no statistically significant change in some toxicity endpoint relative to a control. Another, the "benchmark dose" or BMD-approach relies on the lower confidence limit for a concentration that corresponds to a small, but statistically significant, change in effect over some reference condition. Rather than corresponding to a single experimental concentration as does the NOAEL, the BMD-approach considers the full concentration response curve for derivation of the BMD. Here, using a variety of vertebrates and an assortment of chemicals (including carbofuran, paraquat, methylmercury, cadmium, zinc, and copper), the NOAEL-approach will be critically evaluated relative to the BMD approach. Statistical models used in the BMD approach suggest these methods are potentially available for eliminating safety factors in risk calculations. A reluctance to recommend this, however, stems from the uncertainty associated with the shape of concentration-response curves at low concentrations. Also, with existing data the derivation of BMDs has shortcomings when sample size is small (10 or fewer animals per treatment). The success of BMD models clearly depends upon the continued collection of wildlife data in the field and laboratory, the design of toxicity studies sufficient for BMD calculations, and complete reporting of these results in the literature. Overall, the BMD approach for developing chemical criteria for wildlife should be given further consideration, since it more fully evaluates concentration-response data.
Progress with modeling activity landscapes in drug discovery.
Vogt, Martin
2018-04-19
Activity landscapes (ALs) are representations and models of compound data sets annotated with a target-specific activity. In contrast to quantitative structure-activity relationship (QSAR) models, ALs aim at characterizing structure-activity relationships (SARs) on a large-scale level encompassing all active compounds for specific targets. The popularity of AL modeling has grown substantially with the public availability of large activity-annotated compound data sets. AL modeling crucially depends on molecular representations and similarity metrics used to assess structural similarity. Areas covered: The concepts of AL modeling are introduced and its basis in quantitatively assessing molecular similarity is discussed. The different types of AL modeling approaches are introduced. AL designs can broadly be divided into three categories: compound-pair based, dimensionality reduction, and network approaches. Recent developments for each of these categories are discussed focusing on the application of mathematical, statistical, and machine learning tools for AL modeling. AL modeling using chemical space networks is covered in more detail. Expert opinion: AL modeling has remained a largely descriptive approach for the analysis of SARs. Beyond mere visualization, the application of analytical tools from statistics, machine learning and network theory has aided in the sophistication of AL designs and provides a step forward in transforming ALs from descriptive to predictive tools. To this end, optimizing representations that encode activity relevant features of molecules might prove to be a crucial step.
A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants
Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.
2016-01-01
Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286
Sachindra, D. A.; Perera, B. J. C.
2016-01-01
This paper presents a novel approach to incorporate the non-stationarities characterised in the GCM outputs into the Predictor-Predictand Relationships (PPRs) in statistical downscaling models. In this approach, a series of 42 PPRs based on the multi-linear regression (MLR) technique were determined for each calendar month using a 20-year moving window moved at a 1-year time step on the predictor data obtained from the NCEP/NCAR reanalysis data archive and observations of precipitation at 3 stations located in Victoria, Australia, for the period 1950–2010. Then the relationships between the constants and coefficients in the PPRs and the statistics of reanalysis data of predictors were determined for the period 1950–2010, for each calendar month. Thereafter, using these relationships with the statistics of the past data of HadCM3 GCM pertaining to the predictors, new PPRs were derived for the periods 1950–69, 1970–89 and 1990–99 for each station. This process yielded a non-stationary downscaling model consisting of a PPR per calendar month for each of the above three periods for each station. The non-stationarities in the climate are characterised by the long-term changes in the statistics of the climate variables, and the above process enabled relating the non-stationarities in the climate to the PPRs. These new PPRs were then used with the past data of HadCM3 to reproduce the observed precipitation. It was found that the non-stationary MLR based downscaling model was able to produce more accurate simulations of observed precipitation more often than conventional stationary downscaling models developed with MLR and Genetic Programming (GP). PMID:27997609
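The count of 42 PPRs follows directly from sliding a 20-year window one year at a time across 1950–2010. The sketch below only enumerates the calibration windows; it does not fit any regression, and the function name `moving_windows` is hypothetical rather than taken from the paper.

```python
def moving_windows(first_year, last_year, width, step=1):
    """List inclusive (start, end) year pairs for a sliding window."""
    last_start = last_year - width + 1
    return [(s, s + width - 1) for s in range(first_year, last_start + 1, step)]


windows = moving_windows(1950, 2010, width=20, step=1)
print(len(windows))             # number of windows, one PPR each per calendar month
print(windows[0], windows[-1])  # first and last calibration periods
```

Each window would then supply the predictor and precipitation data for one MLR fit per calendar month and station.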
NASA Technical Reports Server (NTRS)
He, Yuning
2015-01-01
The behavior of complex aerospace systems is governed by numerous parameters. For safety analysis it is important to understand how the system behaves with respect to these parameter values. In particular, understanding the boundaries between safe and unsafe regions is of major importance. In this paper, we describe a hierarchical Bayesian statistical modeling approach for the online detection and characterization of such boundaries. Our method for classification with active learning uses a particle filter-based model and a boundary-aware metric for best performance. From a library of candidate shapes incorporated with domain expert knowledge, the location and parameters of the boundaries are estimated using advanced Bayesian modeling techniques. The results of our boundary analysis are then provided in a form understandable by the domain expert. We illustrate our approach using a simulation model of a NASA neuro-adaptive flight control system, as well as a system for the detection of separation violations in the terminal airspace.
Hagell, Peter; Westergren, Albert
Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Data with simulated Rasch model fitting 25-item dichotomous scales and sample sizes ranging from N = 50 to N = 2500 were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less than or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).
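A short calculation illustrates why a Bonferroni correction helps against Type I errors when many item-level fit tests are run. This is a minimal sketch and not the RUMM procedure: it assumes one test per item (25 tests), a nominal alpha of 0.05, and independent tests, none of which is stated in the abstract, and the function names are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


def familywise_error(per_test_alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests


# Assumed values for illustration: 25 item-level tests at alpha = 0.05.
N_TESTS = 25
ALPHA = 0.05

uncorrected = familywise_error(ALPHA, N_TESTS)
corrected = familywise_error(bonferroni_alpha(ALPHA, N_TESTS), N_TESTS)
print(f"uncorrected familywise error: {uncorrected:.3f}")
print(f"Bonferroni familywise error:  {corrected:.3f}")
```

Under these assumptions the corrected familywise rate stays below the nominal alpha, whereas the uncorrected rate grows quickly with the number of tests.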
Walker, Martin; Basáñez, María-Gloria; Ouédraogo, André Lin; Hermsen, Cornelus; Bousema, Teun; Churcher, Thomas S
2015-01-16
Quantitative molecular methods (QMMs) such as quantitative real-time polymerase chain reaction (q-PCR), reverse-transcriptase PCR (qRT-PCR) and quantitative nucleic acid sequence-based amplification (QT-NASBA) are increasingly used to estimate pathogen density in a variety of clinical and epidemiological contexts. These methods are often classified as semi-quantitative, yet estimates of reliability or sensitivity are seldom reported. Here, a statistical framework is developed for assessing the reliability (uncertainty) of pathogen densities estimated using QMMs and the associated diagnostic sensitivity. The method is illustrated with quantification of Plasmodium falciparum gametocytaemia by QT-NASBA. The reliability of pathogen (e.g. gametocyte) densities, and the accompanying diagnostic sensitivity, estimated by two contrasting statistical calibration techniques, are compared; a traditional method and a mixed model Bayesian approach. The latter accounts for statistical dependence of QMM assays run under identical laboratory protocols and permits structural modelling of experimental measurements, allowing precision to vary with pathogen density. Traditional calibration cannot account for inter-assay variability arising from imperfect QMMs and generates estimates of pathogen density that have poor reliability, are variable among assays and inaccurately reflect diagnostic sensitivity. The Bayesian mixed model approach assimilates information from replica QMM assays, improving reliability and inter-assay homogeneity, providing an accurate appraisal of quantitative and diagnostic performance. Bayesian mixed model statistical calibration supersedes traditional techniques in the context of QMM-derived estimates of pathogen density, offering the potential to improve substantially the depth and quality of clinical and epidemiological inference for a wide variety of pathogens.
Markov Logic Networks in the Analysis of Genetic Data
Sakhanenko, Nikita A.
2010-01-01
Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
DOT National Transportation Integrated Search
2009-10-01
Travel demand modeling, in recent years, has seen a paradigm shift with an emphasis on analyzing travel at the individual level rather than using direct statistical projections of aggregate travel demand as in the trip-based approach. Specificall...
Efficiency Analysis of Public Universities in Thailand
ERIC Educational Resources Information Center
Kantabutra, Saranya; Tang, John C. S.
2010-01-01
This paper examines the performance of Thai public universities in terms of efficiency, using a non-parametric approach called data envelopment analysis. Two efficiency models, the teaching efficiency model and the research efficiency model, are developed and the analysis is conducted at the faculty level. Further statistical analyses are also…
Wheat mill stream properties for discrete element method modeling
USDA-ARS's Scientific Manuscript database
A discrete phase approach based on individual wheat kernel characteristics is needed to overcome the limitations of previous statistical models and accurately predict the milling behavior of wheat. As a first step to develop a discrete element method (DEM) model for the wheat milling process, this s...
Averaging Models: Parameters Estimation with the R-Average Procedure
ERIC Educational Resources Information Center
Vidotto, G.; Massidda, D.; Noventa, S.
2010-01-01
The Functional Measurement approach, proposed within the theoretical framework of Information Integration Theory (Anderson, 1981, 1982), can be a useful multi-attribute analysis tool. Compared to the majority of statistical models, the averaging model can account for interaction effects without adding complexity. The R-Average method (Vidotto &…
ERIC Educational Resources Information Center
Subrahmanyam, Annamdevula
2017-01-01
Purpose: This paper aims to identify and test four competing models with the interrelationships between students' perceived service quality, students' satisfaction, loyalty and motivation using structural equation modeling (SEM), and to select the best model using the chi-square difference (Δχ²) statistic test. Design/methodology/approach: The study…
Statistics of Dark Matter Halos from Gravitational Lensing.
Jain; Van Waerbeke L
2000-02-10
We present a new approach to measure the mass function of dark matter halos and to discriminate models with differing values of Omega through weak gravitational lensing. We measure the distribution of peaks from simulated lensing surveys and show that the lensing signal due to dark matter halos can be detected for a wide range of peak heights. Even when the signal-to-noise ratio is well below the limit for detection of individual halos, projected halo statistics can be constrained for halo masses spanning galactic to cluster halos. The use of peak statistics relies on an analytical model of the noise due to the intrinsic ellipticities of source galaxies. The noise model has been shown to accurately describe simulated data for a variety of input ellipticity distributions. We show that the measured peak distribution has distinct signatures of gravitational lensing, and its non-Gaussian shape can be used to distinguish models with different values of Omega. The use of peak statistics is complementary to the measurement of field statistics, such as the ellipticity correlation function, and is possibly not susceptible to the same systematic errors.
Incorporating principal component analysis into air quality model evaluation
The efficacy of standard air quality model evaluation techniques is becoming compromised as the simulation periods continue to lengthen in response to ever increasing computing capacity. Accordingly, the purpose of this paper is to demonstrate a statistical approach called Princi...
Effects of preprocessing Landsat MSS data on derived features
NASA Technical Reports Server (NTRS)
Parris, T. M.; Cicone, R. C.
1983-01-01
Important to the use of multitemporal Landsat MSS data for earth resources monitoring, such as agricultural inventories, is the ability to minimize the effects of varying atmospheric and satellite viewing conditions, while extracting physically meaningful features from the data. In general, the approaches to the preprocessing problem have been derived from either physical or statistical models. This paper compares three proposed algorithms: XSTAR haze correction, Color Normalization, and Multiple Acquisition Mean Level Adjustment. These techniques represent physical, statistical, and hybrid physical-statistical models, respectively. The comparisons are made in the context of three feature extraction techniques: the Tasseled Cap, the Cate Color Cube, and Normalized Difference.
NASA Astrophysics Data System (ADS)
Pradeep, Krishna; Poiroux, Thierry; Scheer, Patrick; Juge, André; Gouget, Gilles; Ghibaudo, Gérard
2018-07-01
This work details the analysis of wafer level global process variability in 28 nm FD-SOI using split C-V measurements. The proposed approach initially evaluates the native on wafer process variability using efficient extraction methods on split C-V measurements. The on-wafer threshold voltage (VT) variability is first studied and modeled using a simple analytical model. Then, a statistical model based on the Leti-UTSOI compact model is proposed to describe the total C-V variability in different bias conditions. This statistical model is finally used to study the contribution of each process parameter to the total C-V variability.
Implication of Tsallis entropy in the Thomas–Fermi model for self-gravitating fermions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ourabah, Kamel; Tribeche, Mouloud, E-mail: mouloudtribeche@yahoo.fr
The Thomas–Fermi approach for self-gravitating fermions is revisited within the theoretical framework of the q-statistics. Starting from the q-deformation of the Fermi–Dirac distribution function, a generalized Thomas–Fermi equation is derived. It is shown that the Tsallis entropy preserves a scaling property of this equation. The q-statistical approach to Jeans’ instability in a system of self-gravitating fermions is also addressed. The dependence of the Jeans’ wavenumber (or the Jeans length) on the parameter q is traced. It is found that the q-statistics makes the Fermionic system unstable at scales shorter than the standard Jeans length. Highlights: • Thomas–Fermi approach for self-gravitating fermions. • A generalized Thomas–Fermi equation is derived. • Nonextensivity preserves a scaling property of this equation. • Nonextensive approach to Jeans’ instability of self-gravitating fermions. • It is found that nonextensivity makes the Fermionic system unstable at shorter scales.
Steganalysis of recorded speech
NASA Astrophysics Data System (ADS)
Johnson, Micah K.; Lyu, Siwei; Farid, Hany
2005-03-01
Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
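The bit-rate claim and the idea of LSB embedding can be made concrete with a few lines of arithmetic. The sketch below assumes a single (mono) channel and one hidden bit per sample, neither of which is stated in the abstract; the sample values and function names are made up for illustration, and it does not reproduce Hide4PGP or the authors' steganalysis features.

```python
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 44_100


def cover_bit_rate(channels=1):
    """Raw bit rate of the cover audio in bits per second."""
    return BITS_PER_SAMPLE * SAMPLE_RATE_HZ * channels


def lsb_capacity(channels=1):
    """Payload in bits per second when one bit is hidden per sample."""
    return SAMPLE_RATE_HZ * channels


def embed_lsb(samples, bits):
    """Overwrite the least significant bit of the first len(bits) samples."""
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(samples, n_bits):
    """Read back the least significant bit of the first n_bits samples."""
    return [s & 1 for s in samples[:n_bits]]


cover = [1200, -873, 15, -2, 0, 32767]   # made-up 16-bit sample values
message = [1, 0, 1, 1]
stego = embed_lsb(cover, message)
print(cover_bit_rate(), lsb_capacity(), extract_lsb(stego, len(message)))
```

Each sample changes by at most one quantization step, which is why the change is hard to hear yet can still shift the signal statistics that a steganalysis classifier looks at.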
Statistical Compression of Wind Speed Data
NASA Astrophysics Data System (ADS)
Tagle, F.; Castruccio, S.; Crippa, P.; Genton, M.
2017-12-01
In this work we introduce a lossy compression approach that utilizes a stochastic wind generator based on a non-Gaussian distribution to reproduce the internal climate variability of daily wind speed as represented by the CESM Large Ensemble over Saudi Arabia. Stochastic wind generators, and stochastic weather generators more generally, are statistical models that aim to match certain statistical properties of the data on which they are trained. They have been used extensively in applications ranging from agricultural models to climate impact studies. In this novel context, the parameters of the fitted model can be interpreted as encoding the information contained in the original uncompressed data. The statistical model is fit to only 3 of the 30 ensemble members and it adequately captures the variability of the ensemble in terms of seasonal and interannual variability of daily wind speed. To deal with such a large spatial domain, it is partitioned into 9 regions, and the model is fit independently to each of these. We further discuss a recent refinement of the model, which relaxes this assumption of regional independence, by introducing a large-scale component that interacts with the fine-scale regional effects.
NASA Astrophysics Data System (ADS)
Gädeke, Anne; Koch, Hagen; Pohle, Ina; Grünewald, Uwe
2014-05-01
In anthropogenically heavily impacted river catchments, such as the Lusatian river catchments of Spree and Schwarze Elster (Germany), the robust assessment of possible impacts of climate change on the regional water resources is of high relevance for the development and implementation of suitable climate change adaptation strategies. Large uncertainties inherent in future climate projections may, however, reduce the willingness of regional stakeholders to develop and implement suitable adaptation strategies to climate change. This study provides an overview of different possibilities to consider uncertainties in climate change impact assessments by means of (1) an ensemble based modelling approach and (2) the incorporation of measured and simulated meteorological trends. The ensemble based modelling approach consists of the meteorological output of four climate downscaling approaches (DAs) (two dynamical and two statistical DAs (113 realisations in total)), which drive different model configurations of two conceptually different hydrological models (HBV-light and WaSiM-ETH). Three near-natural subcatchments of the Spree and Schwarze Elster river catchments serve as the study area. The objective of incorporating measured meteorological trends into the analysis was twofold: measured trends can (i) serve as a means to validate the results of the DAs and (ii) be regarded as a harbinger for the future direction of change. Moreover, regional stakeholders seem to have more trust in measurements than in modelling results. In order to evaluate the nature of the trends, both gradual (Mann-Kendall test) and step changes (Pettitt test) are considered as well as both temporal and spatial correlations in the data. The results of the ensemble based modelling chain show that depending on the type (dynamical or statistical) of DA used, opposing trends in precipitation, actual evapotranspiration and discharge are simulated in the scenario period (2031-2060). 
While the statistical DAs simulate a strong decrease in future long term annual precipitation, the dynamical DAs simulate a tendency towards increasing precipitation. The trend analysis suggests that precipitation has not changed significantly during the period 1961-2006. Therefore, the decrease simulated by the statistical DAs should be interpreted as a rather dry future projection. Concerning air temperature, measured and simulated trends agree on a positive trend. Also, the uncertainty related to the hydrological model within the climate change modelling chain is comparably low when long-term averages are considered but increases significantly during extreme events. This proposed framework of combining an ensemble based modelling approach with measured trend analysis is a promising approach for regional stakeholders to gain more confidence in the final results of climate change impact assessments. However, climate change impact assessments will remain highly uncertain. Thus, flexible adaptation strategies need to be developed which should not only consider climate but also other aspects of global change.
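The gradual-trend check named in this abstract, the Mann-Kendall test, can be sketched in a few lines. The version below is simplified: it applies no correction for ties or for the temporal correlation that the study explicitly considers, and the precipitation values are made up for illustration.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p) for the Mann-Kendall trend test.

    Simplified: no correction for ties or serial correlation.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)   # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return s, z, p


# Made-up annual values, for illustration only.
annual_precip = [612, 655, 590, 701, 640, 598, 667, 623, 689, 610]
print(mann_kendall(annual_precip))
```

A large p-value here would correspond to the abstract's finding of no significant change; a step-change test such as Pettitt's would need a separate routine.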
Modeling of Pedestrian Flows Using Hybrid Models of Euler Equations and Dynamical Systems
NASA Astrophysics Data System (ADS)
Bärwolff, Günter; Slawig, Thomas; Schwandt, Hartmut
2007-09-01
In recent years various systems have been developed for controlling, planning and predicting the traffic of persons and vehicles, in particular under security aspects. Going beyond pure counting and statistical models, approaches based on well-known concepts originally developed in very different research areas, namely continuum mechanics and computer science, were found to be very adequate and accurate. In the present paper, we outline a continuum mechanical approach for the description of pedestrian flow.
Johnson, Douglas H.; Cook, R.D.
2013-01-01
In her AAAS News & Notes piece "Can the Southwest manage its thirst?" (26 July, p. 362), K. Wren quotes Ajay Kalra, who advocates a particular method for predicting Colorado River streamflow "because it eschews complex physical climate models for a statistical data-driven modeling approach." A preference for data-driven models may be appropriate in this individual situation, but it is not so generally. Data-driven models often come with a warning against extrapolating beyond the range of the data used to develop the models. When the future is like the past, data-driven models can work well for prediction, but it is easy to over-model local or transient phenomena, often leading to predictive inaccuracy (1). Mechanistic models are built on established knowledge of the process that connects the response variables with the predictors, using information obtained outside of an extant data set. One may shy away from a mechanistic approach when the underlying process is judged to be too complicated, but good predictive models can be constructed with statistical components that account for ingredients missing in the mechanistic analysis. Models with sound mechanistic components are more generally applicable and robust than data-driven models.
A Bayesian approach to modeling diffraction profiles and application to ferroelectric materials
Iamsasri, Thanakorn; Guerrier, Jonathon; Esteves, Giovanni; ...
2017-02-01
A new statistical approach for modeling diffraction profiles is introduced, using Bayesian inference and a Markov chain Monte Carlo (MCMC) algorithm. This method is demonstrated by modeling the degenerate reflections during application of an electric field to two different ferroelectric materials: thin-film lead zirconate titanate (PZT) of composition PbZr0.3Ti0.7O3 and a bulk commercial PZT polycrystalline ferroelectric. Here, the new method offers a unique uncertainty quantification of the model parameters that can be readily propagated into new calculated parameters.
Statistical label fusion with hierarchical performance models
Asman, Andrew J.; Dagley, Alexander S.; Landman, Bennett A.
2014-01-01
Label fusion is a critical step in many image segmentation frameworks (e.g., multi-atlas segmentation) as it provides a mechanism for generalizing a collection of labeled examples into a single estimate of the underlying segmentation. In the multi-label case, typical label fusion algorithms treat all labels equally – fully neglecting the known, yet complex, anatomical relationships exhibited in the data. To address this problem, we propose a generalized statistical fusion framework using hierarchical models of rater performance. Building on the seminal work in statistical fusion, we reformulate the traditional rater performance model from a multi-tiered hierarchical perspective. This new approach provides a natural framework for leveraging known anatomical relationships and accurately modeling the types of errors that raters (or atlases) make within a hierarchically consistent formulation. Herein, we describe several contributions. First, we derive a theoretical advancement to the statistical fusion framework that enables the simultaneous estimation of multiple (hierarchical) performance models within the statistical fusion context. Second, we demonstrate that the proposed hierarchical formulation is highly amenable to the state-of-the-art advancements that have been made to the statistical fusion framework. Lastly, in an empirical whole-brain segmentation task we demonstrate substantial qualitative and significant quantitative improvement in overall segmentation accuracy. PMID:24817809
Terminology, concepts, and models in genetic epidemiology.
Teare, M Dawn; Koref, Mauro F Santibàñez
2011-01-01
Genetic epidemiology brings together approaches and techniques developed in mathematical genetics and statistics, medical genetics, quantitative genetics, and epidemiology. In the 1980s, the focus was on the mapping and identification of genes where defects had large effects at the individual level. More recently, statistical and experimental advances have made it possible to identify and characterise genes associated with small effects at the individual level. In this chapter, we provide a brief outline of the models, concepts, and terminology used in genetic epidemiology.
Dorazio, Robert M; Hunter, Margaret E
2015-11-03
Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary log-log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model's parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.
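The link between partition counts and concentration described here can be sketched numerically. The example assumes the usual Poisson loading of molecules into partitions, under which the probability that a partition is positive is 1 - exp(-concentration × volume); applying the complementary log-log link then gives log(concentration) + log(volume), where log(volume) plays the role of the offset. The numbers and function names are made up for illustration and this is not the authors' software.

```python
import math


def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def prob_positive(concentration, volume):
    """P(partition contains at least one copy), assuming Poisson loading."""
    return 1.0 - math.exp(-concentration * volume)


def estimate_concentration(n_positive, n_partitions, volume):
    """Closed-form estimate for one sample with no covariates."""
    p_hat = n_positive / n_partitions
    return -math.log(1.0 - p_hat) / volume


# Made-up numbers: copies per microlitre and partition volume in microlitres.
lam, vol = 500.0, 0.00085
p = prob_positive(lam, vol)
# The link turns the model into a linear predictor plus a log-volume offset.
print(cloglog(p), math.log(lam) + math.log(vol))
print(estimate_concentration(6924, 20000, vol))
```

With covariates, log(concentration) would be replaced by a linear predictor and the model fitted as a binomial GLM in standard statistical software, as the abstract notes.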
NASA Technical Reports Server (NTRS)
Ahmed, Kazi Farzan; Wang, Guiling; Silander, John; Wilson, Adam M.; Allen, Jenica M.; Horton, Radley; Anyah, Richard
2013-01-01
Statistical downscaling can be used to efficiently downscale a large number of General Circulation Model (GCM) outputs to a fine temporal and spatial scale. To facilitate regional impact assessments, this study statistically downscales (to 1/8° spatial resolution) and corrects the bias of daily maximum and minimum temperature and daily precipitation data from six GCMs and four Regional Climate Models (RCMs) for the northeast United States (US) using the Statistical Downscaling and Bias Correction (SDBC) approach. Based on these downscaled data from multiple models, five extreme indices were analyzed for the future climate to quantify future changes of climate extremes. For a subset of models and indices, results based on raw and bias corrected model outputs for the present-day climate were compared with observations, which demonstrated that bias correction is important not only for GCM outputs, but also for RCM outputs. For future climate, bias correction led to a higher level of agreement among the models in predicting the magnitude and capturing the spatial pattern of the extreme climate indices. We found that the incorporation of dynamical downscaling as an intermediate step does not lead to considerable differences in the results of statistical downscaling for the study domain.
Aucouturier, Jean-Julien; Defreville, Boris; Pachet, François
2007-08-01
The "bag-of-frames" approach (BOF) to audio pattern recognition represents signals as the long-term statistical distribution of their local spectral features. This approach has proved nearly optimal for simulating the auditory perception of natural and human environments (or soundscapes), and is also the predominant paradigm to extract high-level descriptions from music signals. However, recent studies show that, contrary to its application to soundscape signals, BOF only provides limited performance when applied to polyphonic music signals. This paper proposes to explicitly examine the difference between urban soundscapes and polyphonic music with respect to their modeling with the BOF approach. First, the application of the same measure of acoustic similarity on both soundscape and music data sets confirms that the BOF approach can model soundscapes to near-perfect precision, and exhibits none of the limitations observed in the music data set. Second, the modification of this measure by two custom homogeneity transforms reveals critical differences in the temporal and statistical structure of the typical frame distribution of each type of signal. Such differences may explain the uneven performance of BOF algorithms on soundscapes and music signals, and suggest that their human perception relies on cognitive processes of a different nature.
Comparison of simulation modeling and satellite techniques for monitoring ecological processes
NASA Technical Reports Server (NTRS)
Box, Elgene O.
1988-01-01
In 1985 improvements were made in the world climatic data base for modeling and predictive mapping; in individual process models and the overall carbon-balance models; and in the interface software for mapping the simulation results. Statistical analysis of the data base was begun. In 1986 mapping was shifted to NASA-Goddard. The initial approach involving pattern comparisons was modified to a more statistical approach. A major accomplishment was the expansion and improvement of a global data base of measurements of biomass and primary production, to complement the simulation data. The main accomplishments during 1987 included: production of a master tape with all environmental and satellite data and model results for the 1600 sites; development of a complete mapping system used for the initial color maps comparing annual and monthly patterns of Normalized Difference Vegetation Index (NDVI), actual evapotranspiration, net primary productivity, gross primary productivity, and net ecosystem production; collection of more biosphere measurements for eventual improvement of the biological models; and development of some initial monthly models for primary productivity, based on satellite data.
Effective model approach to the dense state of QCD matter
NASA Astrophysics Data System (ADS)
Fukushima, Kenji
2011-12-01
The first-principle approach to the dense state of QCD matter, i.e. the lattice-QCD simulation at finite baryon density, is not under theoretical control for the moment. The effective model study based on QCD symmetries is a practical alternative. However, the model parameters that are fixed by hadronic properties in the vacuum may have unknown dependence on the baryon chemical potential. We propose a new prescription to constrain the effective model parameters by the matching condition with the thermal Statistical Model. In the transitional region where thermal quantities blow up in the Statistical Model, deconfined quarks and gluons should smoothly take over the relevant degrees of freedom from hadrons and resonances. We use the Polyakov-loop coupled Nambu-Jona-Lasinio (PNJL) model as an effective description in the quark side and show how the matching condition is satisfied by a simple ansatz on the Polyakov loop potential. Our results favor a phase diagram with the chiral phase transition located at slightly higher temperature than deconfinement which stays close to the chemical freeze-out points.
NASA Astrophysics Data System (ADS)
Goldsworthy, M. J.
2012-10-01
One of the most useful tools for modelling rarefied hypersonic flows is the Direct Simulation Monte Carlo (DSMC) method. Simulator particle movement and collision calculations are combined with statistical procedures to model thermal non-equilibrium flow-fields described by the Boltzmann equation. The Macroscopic Chemistry Method for DSMC simulations was developed to simplify the inclusion of complex thermal non-equilibrium chemistry. The macroscopic approach uses statistical information which is calculated during the DSMC solution process in the modelling procedures. Here it is shown how inclusion of macroscopic information in models of chemical kinetics, electronic excitation, ionization, and radiation can enhance the capabilities of DSMC to model flow-fields where a range of physical processes occur. The approach is applied to the modelling of a 6.4 km/s nitrogen shock wave and results are compared with those from existing shock-tube experiments and continuum calculations. Reasonable agreement between the methods is obtained. The quality of the comparison is highly dependent on the set of vibrational relaxation and chemical kinetic parameters employed.
Quantum description of light propagation in generalized media
NASA Astrophysics Data System (ADS)
Häyrynen, Teppo; Oksanen, Jani
2016-02-01
Linear quantum input-output relation based models are widely applied to describe the light propagation in a lossy medium. The details of the interaction and the associated added noise depend on whether the device is configured to operate as an amplifier or an attenuator. Using the traveling wave (TW) approach, we generalize the linear material model to simultaneously account for both the emission and absorption processes and to have point-wise defined noise field statistics and intensity dependent interaction strengths. Thus, our approach describes the quantum input-output relations of linear media with net attenuation, amplification or transparency without pre-selection of the operation point. The TW approach is then applied to investigate materials at thermal equilibrium, inverted materials, the transparency limit where losses are compensated, and the saturating amplifiers. We also apply the approach to investigate media in nonuniform states which can be e.g. consequences of a temperature gradient over the medium or a position dependent inversion of the amplifier. Furthermore, by using the generalized model we investigate devices with intensity dependent interactions and show how an initial thermal field transforms to a field having coherent statistics due to gain saturation.
A generalized estimating equations approach for resting-state functional MRI group analysis.
D'Angelo, Gina M; Lazar, Nicole A; Eddy, William F; Morris, John C; Sheline, Yvette I
2011-01-01
An Alzheimer's fMRI study has motivated us to evaluate inter-regional correlations between groups. The overall objective is to assess inter-regional correlations at a resting-state with no stimulus or task. We propose using a generalized estimating equation (GEE) transition model and a GEE marginal model to model the within-subject correlation for each region. Residuals calculated from the GEE models are used to correlate brain regions and assess between group differences. The standard pooling approach of group averages of the Fisher-z transformation assuming temporal independence is a typical approach used to compare group correlations. The GEE approaches and standard Fisher-z pooling approach are demonstrated with an Alzheimer's disease (AD) connectivity study in a population of AD subjects and healthy control subjects. We also compare these methods using simulation studies and show that the transition model may have better statistical properties.
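The standard pooling approach mentioned above can be sketched as follows: each subject's inter-regional correlation r is mapped to z = atanh(r), the z values are averaged within each group, and the group means are compared. This is a minimal sketch that assumes one correlation per subject and independence between subjects; the data values and function names are hypothetical, and the Welch statistic is an illustrative choice rather than the authors' stated test.

```python
import math
from statistics import mean, stdev

def fisher_z(r):
    """Fisher z-transformation of a correlation coefficient."""
    return math.atanh(r)

def compare_groups(r_a, r_b):
    """Compare two groups of subject-level correlations on the z scale.
    Returns (difference of mean z, Welch t statistic)."""
    z_a = [fisher_z(r) for r in r_a]
    z_b = [fisher_z(r) for r in r_b]
    diff = mean(z_a) - mean(z_b)
    se = math.sqrt(stdev(z_a) ** 2 / len(z_a) + stdev(z_b) ** 2 / len(z_b))
    return diff, diff / se

# Hypothetical correlations between one pair of regions, one value per subject.
ad_group = [0.31, 0.25, 0.40, 0.18, 0.35]
control_group = [0.52, 0.61, 0.47, 0.58, 0.49]
diff, t_stat = compare_groups(ad_group, control_group)
print(f"mean z difference = {diff:.3f}, Welch t = {t_stat:.2f}")
```

The sketch ignores temporal autocorrelation within each subject's time series, which is the limitation the GEE models in the abstract are meant to address.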
Sepehrband, Farshid; Lynch, Kirsten M; Cabeen, Ryan P; Gonzalez-Zacarias, Clio; Zhao, Lu; D'Arcy, Mike; Kesselman, Carl; Herting, Megan M; Dinov, Ivo D; Toga, Arthur W; Clark, Kristi A
2018-05-15
Exploring neuroanatomical sex differences using a multivariate statistical learning approach can yield insights that cannot be derived with univariate analysis. While gross differences in total brain volume are well-established, uncovering the more subtle, regional sex-related differences in neuroanatomy requires a multivariate approach that can accurately model spatial complexity as well as the interactions between neuroanatomical features. Here, we developed a multivariate statistical learning model using a support vector machine (SVM) classifier to predict sex from MRI-derived regional neuroanatomical features from a single-site study of 967 healthy youth from the Philadelphia Neurodevelopmental Cohort (PNC). Then, we validated the multivariate model on an independent dataset of 682 healthy youth from the multi-site Pediatric Imaging, Neurocognition and Genetics (PING) cohort study. The trained model exhibited an 83% cross-validated prediction accuracy, and correctly predicted the sex of 77% of the subjects from the independent multi-site dataset. Results showed that cortical thickness of the middle occipital lobes and the angular gyri are major predictors of sex. Results also demonstrated the inferential benefits of going beyond classical regression approaches to capture the interactions among brain features in order to better characterize sex differences in male and female youths. We also identified specific cortical morphological measures and parcellation techniques, such as cortical thickness as derived from the Destrieux atlas, that are better able to discriminate between males and females in comparison to other brain atlases (Desikan-Killiany, Brodmann and subcortical atlases).
NASA Astrophysics Data System (ADS)
Shirzaei, M.; Walter, T. R.
2009-10-01
Modern geodetic techniques provide valuable and near real-time observations of volcanic activity. Characterizing the source of deformation based on these observations has become of major importance in related monitoring efforts. We investigate two random search approaches, simulated annealing (SA) and genetic algorithm (GA), and utilize them in an iterated manner. The iterated approach helps to prevent GA in general and SA in particular from getting trapped in local minima, and it also increases redundancy for exploring the search space. We apply a statistical competency test for estimating the confidence interval of the inversion source parameters, considering their internal interaction through the model, the effect of the model deficiency, and the observational error. Here, we present and test this new randomly iterated search and statistical competency (RISC) optimization method together with GA and SA for the modeling of data associated with volcanic deformations. Following synthetic and sensitivity tests, we apply the improved inversion techniques to two episodes of activity in the Campi Flegrei volcanic region in Italy, observed by the interferometric synthetic aperture radar technique. Inversion of these data allows derivation of deformation source parameters and their associated quality so that we can compare the two inversion methods. The RISC approach was found to be an efficient method in terms of computation time and search results and may be applied to other optimization problems in volcanic and tectonic environments.
NASA Technical Reports Server (NTRS)
Holms, A. G.
1977-01-01
A statistical decision procedure called chain pooling had been developed for model selection in fitting the results of a two-level fixed-effects full or fractional factorial experiment not having replication. The basic strategy included the use of one nominal level of significance for a preliminary test and a second nominal level of significance for the final test. The subject has been reexamined from the point of view of using as many as three successive statistical model deletion procedures in fitting the results of a single experiment. The investigation consisted of random number studies intended to simulate the results of a proposed aircraft turbine-engine rotor-burst-protection experiment. As a conservative approach, population model coefficients were chosen to represent a saturated 2 to the 4th power experiment with a distribution of parameter values unfavorable to the decision procedures. Three model selection strategies were developed.
Artificial neural network study on organ-targeting peptides
NASA Astrophysics Data System (ADS)
Jung, Eunkyoung; Kim, Junhyoung; Choi, Seung-Hoon; Kim, Minkyoung; Rhee, Hokyoung; Shin, Jae-Min; Choi, Kihang; Kang, Sang-Kee; Lee, Nam Kyung; Choi, Yun-Jaie; Jung, Dong Hyun
2010-01-01
We report a new approach to studying organ targeting of peptides on the basis of peptide sequence information. The positive control data sets consist of organ-targeting peptide sequences identified by the peroral phage-display technique for four organs, and the negative control data are prepared from random sequences. The capacity of our models to make appropriate predictions is validated by statistical indicators including sensitivity, specificity, enrichment curve, and the area under the receiver operating characteristic (ROC) curve (the ROC score). The VHSE descriptor produces statistically significant training models, and the models with simple neural network architectures show slightly greater predictive power than those with complex ones. The training and test set statistics indicate that our models could discriminate between organ-targeting and random sequences. We anticipate that our models will be applicable to the selection of organ-targeting peptides for generating peptide drugs or peptidomimetics.
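The validation indicators named above can be computed from a list of true labels and model scores. The sketch below is a plain-Python illustration with hypothetical labels and scores; the ROC score is computed through its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative), which is an illustrative choice and not necessarily how the authors computed it.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 = organ-targeting peptide, 0 = random sequence.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} ROC score={auc:.3f}")
```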
A new concept in seismic landslide hazard analysis for practical application
NASA Astrophysics Data System (ADS)
Lee, Chyi-Tyi
2017-04-01
A seismic landslide hazard model can be constructed using a deterministic approach (Jibson et al., 2000) or a statistical approach (Lee, 2014). Both approaches yield the spatial probability of landslides under an earthquake of a given return period. In the statistical approach, our recent study found that there are common patterns among different landslide susceptibility models of the same region. The common susceptibility reflects the relative stability of slopes in a region; higher susceptibility indicates lower stability. Using the common susceptibility together with an earthquake event landslide inventory and a map of topographically corrected Arias intensity, we can build the relationship among probability of failure, Arias intensity, and susceptibility. This relationship can immediately be used to construct a seismic landslide hazard map for the region for which the empirical relationship was built. If the common susceptibility model is further normalized and the empirical relationship is built with the normalized susceptibility, then the empirical relationship may be applied in practice to different regions with similar tectonic environments and climate conditions. This could be useful when a region has no existing earthquake-induced landslide data with which to train the susceptibility model and build the relationship. It is worth mentioning that a rain-induced landslide susceptibility model has a common pattern similar to that of earthquake-induced landslide susceptibility in the same region, and can be used to build the relationship with an earthquake event landslide inventory and a map of Arias intensity. These will be introduced with examples in the meeting.
Improving UWB-Based Localization in IoT Scenarios with Statistical Models of Distance Error.
Monica, Stefania; Ferrari, Gianluigi
2018-05-17
Interest in the Internet of Things (IoT) is rapidly increasing, as the number of connected devices is exponentially growing. One of the application scenarios envisaged for IoT technologies involves indoor localization and context awareness. In this paper, we focus on a localization approach that relies on a particular type of communication technology, namely Ultra Wide Band (UWB). UWB technology is an attractive choice for indoor localization, owing to its high accuracy. Since localization algorithms typically rely on estimated inter-node distances, the goal of this paper is to evaluate the improvement brought by a simple (linear) statistical model of the distance error. On the basis of an extensive experimental measurement campaign, we propose a general analytical framework, based on a Least Squares (LS) method, to derive a novel statistical model for the range estimation error between a pair of UWB nodes. The proposed statistical model is then applied to improve the performance of a few illustrative localization algorithms in various realistic scenarios. The obtained experimental results show that the use of the proposed statistical model improves the accuracy of the considered localization algorithms, with a reduction of the localization error of up to 66%.
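A linear least-squares model of the range error can be sketched as follows. The sketch assumes the error is modelled as a linear function of the true distance, r_hat - d = a * d + b, fitted on calibration pairs, and that a new range estimate is then corrected by inverting that relation. The exact form of the authors' model is not given in the abstract, so the parameterisation, function names, and numbers below are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def correct_range(r_hat, a, b):
    """Invert r_hat = d + a * d + b to recover the distance d."""
    return (r_hat - b) / (1.0 + a)

# Hypothetical calibration campaign: true distances (m) and UWB range estimates (m).
true_d = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [1.15, 2.20, 3.25, 4.30, 5.35]
errors = [m - d for m, d in zip(measured, true_d)]

a, b = fit_line(true_d, errors)
corrected = correct_range(3.25, a, b)
print(f"a={a:.3f}, b={b:.3f}, corrected range={corrected:.3f} m")
```

The corrected ranges would then be fed to whichever localization algorithm is in use, in place of the raw estimates.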
Statistical modeling of urban air temperature distributions under different synoptic conditions
NASA Astrophysics Data System (ADS)
Beck, Christoph; Breitner, Susanne; Cyrys, Josef; Hald, Cornelius; Hartz, Uwe; Jacobeit, Jucundus; Richter, Katja; Schneider, Alexandra; Wolf, Kathrin
2015-04-01
Within urban areas air temperature may vary distinctly between different locations. These intra-urban air temperature variations partly reach magnitudes that are relevant with respect to human thermal comfort. Therefore, and also taking into account potential interrelations with other health-related environmental factors (e.g. air quality), it is important to estimate spatial patterns of intra-urban air temperature distributions that may be incorporated into urban planning processes. In this contribution we present an approach to estimate spatial temperature distributions in the urban area of Augsburg (Germany) by means of statistical modeling. At 36 locations in the urban area of Augsburg air temperatures have been measured with high temporal resolution (4 min.) since December 2012. These 36 locations represent different typical urban land use characteristics in terms of varying percentage coverages of different land cover categories (e.g. impervious, built-up, vegetated). Percentage coverages of these land cover categories have been extracted from different sources (Open Street Map, European Urban Atlas, Urban Morphological Zones) for regular grids of varying size (50, 100, 200 meter horizontal resolution) for the urban area of Augsburg. It is well known from numerous studies that land use characteristics have a distinct influence on air temperature, as well as on other climatic variables, at a certain location. Therefore air temperatures at the 36 locations are modeled utilizing land use characteristics (percentage coverages of land cover categories) as predictor variables in Stepwise Multiple Regression models and in Random Forest based model approaches. After model evaluation via cross-validation appropriate statistical models are applied to gridded land use data to derive spatial urban air temperature distributions. Varying models are tested and applied for different seasons and times of the day and also for different synoptic conditions (e.g. clear and calm situations, cloudy and windy situations). Based on hourly air temperature data from our measurements in the urban area of Augsburg distinct temperature differences between locations with different urban land use characteristics are revealed. Under clear and calm weather conditions differences between mean hourly air temperatures reach values around 8°C. During cloudy and windy weather, by contrast, maximum differences in mean hourly air temperatures do not exceed 5°C. Differences usually appear slightly more pronounced in summer than in winter. First results from the application of statistical modeling approaches reveal promising skill of the models in terms of explained variances reaching up to 60% in leave-one-out cross-validation experiments. The contribution depicts the methodology of our approach and presents and discusses first results.
NASA Technical Reports Server (NTRS)
Alexandrov, Mikhail Dmitrievic; Geogdzhayev, Igor V.; Tsigaridis, Konstantinos; Marshak, Alexander; Levy, Robert; Cairns, Brian
2016-01-01
A novel model for the variability in aerosol optical thickness (AOT) is presented. This model is based on the consideration of AOT fields as realizations of a stochastic process, namely the exponential of an underlying Gaussian process with a specific autocorrelation function. In this approach AOT fields have lognormal PDFs and structure functions having the correct asymptotic behavior at large scales. The latter is an advantage compared with fractal (scale-invariant) approaches. The simple analytical form of the structure function in the proposed model facilitates its use for the parameterization of AOT statistics derived from remote sensing data. The new approach is illustrated using a month-long global MODIS AOT dataset (over ocean) with 10 km resolution. It was used to compute AOT statistics for sample cells forming a grid with 5° spacing. The observed shapes of the structure functions indicated that in a large number of cases the AOT variability is split into two regimes that exhibit different patterns of behavior: small-scale stationary processes and trends reflecting variations at larger scales. The small-scale patterns are suggested to be generated by local aerosols within the marine boundary layer, while the large-scale trends are indicative of elevated aerosols transported from remote continental sources. This assumption is evaluated by comparison of the geographical distributions of these patterns derived from MODIS data with those obtained from the GISS GCM. This study shows considerable potential to enhance comparisons between remote sensing datasets and climate models beyond regional mean AOTs.
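The construction described above, a lognormal field obtained by exponentiating a Gaussian process, can be illustrated in one dimension. The sketch below assumes a first-order autoregressive Gaussian series (exponentially decaying autocorrelation) as a stand-in, since the abstract does not state the specific autocorrelation function; the parameter values and function names are hypothetical. The second-order structure function is taken as the mean squared difference of the field at a given lag.

```python
import math
import random

def gaussian_ar1(n, phi, sigma, rng):
    """Stationary Gaussian AR(1) series with variance sigma**2 and
    lag-k autocorrelation phi**k."""
    series = [rng.gauss(0.0, sigma)]
    innovation_sd = sigma * math.sqrt(1.0 - phi ** 2)
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, innovation_sd))
    return series

def structure_function(field, lag):
    """Second-order structure function: mean of (f(x + lag) - f(x))**2."""
    diffs = [(field[i + lag] - field[i]) ** 2 for i in range(len(field) - lag)]
    return sum(diffs) / len(diffs)

rng = random.Random(0)
log_aot = gaussian_ar1(n=5000, phi=0.9, sigma=0.5, rng=rng)
# Exponentiating the Gaussian series gives a strictly positive, lognormal AOT series.
aot = [math.exp(-2.0 + g) for g in log_aot]

for lag in (1, 10, 100):
    print(lag, round(structure_function(aot, lag), 5))
```

For a stationary process of this kind the structure function levels off at large lags, which is the large-scale behavior the abstract contrasts with scale-invariant models.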
Robust biological parametric mapping: an improved technique for multimodal brain image analysis
NASA Astrophysics Data System (ADS)
Yang, Xue; Beason-Held, Lori; Resnick, Susan M.; Landman, Bennett A.
2011-03-01
Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, region of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and robust inference in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provides a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities.
2000-04-10
interest. These include Statistical Energy Analysis (SEA), fuzzy structure theory, and approaches combining modal analysis and SEA. Non-determinism...34 arising with increasing frequency. This has led to Statistical Energy Analysis, in which a system is modelled as a collection of coupled subsystems...22. IUTAM Symposium on Statistical Energy Analysis. 1999 Ed. F.J. Fahy and W.G. Price. Kluwer Academic Publishing. • 23. R.S. Langley and P
NASA Technical Reports Server (NTRS)
Cucinotta, Francis A.; Wilson, John W.
1996-01-01
The angular momentum independent statistical decay model is often applied using a Monte-Carlo simulation to describe the decay of prefragment nuclei in heavy ion reactions. This paper presents an analytical approach to the decay problem of nuclei with mass number less than 60, which is important for galactic cosmic ray (GCR) studies. This decay problem of nuclei with mass number less than 60 incorporates well-known levels of the lightest nuclei (A less than 11) to improve convergence and accuracy. A sensitivity study of the model level density function is used to determine the impact on mass and charge distributions in nuclear fragmentation. This angular momentum independent statistical decay model also describes the momentum and energy distribution of emitted particles (n, p, d, t, h, and a) from a prefragment nucleus.
New Insights into Handling Missing Values in Environmental Epidemiological Studies
Roda, Célina; Nicolis, Ioannis; Momas, Isabelle; Guihenneuc, Chantal
2014-01-01
Missing data are unavoidable in environmental epidemiologic surveys. The aim of this study was to compare methods for handling large amounts of missing values: omission of missing values, single and multiple imputations (through linear regression or partial least squares regression), and a fully Bayesian approach. These methods were applied to the PARIS birth cohort, where indoor domestic pollutant measurements were performed in a random sample of babies' dwellings. A simulation study was conducted to assess performances of different approaches with a high proportion of missing values (from 50% to 95%). Different simulation scenarios were carried out, controlling the true value of the association (odds ratio of 1.0, 1.2, and 1.4), and varying the health outcome prevalence. When a large amount of data is missing, omitting these missing data reduced statistical power and inflated standard errors, which affected the significance of the association. Single imputation underestimated the variability, and considerably increased risk of type I error. All approaches were conservative, except the Bayesian joint model. In the case of a common health outcome, the fully Bayesian approach is the most efficient approach (low root mean square error, reasonable type I error, and high statistical power). Nevertheless for a less prevalent event, the type I error is increased and the statistical power is reduced. The estimated posterior distribution of the OR is useful to refine the conclusion. Among the methods handling missing values, no approach is absolutely the best but when usual approaches (e.g. single imputation) are not sufficient, joint modelling approach of missing process and health association is more efficient when large amounts of data are missing. PMID:25226278
NASA Astrophysics Data System (ADS)
Garrett, T. J.; Alva, S.; Glenn, I. B.; Krueger, S. K.
2015-12-01
There are two possible approaches for parameterizing sub-grid cloud dynamics in a coarser grid model. The most common is to use a fine scale model to explicitly resolve the mechanistic details of clouds to the best extent possible, and then to parameterize the resulting cloud state for the coarser grid. A second is to invoke physical intuition and some very general theoretical principles from equilibrium statistical mechanics. This approach avoids any requirement to resolve time-dependent processes in order to arrive at a suitable solution. The second approach is widely used elsewhere in the atmospheric sciences: for example, the Planck function for blackbody radiation is derived this way, where no mention is made of the complexities of modeling a large ensemble of time-dependent radiation-dipole interactions in order to obtain the "grid-scale" spectrum of thermal emission by the blackbody as a whole. We find that this statistical approach may be equally suitable for modeling convective clouds. Specifically, we make the physical argument that the dissipation of buoyant energy in convective clouds is done through mixing across a cloud perimeter. From thermodynamic reasoning, one might then anticipate that vertically stacked isentropic surfaces are characterized by a power law dlnN/dlnP = -1, where N(P) is the number of clouds of perimeter P. In a Giga-LES simulation of convective clouds within a 100 km square domain we find that such a power law does appear to characterize simulated cloud perimeters along isentropes, provided a sufficient cloudy sample. The suggestion is that it may be possible to parameterize certain important aspects of cloud state without appealing to computationally expensive dynamic simulations.
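The power law dlnN/dlnP = -1 can be checked on a set of cloud counts by regressing ln N on ln P and reading off the slope. The sketch below uses synthetic counts constructed to follow N proportional to 1/P, so it only illustrates the bookkeeping; the perimeter bins, counts, and function name are hypothetical and are not values from the simulation.

```python
import math

def log_log_slope(perimeters, counts):
    """Least-squares slope of ln(N) against ln(P), an estimate of dlnN/dlnP."""
    x = [math.log(p) for p in perimeters]
    y = [math.log(n) for n in counts]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Synthetic example: cloud counts per perimeter bin along one isentrope (perimeters in km).
perimeters = [1.0, 2.0, 4.0, 8.0, 16.0]
counts = [1600, 800, 400, 200, 100]
slope = log_log_slope(perimeters, counts)
print(f"estimated dlnN/dlnP = {slope:.3f}")
```

With real simulation output, sparsely populated bins at large perimeters would make the slope noisy, which is consistent with the abstract's caveat about needing a sufficient cloudy sample.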
2015-06-23
T. Bates, S. Brocklebank, S. Pauls, and D.Rockmore, A spectral clustering approach to the structure of personality: contrasting the FFM and...A spectral clustering approach to the structure of personality: contrasting the FFM and HEXACO models, Journal of Research in Personality, Volume 57
Monte Carlo Approach for Reliability Estimations in Generalizability Studies.
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.
A Monte Carlo approach is proposed, using the Statistical Analysis System (SAS) programming language, for estimating reliability coefficients in generalizability theory studies. Test scores are generated by a probabilistic model that considers the probability for a person with a given ability score to answer an item with a given difficulty…
1985-02-01
Energy Analysis, a branch of dynamic modal analysis developed for analyzing acoustic vibration problems, its present stage of development embodies a...Maximum Entropy Stochastic Modelling and Reduced-Order Design Synthesis is a rigorous new approach to this class of problems. Inspired by Statistical
Low-complexity stochastic modeling of wall-bounded shear flows
NASA Astrophysics Data System (ADS)
Zare, Armin
Turbulent flows are ubiquitous in nature and they appear in many engineering applications. Transition to turbulence, in general, increases skin-friction drag in air/water vehicles compromising their fuel-efficiency and reduces the efficiency and longevity of wind turbines. While traditional flow control techniques combine physical intuition with costly experiments, their effectiveness can be significantly enhanced by control design based on low-complexity models and optimization. In this dissertation, we develop a theoretical and computational framework for the low-complexity stochastic modeling of wall-bounded shear flows. Part I of the dissertation is devoted to the development of a modeling framework which incorporates data-driven techniques to refine physics-based models. We consider the problem of completing partially known sample statistics in a way that is consistent with underlying stochastically driven linear dynamics. Neither the statistics nor the dynamics are precisely known. Thus, our objective is to reconcile the two in a parsimonious manner. To this end, we formulate optimization problems to identify the dynamics and directionality of input excitation in order to explain and complete available covariance data. For problem sizes that general-purpose solvers cannot handle, we develop customized optimization algorithms based on alternating direction methods. The solution to the optimization problem provides information about critical directions that have maximal effect in bringing model and statistics in agreement. In Part II, we employ our modeling framework to account for statistical signatures of turbulent channel flow using low-complexity stochastic dynamical models. We demonstrate that white-in-time stochastic forcing is not sufficient to explain turbulent flow statistics and develop models for colored-in-time forcing of the linearized Navier-Stokes equations. 
We also examine the efficacy of stochastically forced linearized NS equations and their parabolized equivalents in the receptivity analysis of velocity fluctuations to external sources of excitation as well as capturing the effect of the slowly-varying base flow on streamwise streaks and Tollmien-Schlichting waves. In Part III, we develop a model-based approach to design surface actuation of turbulent channel flow in the form of streamwise traveling waves. This approach is capable of identifying the drag reducing trends of traveling waves in a simulation-free manner. We also use the stochastically forced linearized NS equations to examine the Reynolds number independent effects of spanwise wall oscillations on drag reduction in turbulent channel flows. This allows us to extend the predictive capability of our simulation-free approach to high Reynolds numbers.
Feature maps driven no-reference image quality prediction of authentically distorted images
NASA Astrophysics Data System (ADS)
Ghadiyaram, Deepti; Bovik, Alan C.
2015-03-01
Current blind image quality prediction models rely on benchmark databases comprised of singly and synthetically distorted images, thereby learning image features that are only adequate to predict human perceived visual quality on such inauthentic distortions. However, real world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures of distortions on an image's perceptual quality and considering only the dominant distortion or b) using features that are only proven to be efficient for singly distorted images, we deeply study the natural scene statistics of authentically distorted images, in different color spaces and transform domains. We propose a feature-maps-driven statistical approach which avoids any latent assumptions about the type of distortion(s) contained in an image, and focuses instead on modeling the remarkable consistencies in the scene statistics of real world images in the absence of distortions. We design a deep belief network that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state-of-the-art.
IDENTIFICATION OF REGIME SHIFTS IN TIME SERIES USING NEIGHBORHOOD STATISTICS
The identification of alternative dynamic regimes in ecological systems requires several lines of evidence. Previous work on time series analysis of dynamic regimes includes mainly model-fitting methods. We introduce two methods that do not use models. These approaches use state-...
A practical approach for the scale-up of roller compaction process.
Shi, Weixian; Sprockel, Omar L
2016-09-01
An alternative approach for the scale-up of ribbon formation during roller compaction was investigated, which required only one batch at the commercial scale to set the operational conditions. The scale-up of ribbon formation was based on a probability method, which was sufficient to describe the mechanism of ribbon formation at both scales. In this method, a statistical relationship between roller compaction parameters and ribbon attributes (thickness and density) was first defined with DoE using a pilot Alexanderwerk WP120 roller compactor. While the milling speed was included in the design, it had no practical effect on granule properties within the study range despite its statistical significance. The statistical relationship was then adapted to a commercial Alexanderwerk WP200 roller compactor with one experimental run. The experimental run served as a calibration of the statistical model parameters. The proposed transfer method was then confirmed by conducting a mapping study on the Alexanderwerk WP200 using a factorial DoE, which showed a match between the predictions and the verification experiments. The study demonstrates the applicability of the roller compaction transfer method using the statistical model from the development scale calibrated with one experiment point at the commercial scale. Copyright © 2016 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Urrego-Blanco, Jorge R.; Hunke, Elizabeth C.; Urban, Nathan M.; ...
2017-04-01
Here, we implement a variance-based distance metric (D_n) to objectively assess skill of sea ice models when multiple output variables or uncertainties in both model predictions and observations need to be considered. The metric compares observations and model data pairs on common spatial and temporal grids, improving upon highly aggregated metrics (e.g., total sea ice extent or volume) by capturing the spatial character of model skill. The D_n metric is a gamma-distributed statistic that is more general than the χ² statistic commonly used to assess model fit, which requires the assumption that the model is unbiased and can only incorporate observational error in the analysis. The D_n statistic does not assume that the model is unbiased, and allows the incorporation of multiple observational data sets for the same variable and simultaneously for different variables, along with different types of variances that can characterize uncertainties in both observations and the model. This approach represents a step to establish a systematic framework for probabilistic validation of sea ice models. The methodology is also useful for model tuning by using the D_n metric as a cost function and incorporating model parametric uncertainty as part of a scheme to optimize model functionality. We apply this approach to evaluate different configurations of the standalone Los Alamos sea ice model (CICE) encompassing the parametric uncertainty in the model, and to find new sets of model configurations that produce better agreement than previous configurations between model and observational estimates of sea ice concentration and thickness.
A Flexible Approach for the Statistical Visualization of Ensemble Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Potter, K.; Wilson, A.; Bremer, P.
2009-09-29
Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methods that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.
Quantifying variation in speciation and extinction rates with clade data.
Paradis, Emmanuel; Tedesco, Pablo A; Hugueny, Bernard
2013-12-01
High-level phylogenies are very common in evolutionary analyses, although they are often treated as incomplete data. Here, we provide statistical tools to analyze what we name "clade data," which are the ages of clades together with their numbers of species. We develop a general approach for the statistical modeling of variation in speciation and extinction rates, including temporal variation, unknown variation, and linear and nonlinear modeling. We show how this approach can be generalized to a wide range of situations, including testing the effects of life-history traits and environmental variables on diversification rates. We report the results of an extensive simulation study to assess the performance of some statistical tests presented here as well as of the estimators of speciation and extinction rates. These latter results suggest the possibility to estimate correctly extinction rate in the absence of fossils. An example with data on fish is presented. © 2013 The Author(s). Evolution © 2013 The Society for the Study of Evolution.
An order statistics approach to the halo model for galaxies
NASA Astrophysics Data System (ADS)
Paul, Niladri; Paranjape, Aseem; Sheth, Ravi K.
2017-04-01
We use the halo model to explore the implications of assuming that galaxy luminosities in groups are randomly drawn from an underlying luminosity function. We show that even the simplest of such order statistics models - one in which this luminosity function p(L) is universal - naturally produces a number of features associated with previous analyses based on the 'central plus Poisson satellites' hypothesis. These include the monotonic relation of mean central luminosity with halo mass, the lognormal distribution around this mean and the tight relation between the central and satellite mass scales. In stark contrast to observations of galaxy clustering, however, this model predicts no luminosity dependence of large-scale clustering. We then show that an extended version of this model, based on the order statistics of a halo mass dependent luminosity function p(L|m), is in much better agreement with the clustering data as well as satellite luminosities, but systematically underpredicts central luminosities. This brings into focus the idea that central galaxies constitute a distinct population that is affected by different physical processes than are the satellites. We model this physical difference as a statistical brightening of the central luminosities, over and above the order statistics prediction. The magnitude gap between the brightest and second brightest group galaxy is predicted as a by-product, and is also in good agreement with observations. We propose that this order statistics framework provides a useful language in which to compare the halo model for galaxies with more physically motivated galaxy formation models.
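The order statistics idea in this abstract can be illustrated with a small sketch. If N luminosities are drawn independently from a distribution with cumulative distribution function F, the brightest of them has cumulative distribution F(L)^N. The code below checks that relation by simulation. The exponential distribution and all function names are illustrative assumptions; they are not the luminosity function or code used by the authors.

```python
import math
import random

def max_cdf(single_draw_cdf, n):
    """CDF of the largest of n i.i.d. draws, given the single-draw CDF value F(x)."""
    return single_draw_cdf ** n

def simulate_brightest(n, trials, seed=0):
    """Draw n exponential(1) 'luminosities' per group; return the brightest of each group.

    The exponential(1) distribution is a stand-in chosen only for illustration.
    """
    rng = random.Random(seed)
    return [max(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

def empirical_cdf(samples, x):
    """Fraction of samples that are less than or equal to x."""
    return sum(s <= x for s in samples) / len(samples)

if __name__ == "__main__":
    n, x = 5, 2.0
    predicted = max_cdf(1.0 - math.exp(-x), n)
    observed = empirical_cdf(simulate_brightest(n, 20000), x)
    print(f"predicted {predicted:.3f}, simulated {observed:.3f}")
```

A halo-mass-dependent version would, under the same assumptions, simply make the single-draw distribution depend on the group's mass.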
"The Two Brothers": Reconciling Perceptual-Cognitive and Statistical Models of Musical Evolution.
Jan, Steven
2018-01-01
While the "units, events and dynamics" of memetic evolution have been abstractly theorized (Lynch, 1998), they have not been applied systematically to real corpora in music. Some researchers, convinced of the validity of cultural evolution in more than the metaphorical sense adopted by much musicology, but perhaps skeptical of some or all of the claims of memetics, have attempted statistically based corpus-analysis techniques of music drawn from molecular biology, and these have offered strong evidence in favor of system-level change over time (Savage, 2017). This article argues that such statistical approaches, while illuminating, ignore the psychological realities of music-information grouping, the transmission of such groups with varying degrees of fidelity, their selection according to relative perceptual-cognitive salience, and the power of this Darwinian process to drive the systemic changes (such as the development over time of systems of tonal organization in music) that statistical methodologies measure. It asserts that a synthesis between such statistical approaches to the study of music-cultural change and the theory of memetics as applied to music (Jan, 2007), in particular the latter's perceptual-cognitive elements, would harness the strengths of each approach and deepen understanding of cultural evolution in music. PMID:29670551
Identifying ontogenetic, environmental and individual components of forest tree growth
Chaubert-Pereira, Florence; Caraglio, Yves; Lavergne, Christian; Guédon, Yann
2009-01-01
Background and Aims: This study aimed to identify and characterize the ontogenetic, environmental and individual components of forest tree growth. In the proposed approach, the tree growth data typically correspond to the retrospective measurement of annual shoot characteristics (e.g. length) along the trunk. Methods: Dedicated statistical models (semi-Markov switching linear mixed models) were applied to data sets of Corsican pine and sessile oak. In the semi-Markov switching linear mixed models estimated from these data sets, the underlying semi-Markov chain represents both the succession of growth phases and their lengths, while the linear mixed models represent both the influence of climatic factors and the inter-individual heterogeneity within each growth phase. Key Results: On the basis of these integrative statistical models, it is shown that growth phases are not only defined by average growth level but also by growth fluctuation amplitudes in response to climatic factors and inter-individual heterogeneity and that the individual tree status within the population may change between phases. Species plasticity affected the response to climatic factors while tree origin, sampling strategy and silvicultural interventions impacted inter-individual heterogeneity. Conclusions: The transposition of the proposed integrative statistical modelling approach to cambial growth in relation to climatic factors and the study of the relationship between apical growth and cambial growth constitute the next steps in this research. PMID:19684021
Slob, Wout
2006-07-01
Probabilistic dietary exposure assessments that are fully based on Monte Carlo sampling from the raw intake data may not be appropriate. This paper shows that the data should first be analysed by using a statistical model that is able to take the various dimensions of food consumption patterns into account. A (parametric) model is discussed that takes into account the interindividual variation in (daily) consumption frequencies, as well as in amounts consumed. Further, the model can be used to include covariates, such as age, sex, or other individual attributes. Some illustrative examples show how this model may be used to estimate the probability of exceeding an (acute or chronic) exposure limit. These results are compared with the results based on directly counting the fraction of observed intakes exceeding the limit value. This comparison shows that the latter method is not adequate, in particular for the acute exposure situation. A two-step approach for probabilistic (acute) exposure assessment is proposed: first analyse the consumption data by a (parametric) statistical model as discussed in this paper, and then use Monte Carlo techniques for combining the variation in concentrations with the variation in consumption (by sampling from the statistical model). This approach results in an estimate of the fraction of the population as a function of the fraction of days at which the exposure limit is exceeded by the individual.
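The second step of the two-step approach described above (Monte Carlo sampling from a fitted consumption model, combined with variation in concentrations) might look roughly like the following sketch. It assumes, purely for illustration, a fixed daily consumption probability and lognormal distributions for amount consumed and for concentration. These distributional choices, the parameter names and the function name are hypothetical and are not taken from the paper.

```python
import random

def simulate_exceedance(p_consume, mu_amount, sigma_amount,
                        mu_conc, sigma_conc, limit,
                        n_days=100_000, seed=1):
    """Monte Carlo estimate of the fraction of days on which exposure exceeds `limit`.

    Assumed model (illustrative only):
      - the food is eaten on a given day with probability p_consume
      - amount consumed ~ lognormal(mu_amount, sigma_amount)
      - concentration   ~ lognormal(mu_conc, sigma_conc)
      - exposure = amount * concentration
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_days):
        if rng.random() >= p_consume:
            continue  # no consumption on this day, hence no exposure
        amount = rng.lognormvariate(mu_amount, sigma_amount)
        conc = rng.lognormvariate(mu_conc, sigma_conc)
        if amount * conc > limit:
            exceed += 1
    return exceed / n_days

if __name__ == "__main__":
    frac = simulate_exceedance(0.3, 0.0, 0.5, 0.0, 0.8, limit=2.0)
    print(f"estimated fraction of days above the limit: {frac:.4f}")
```

In a fuller treatment the consumption probability and amount parameters would vary between individuals and could depend on covariates such as age or sex, which is the role the abstract assigns to the parametric model in the first step.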
Composite Linear Models | Division of Cancer Prevention
By Stuart G. Baker. The composite linear models software uses a matrix approach to compute maximum likelihood estimates and asymptotic standard errors for models for incomplete multinomial data. It implements the method described in Baker SG. Composite linear models for incomplete multinomial data. Statistics in Medicine 1994;13:609-622. The software includes a library of thirty
The mean time-limited crash rate of stock price
NASA Astrophysics Data System (ADS)
Li, Yun-Xian; Li, Jiang-Cheng; Yang, Ai-Jun; Tang, Nian-Sheng
2017-05-01
In this article we investigate the occurrence of stock market crashes in an economic cycle. A Bayesian approach, the Heston model and a statistical-physics method are considered. Specifically, the Heston model and an effective potential are employed to describe the dynamic changes of the stock price. The Bayesian approach is used to estimate the Heston model's unknown parameters. The statistical-physics method is used to investigate the occurrence of stock market crashes by calculating the mean time-limited crash rate. Real financial data from the Shanghai Composite Index are analyzed with the proposed methods. The mean time-limited crash rate of the stock price is used to describe the occurrence of stock market crashes in an economic cycle. Both monotonic and nonmonotonic behaviors are observed in the mean time-limited crash rate versus stock volatility for various cross-correlation coefficients between volatility and price. A minimum occurrence of stock market crashes, matching an optimal volatility, is also found.
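For readers unfamiliar with the Heston model mentioned above, the following is a minimal simulation sketch of the standard form of the model: a price process whose variance follows a mean-reverting square-root process, with correlated noise terms. It uses a simple Euler scheme with the variance truncated at zero. It does not include the effective potential used in the article, and the function name, parameter names and example values are illustrative assumptions.

```python
import math
import random

def simulate_heston(s0, v0, mu, kappa, theta, xi, rho, dt, n_steps, seed=0):
    """Simulate one price path of the standard Heston model.

    dS = mu * S dt + sqrt(v) * S dW1
    dv = kappa * (theta - v) dt + xi * sqrt(v) dW2,  corr(dW1, dW2) = rho
    """
    rng = random.Random(seed)
    s, v = s0, v0
    path = [s]
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second normal variate with correlation rho to the first
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        v_pos = max(v, 0.0)  # truncate negative variance produced by the Euler step
        # Log-Euler update keeps the price strictly positive
        s *= math.exp((mu - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z1)
        v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z2
        path.append(s)
    return path

if __name__ == "__main__":
    path = simulate_heston(100.0, 0.04, 0.05, 2.0, 0.04, 0.3, -0.5, 1 / 252, 252)
    print(f"final price after one simulated year: {path[-1]:.2f}")
```

A crash rate of the kind studied in the article could then be estimated, under one's own definition of a crash, by counting how often simulated paths fall below a chosen threshold within a fixed time window.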
Optimal experimental designs for fMRI when the model matrix is uncertain.
Kao, Ming-Hung; Zhou, Lin
2017-07-15
This study concerns optimal designs for functional magnetic resonance imaging (fMRI) experiments when the model matrix of the statistical model depends on both the selected stimulus sequence (fMRI design), and the subject's uncertain feedback (e.g. answer) to each mental stimulus (e.g. question) presented to her/him. While practically important, this design issue is challenging. This is mainly because the information matrix cannot be fully determined at the design stage, making it difficult to evaluate the quality of the selected designs. To tackle this challenging issue, we propose an easy-to-use optimality criterion for evaluating the quality of designs, and an efficient approach for obtaining designs optimizing this criterion. Compared with a previously proposed method, our approach requires much less computing time to achieve designs with high statistical efficiencies. Copyright © 2017 Elsevier Inc. All rights reserved.
Development of a Stochastically-driven, Forward Predictive Performance Model for PEMFCs
NASA Astrophysics Data System (ADS)
Harvey, David Benjamin Paul
A one-dimensional multi-scale coupled, transient, and mechanistic performance model for a PEMFC membrane electrode assembly has been developed. The model explicitly includes each of the 5 layers within a membrane electrode assembly and solves for the transport of charge, heat, mass, species, dissolved water, and liquid water. Key features of the model include the use of a multi-step implementation of the HOR on the anode, agglomerate catalyst sub-models for both the anode and cathode catalyst layers, a unique approach that links the composition of the catalyst layer to key properties within the agglomerate model and the implementation of a stochastic input-based approach for component material properties. The model employs a new methodology for validation using statistically varying input parameters and statistically-based experimental performance data; this model represents the first stochastic input driven unit cell performance model. The stochastic input driven performance model was used to identify optimal ionomer content within the cathode catalyst layer, demonstrate the role of material variation in potential low performing MEA materials, provide an explanation for the performance of low-Pt loaded MEAs, and investigate the validity of transient-sweep experimental diagnostic methods.
NASA Astrophysics Data System (ADS)
Majumdar, Satya N.
2003-08-01
We use the traveling front approach to derive exact asymptotic results for the statistics of the number of particles in a class of directed diffusion-limited aggregation models on a Cayley tree. We point out that some aspects of these models are closely connected to two different problems in computer science, namely, the digital search tree problem in data structures and the Lempel-Ziv algorithm for data compression. The statistics of the number of particles studied here is related to the statistics of height in digital search trees which, in turn, is related to the statistics of the length of the longest word formed by the Lempel-Ziv algorithm. Implications of our results to these computer science problems are pointed out.
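The link to data compression mentioned above can be made concrete with a short sketch of Lempel-Ziv parsing. It assumes the LZ78 variant, in which the input is cut into phrases and each new phrase is the shortest prefix of the remaining text that has not appeared as a phrase before. Each phrase extends an earlier phrase by one symbol, so the phrases can be viewed as nodes of a tree whose height is the length of the longest phrase. Which Lempel-Ziv variant the paper analyses is not stated in the abstract, and the function names are hypothetical.

```python
def lz78_phrases(text):
    """Parse text into LZ78 phrases (shortest prefixes not seen as a phrase before)."""
    seen = set()
    phrases = []
    current = ""
    for ch in text:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Leftover at end of input: a prefix that was already in the dictionary
        phrases.append(current)
    return phrases

def longest_phrase_length(text):
    """Length of the longest phrase produced by the LZ78 parsing of text."""
    return max((len(p) for p in lz78_phrases(text)), default=0)

if __name__ == "__main__":
    print(lz78_phrases("aababcabcd"))
    print(longest_phrase_length("aababcabcd"))
```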
Sutton, Steven C; Hu, Mingxiu
2006-05-05
Many mathematical models have been proposed for establishing an in vitro/in vivo correlation (IVIVC). The traditional IVIVC model building process consists of 5 steps: deconvolution, model fitting, convolution, prediction error evaluation, and cross-validation. This is a time-consuming process and typically a few models at most are tested for any given data set. The objectives of this work were to (1) propose a statistical tool to screen models for further development of an IVIVC, (2) evaluate the performance of each model under different circumstances, and (3) investigate the effectiveness of common statistical model selection criteria for choosing IVIVC models. A computer program was developed to explore which model(s) would be most likely to work well with a random variation from the original formulation. The process used Monte Carlo simulation techniques to build IVIVC models. Data-based model selection criteria (Akaike Information Criterion [AIC], R2) and the probability of passing the Food and Drug Administration "prediction error" requirement were calculated. Several real data sets representing a broad range of release profiles are used to illustrate the process and to demonstrate the advantages of this automated process over the traditional approach. The Hixson-Crowell and Weibull models were often preferred over the linear model. When evaluating whether a Level A IVIVC model was possible, the AIC generally selected the best model. We believe that the proposed approach may be a rapid tool to determine which IVIVC model (if any) is the most applicable.
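As a rough illustration of the model screening described above, the sketch below defines the Hixson-Crowell and Weibull release curves in their usual textbook forms and compares fits with the least-squares form of the AIC, n ln(RSS/n) + 2k, where lower values are preferred. The parameterisations, the example data and the function names are assumptions made for illustration; the abstract does not give the exact forms or data used by the authors.

```python
import math

def hixson_crowell(t, k):
    """Fraction released under the Hixson-Crowell cube-root law, clipped to [0, 1]."""
    remaining = max(1.0 - k * t, 0.0)
    return 1.0 - remaining ** 3

def weibull(t, tau, beta):
    """Fraction released under a Weibull model with scale tau and shape beta."""
    return 1.0 - math.exp(-((t / tau) ** beta))

def aic_least_squares(observed, predicted, n_params):
    """AIC for a least-squares fit: n * ln(RSS / n) + 2 * k (requires RSS > 0)."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_params

if __name__ == "__main__":
    times = [1.0, 2.0, 4.0, 6.0, 8.0]
    observed = [0.22, 0.40, 0.66, 0.80, 0.89]  # made-up release fractions
    fit_w = [weibull(t, 4.0, 1.0) for t in times]
    fit_h = [hixson_crowell(t, 0.08) for t in times]
    print("Weibull AIC:", aic_least_squares(observed, fit_w, 2))
    print("Hixson-Crowell AIC:", aic_least_squares(observed, fit_h, 1))
```

In practice the parameters would be estimated by a fitting routine rather than fixed by hand as they are here.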
Alignment-free sequence comparison (II): theoretical power of comparison statistics.
Wan, Lin; Reinert, Gesine; Sun, Fengzhu; Waterman, Michael S
2010-11-01
Rapid methods for alignment-free sequence comparison make large-scale comparisons between sequences increasingly feasible. Here we study the power of the statistic D2, which counts the number of matching k-tuples between two sequences, as well as D2*, which uses centralized counts, and D2S, which is a self-standardized version, both from a theoretical viewpoint and numerically, providing an easy-to-use program. The power is assessed under two alternative hidden Markov models; the first one assumes that the two sequences share a common motif, whereas the second model is a pattern transfer model; the null model is that the two sequences are composed of independent and identically distributed letters and they are independent. Under the first alternative model, the means of the tuple counts in the individual sequences change, whereas under the second alternative model, the marginal means are the same as under the null model. Using the limit distributions of the count statistics under the null and the alternative models, we find that generally, asymptotically D2S has the largest power, followed by D2*, whereas the power of D2 can even be zero in some cases. In contrast, even for sequences of length 140,000 bp, in simulations D2* generally has the largest power. Under the first alternative model of a shared motif, the power of D2* approaches 100% when sufficiently many motifs are shared, and we recommend the use of D2* for such practical applications. Under the second alternative model of pattern transfer, the power for all three count statistics does not increase with sequence length when the sequence is sufficiently long, and hence none of the three statistics under consideration can be recommended in such a situation. We illustrate the approach on 323 transcription factor binding motifs with length at most 10 from JASPAR CORE (October 12, 2009 version), verifying that D2* is generally more powerful than D2.
The program to calculate the power of D2, D2* and D2S can be downloaded from http://meta.cmb.usc.edu/d2. Supplementary Material is available at www.liebertonline.com/cmb.
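The basic D2 statistic described above, the number of matching k-tuples between two sequences, can be computed by counting each k-tuple in both sequences and summing the products of the counts. The sketch below shows only this uncentred statistic; D2* and D2S additionally centre and standardise the counts and are not reproduced here. The function names are illustrative and are unrelated to the authors' downloadable program.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-tuple (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(seq_a, seq_b, k):
    """D2 statistic: number of pairs of positions (one per sequence) with identical k-tuples."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Sum over words of count_in_a * count_in_b; missing words contribute zero
    return sum(c * counts_b[w] for w, c in counts_a.items())

if __name__ == "__main__":
    print(d2("ACGTACGT", "TACGTT", 3))
```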
2012-01-01
discrimination at live-UXO sites. Namely, under this project first we developed and implemented advanced, physically complete forward EMI models such as, the...detection and discrimination at live-UXO sites. Namely, under this project first we developed and implemented advanced, physically complete forward EMI...Shubitidze of Sky Research and Dartmouth College, conceived, implemented, and tested most of the approaches presented in this report. He developed
Semiclassical matrix model for quantum chaotic transport with time-reversal symmetry
DOE Office of Scientific and Technical Information (OSTI.GOV)
Novaes, Marcel
2015-10-15
We show that the semiclassical approach to chaotic quantum transport in the presence of time-reversal symmetry can be described by a matrix model. In other words, we construct a matrix integral whose perturbative expansion satisfies the semiclassical diagrammatic rules for the calculation of transport statistics. One of the virtues of this approach is that it leads very naturally to the semiclassical derivation of universal predictions from random matrix theory.
Efficient statistical tests to compare Youden index: accounting for contingency correlation.
Chen, Fangyao; Xue, Yuqiang; Tan, Ming T; Chen, Pingyan
2015-04-30
The Youden index is widely used in studies evaluating the accuracy of diagnostic tests and the performance of predictive, prognostic, or risk models. However, both one-sample and two-independent-sample tests on the Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. In addition, a paired-sample test on the Youden index is currently unavailable. This article develops efficient statistical inference procedures for one-sample, independent-sample, and paired-sample tests on the Youden index by accounting for contingency correlation, namely associations between sensitivity and specificity and paired samples typically represented in contingency tables. For one-sample and two-independent-sample tests, the variances are estimated by the delta method, and the statistical inference is based on the central limit theorem; both are then verified by bootstrap estimates. For the paired-sample test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of the kappa statistic so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than does the original Youden's approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.
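For orientation, the Youden index is J = sensitivity + specificity - 1. The sketch below computes J from a 2x2 table together with the simple large-sample standard error that treats sensitivity and specificity as independent binomial proportions. This is the baseline calculation that ignores any association between the two, not the corrected procedure developed in the article, and the function names are illustrative.

```python
import math

def youden_index(tp, fn, tn, fp):
    """Youden index J = sensitivity + specificity - 1 from 2x2 table counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0

def naive_standard_error(tp, fn, tn, fp):
    """Large-sample SE of J assuming sensitivity and specificity are independent.

    Var(J) = se(1 - se) / n_diseased + sp(1 - sp) / n_healthy.
    No covariance term is included, which is the simplification the article revisits.
    """
    n_diseased = tp + fn
    n_healthy = tn + fp
    se = tp / n_diseased
    sp = tn / n_healthy
    return math.sqrt(se * (1.0 - se) / n_diseased + sp * (1.0 - sp) / n_healthy)

if __name__ == "__main__":
    j = youden_index(80, 20, 90, 10)
    s = naive_standard_error(80, 20, 90, 10)
    print(f"J = {j:.3f}, naive 95% CI = ({j - 1.96 * s:.3f}, {j + 1.96 * s:.3f})")
```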
Quantification of downscaled precipitation uncertainties via Bayesian inference
NASA Astrophysics Data System (ADS)
Nury, A. H.; Sharma, A.; Marshall, L. A.
2017-12-01
Prediction of precipitation from global climate model (GCM) outputs remains critical to decision-making in water-stressed regions. In this regard, downscaling of GCM output has been a useful tool for analysing future hydro-climatological states. Several approaches have been developed for precipitation downscaling, including dynamical and statistical methods. Frequently, outputs from dynamical downscaling are not readily transferable across regions because of significant methodological and computational difficulties. Statistical downscaling approaches provide a flexible and efficient alternative, providing hydro-climatological outputs across multiple temporal and spatial scales in many locations. However, these approaches are subject to significant uncertainty, arising from uncertainty in the downscaled model parameters and from the use of different reanalysis products for inferring appropriate model parameters. Consequently, these uncertainties affect the performance of simulations at the catchment scale. This study develops a Bayesian framework for modelling downscaled daily precipitation from GCM outputs. It aims to account for uncertainties in downscaling by evaluating reanalysis datasets against observational rainfall data over Australia. A consistent technique for quantifying downscaling uncertainties by means of a Bayesian downscaling framework is proposed. The results suggest that there are differences in downscaled precipitation occurrences and extremes.
Introduction to Multilevel Item Response Theory Analysis: Descriptive and Explanatory Models
ERIC Educational Resources Information Center
Sulis, Isabella; Toland, Michael D.
2017-01-01
Item response theory (IRT) models are the main psychometric approach for the development, evaluation, and refinement of multi-item instruments and scaling of latent traits, whereas multilevel models are the primary statistical method when considering the dependence between person responses when primary units (e.g., students) are nested within…
Systematic Error Modeling and Bias Estimation
Zhang, Feihu; Knoll, Alois
2016-01-01
This paper analyzes the statistical properties of the systematic error in terms of range and bearing during the transformation process. Furthermore, we rely on a weighted nonlinear least squares method to calculate the biases based on the proposed models. The results show the high performance of the proposed approach for error modeling and bias estimation. PMID:27213386
Goodness-of-Fit Assessment of Item Response Theory Models
ERIC Educational Resources Information Center
Maydeu-Olivares, Alberto
2013-01-01
The article provides an overview of goodness-of-fit assessment methods for item response theory (IRT) models. It is now possible to obtain accurate "p"-values of the overall fit of the model if bivariate information statistics are used. Several alternative approaches are described. As the validity of inferences drawn on the fitted model…
Shet, Vinayaka B; Palan, Anusha M; Rao, Shama U; Varun, C; Aishwarya, Uday; Raja, Selvaraj; Goveas, Louella Concepta; Vaman Rao, C; Ujwal, P
2018-02-01
In the current investigation, statistical approaches were adopted to hydrolyse non-edible seed cake (NESC) of Pongamia and optimize the hydrolysis process by response surface methodology (RSM). Through the RSM approach, the optimized conditions were found to be 1.17% v/v of HCl concentration at 54.12 min for hydrolysis. Under optimized conditions, the release of reducing sugars was found to be 53.03 g/L. The RSM data were used to train the artificial neural network (ANN) and the predictive ability of both models was compared by calculating various statistical parameters. A three-layered ANN model consisting of 2:12:1 topology was developed; the response of the ANN model indicates that it is precise when compared with the RSM model. The fit of the models was expressed with the regression coefficient R², which was found to be 0.975 and 0.888, respectively, for the ANN and RSM models. This further demonstrated that the performance of ANN was better than that of RSM.
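The R² comparison described above can be sketched as below. This assumes the usual definition of R² as one minus the ratio of residual to total sum of squares; the abstract does not state which formula was used. The yields and the two sets of predictions are hypothetical and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot

# Hypothetical reducing-sugar yields (g/L); NOT data from the study.
observed = [40.0, 45.0, 50.0, 53.0]
model_a = [41.0, 44.0, 50.5, 52.5]   # stands in for the better-fitting model
model_b = [43.0, 43.0, 48.0, 55.0]   # stands in for the worse-fitting model

r2_a = r_squared(observed, model_a)
r2_b = r_squared(observed, model_b)
```

A higher value means that a larger share of the variance in the observed response is reproduced by the model, which is the sense in which the abstract ranks ANN above RSM.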
NASA Astrophysics Data System (ADS)
Villamizar-Mejia, Rodolfo; Mujica-Delgado, Luis-Eduardo; Ruiz-Ordóñez, Magda-Liliana; Camacho-Navarro, Jhonatan; Moreno-Beltrán, Gustavo
2017-05-01
In previous works, damage detection of metallic specimens exposed to temperature changes has been achieved by using a statistical baseline model based on Principal Component Analysis (PCA) and the piezodiagnostics principle, taking the temperature effect into account either by augmenting the baseline model or by using several baseline models according to the current temperature. In this paper a new approach is presented, where damage detection is based on a new index that combines the Q and T2 statistical indices with current temperature measurements. Experimental tests were carried out on a carbon-steel pipe of 1 m length and 1.5 inches diameter, instrumented with piezodevices acting as actuators or sensors. A PCA baseline model was obtained at a temperature of 21° and then T2 and Q statistical indices were obtained for a 24 h temperature profile. Also, mass added at different points of the pipe between sensor and actuator was used to simulate damage. By using the combined index, the temperature contribution can be separated and a better graphical differentiation of damaged from undamaged cases can be obtained.
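The two indices named in the abstract can be illustrated with a short sketch. It assumes a PCA baseline has already been fitted, so that the retained loading vectors and their eigenvalues are known, and that the new measurement has been centred and scaled with the baseline statistics. The combined temperature-aware index proposed in the paper is not reproduced, and the function name and numbers are hypothetical.

```python
def t2_and_q(x, loadings, eigenvalues):
    """Hotelling T^2 and Q (squared prediction error) for one sample.

    x           : mean-centred (and scaled) measurement vector
    loadings    : retained principal directions, each an orthonormal vector
    eigenvalues : variances of the retained components in the baseline data
    """
    # Project the sample onto each retained principal direction.
    scores = [sum(p_i * x_i for p_i, x_i in zip(p, x)) for p in loadings]
    # T^2: variation inside the PCA model, scaled by component variance.
    t2 = sum(t * t / lam for t, lam in zip(scores, eigenvalues))
    # Q: squared norm of what the retained components cannot reconstruct.
    recon = [sum(t * p[i] for t, p in zip(scores, loadings)) for i in range(len(x))]
    q = sum((x_i - r_i) ** 2 for x_i, r_i in zip(x, recon))
    return t2, q

# Toy two-sensor example with one retained component (illustrative values).
t2, q = t2_and_q([3.0, 4.0], loadings=[[1.0, 0.0]], eigenvalues=[9.0])
```

In this toy case the score is 3, so T2 is 9/9 = 1, and the unexplained residual (0, 4) gives Q = 16.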
Assessment of corneal properties based on statistical modeling of OCT speckle
Jesus, Danilo A.; Iskander, D. Robert
2016-01-01
A new approach to assess the properties of the corneal micro-structure in vivo based on the statistical modeling of speckle obtained from Optical Coherence Tomography (OCT) is presented. A number of statistical models were proposed to fit the corneal speckle data obtained from OCT raw images. Short-term changes in corneal properties were studied by inducing corneal swelling, whereas age-related changes were observed by analyzing data from sixty-five subjects aged between twenty-four and seventy-three years. The Generalized Gamma distribution was shown to be the best model, in terms of Akaike's Information Criterion, to fit the OCT corneal speckle. Its parameters showed statistically significant differences (Kruskal-Wallis, p < 0.001) for short-term and age-related corneal changes. In addition, it was observed that age-related changes influence the corneal biomechanical behaviour when corneal swelling is induced. This study shows that the Generalized Gamma distribution can be used to model corneal speckle in OCT in vivo, providing complementary quantified information where the micro-structure of corneal tissue is of essence. PMID:28101409
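Model selection by Akaike's Information Criterion, as used above, can be sketched as follows. To stay dependency-free, the sketch compares two one-parameter models with closed-form maximum-likelihood fits (exponential and Rayleigh) rather than the Generalized Gamma distribution of the study; that substitution, the function names, and the amplitude values are all assumptions made for illustration.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def exponential_loglik(data):
    """Maximised log-likelihood of an exponential model (rate = 1/mean)."""
    n = len(data)
    rate = n / sum(data)
    return n * math.log(rate) - rate * sum(data)

def rayleigh_loglik(data):
    """Maximised log-likelihood of a Rayleigh model (sigma^2 = sum(x^2)/(2n))."""
    n = len(data)
    sigma2 = sum(x * x for x in data) / (2 * n)
    # At the MLE, sum(x^2) / (2 sigma^2) equals n.
    return sum(math.log(x) for x in data) - n * math.log(sigma2) - n

# Hypothetical positive amplitudes; NOT measurements from the study.
speckle = [0.8, 1.1, 1.3, 0.9, 1.6, 1.2, 0.7, 1.4, 1.0, 1.5]
scores = {
    "exponential": aic(exponential_loglik(speckle), n_params=1),
    "rayleigh": aic(rayleigh_loglik(speckle), n_params=1),
}
best_model = min(scores, key=scores.get)
```

The candidate with the smallest AIC is preferred; the 2k term penalises extra parameters, which matters when comparing a three-parameter family such as the Generalized Gamma with simpler alternatives.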
Statistical models for the analysis and design of digital polymerase chain reaction (dPCR) experiments
Dorazio, Robert; Hunter, Margaret
2015-01-01
Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary, log–log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model’s parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.
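The link-and-offset structure described above can be made concrete for the simplest case of a single sample with no covariates. The sketch assumes that target copies are Poisson-distributed across equal-volume partitions, so that a partition is positive with probability 1 - exp(-concentration × volume); under that assumption the complementary log-log of this probability equals log(concentration) + log(volume), with log(volume) acting as the offset. Function names, counts, and the partition volume are hypothetical.

```python
import math

def dpcr_concentration(positives, partitions, volume):
    """Estimate nucleic acid concentration (copies per unit volume).

    Assumes Poisson-distributed copies across equal-volume partitions:
    P(partition positive) = 1 - exp(-concentration * volume).
    """
    if not 0 < positives < partitions:
        raise ValueError("need at least one positive and one negative partition")
    p_hat = positives / partitions
    conc = -math.log(1.0 - p_hat) / volume
    # Delta-method standard error of the concentration estimate.
    se = math.sqrt(p_hat / (partitions * (1.0 - p_hat))) / volume
    return conc, se

def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

# Hypothetical run: 5000 of 20000 partitions positive, 0.85 nL (0.85e-3 uL) each.
volume_ul = 0.85e-3
conc, se = dpcr_concentration(5000, 20000, volume=volume_ul)

# Difference between the link value and intercept + offset; should be ~0.
offset_identity_gap = cloglog(5000 / 20000) - (math.log(conc) + math.log(volume_ul))
```

In a full generalized linear model, covariates would enter alongside the intercept log(concentration), which is how the abstract's treatment effects and dilution series would be expressed.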
Gomez-Ramirez, Jaime; Sanz, Ricardo
2013-09-01
One of the most important scientific challenges today is the quantitative and predictive understanding of biological function. Classical mathematical and computational approaches have been enormously successful in modeling inert matter, but they may be inadequate to address inherent features of biological systems. We address the conceptual and methodological obstacles that lie in the inverse problem in biological systems modeling. We introduce a full Bayesian approach (FBA), a theoretical framework to study biological function, in which probability distributions are conditional on biophysical information that physically resides in the biological system that is studied by the scientist. Copyright © 2013 Elsevier Ltd. All rights reserved.
Nonlinear wave chaos: statistics of second harmonic fields.
Zhou, Min; Ott, Edward; Antonsen, Thomas M; Anlage, Steven M
2017-10-01
Concepts from the field of wave chaos have been shown to successfully predict the statistical properties of linear electromagnetic fields in electrically large enclosures. The Random Coupling Model (RCM) describes these properties by incorporating both universal features described by Random Matrix Theory and the system-specific features of particular system realizations. In an effort to extend this approach to the nonlinear domain, we add an active nonlinear frequency-doubling circuit to an otherwise linear wave chaotic system, and we measure the statistical properties of the resulting second harmonic fields. We develop an RCM-based model of this system as two linear chaotic cavities coupled by means of a nonlinear transfer function. The harmonic field strengths are predicted to be the product of two statistical quantities and the nonlinearity characteristics. Statistical results from measurement-based calculation, RCM-based simulation, and direct experimental measurements are compared and show good agreement over many decades of power.
A Random Variable Approach to Nuclear Targeting and Survivability
DOE Office of Scientific and Technical Information (OSTI.GOV)
Undem, Halvor A.
We demonstrate a common mathematical formalism for analyzing problems in nuclear survivability and targeting. This formalism, beginning with a random variable approach, can be used to interpret past efforts in nuclear-effects analysis, including targeting analysis. It can also be used to analyze new problems brought about by the post Cold War Era, such as the potential effects of yield degradation in a permanently untested nuclear stockpile. In particular, we illustrate the formalism through four natural case studies or illustrative problems, linking these to actual past data, modeling, and simulation, and suggesting future uses. In the first problem, we illustrate the case of a deterministically modeled weapon used against a deterministically responding target. Classic "Cookie Cutter" damage functions result. In the second problem, we illustrate, with actual target test data, the case of a deterministically modeled weapon used against a statistically responding target. This case matches many of the results of current nuclear targeting modeling and simulation tools, including the result of distance damage functions as complementary cumulative lognormal functions in the range variable. In the third problem, we illustrate the case of a statistically behaving weapon used against a deterministically responding target. In particular, we show the dependence of target damage on weapon yield for an untested nuclear stockpile experiencing yield degradation. Finally, and using actual unclassified weapon test data, we illustrate in the fourth problem the case of a statistically behaving weapon used against a statistically responding target.
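The two functional forms named in the first and second problems can be written down generically. The sketch below shows only the mathematical shapes, a step function and a complementary cumulative lognormal in distance; the parameter names and values are hypothetical, dimensionless, and not taken from the report.

```python
import math

def cookie_cutter(distance, threshold_radius):
    """Deterministic source, deterministic response: a step function."""
    return 1.0 if distance <= threshold_radius else 0.0

def lognormal_damage(distance, median_distance, sigma):
    """Deterministic source, statistical response.

    Complementary cumulative lognormal in the range variable:
    P(damage) = 1 - Phi((ln r - ln r50) / sigma).
    """
    z = (math.log(distance) - math.log(median_distance)) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Illustrative, dimensionless values only.
at_median = lognormal_damage(2.0, median_distance=2.0, sigma=0.3)
```

As sigma shrinks toward zero the lognormal curve sharpens toward the step function, which is one way to see the deterministic case as a limit of the statistical one.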
Quantifying uncertainty in climate change science through empirical information theory.
Majda, Andrew J; Gershgorin, Boris
2010-08-24
Quantifying the uncertainty for the present climate and the predictions of climate change in the suite of imperfect Atmosphere Ocean Science (AOS) computer models is a central issue in climate change science. Here, a systematic approach to these issues with firm mathematical underpinning is developed through empirical information theory. An information metric to quantify AOS model errors in the climate is proposed here which incorporates both coarse-grained mean model errors as well as covariance ratios in a transformation invariant fashion. The subtle behavior of model errors with this information metric is quantified in an instructive statistically exactly solvable test model with direct relevance to climate change science including the prototype behavior of tracer gases such as CO₂. Formulas for identifying the most sensitive climate change directions using statistics of the present climate or an AOS model approximation are developed here; these formulas just involve finding the eigenvector associated with the largest eigenvalue of a quadratic form computed through suitable unperturbed climate statistics. These climate change concepts are illustrated on a statistically exactly solvable one-dimensional stochastic model with relevance for low frequency variability of the atmosphere. Viable algorithms for implementation of these concepts are discussed throughout the paper.
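The abstract does not give the metric's formula. One standard quantity that fits its description (a mean-error term plus a variance-ratio term, unchanged under invertible affine changes of variable) is the relative entropy, or Kullback-Leibler divergence, between Gaussian distributions. The one-dimensional sketch below rests on that assumption and is not claimed to be the paper's exact definition.

```python
import math

def gaussian_relative_entropy(mean_true, var_true, mean_model, var_model):
    """Relative entropy of a 1-D Gaussian 'truth' with respect to a 1-D
    Gaussian 'model', split into a mean-error part and a variance-ratio part.
    """
    signal = 0.5 * (mean_true - mean_model) ** 2 / var_model
    ratio = var_true / var_model
    dispersion = 0.5 * (ratio - 1.0 - math.log(ratio))
    return signal, dispersion

# Illustrative values: model has the right variance but a biased mean.
signal, dispersion = gaussian_relative_entropy(1.0, 1.0, 0.0, 1.0)

# Rescaling both distributions by x -> 3x + 5 leaves both parts unchanged.
before = gaussian_relative_entropy(1.0, 2.0, 0.0, 1.0)
after = gaussian_relative_entropy(8.0, 18.0, 5.0, 9.0)
```

The sum of the two parts is zero only when model and truth agree in both mean and variance, so it penalises a model that gets the mean right but the spread wrong.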
Sparse approximation of currents for statistics on curves and surfaces.
Durrleman, Stanley; Pennec, Xavier; Trouvé, Alain; Ayache, Nicholas
2008-01-01
Computing, processing, and visualizing statistics on shapes such as curves or surfaces is a real challenge with many applications ranging from medical image analysis to computational geometry. Modelling such geometrical primitives with currents avoids both feature-based approaches and point-correspondence methods. This framework has proved powerful for registering brain surfaces and for measuring geometrical invariants. However, while state-of-the-art methods perform pairwise registrations efficiently, new numerical schemes are required to process groupwise statistics because complexity increases as the database grows. Statistics such as the mean and principal modes of a set of shapes often have a heavy and highly redundant representation. We therefore propose to find an adapted basis on which the mean and principal modes have a sparse decomposition. Besides the computational improvement, this sparse representation offers a way to visualize and interpret statistics on currents. Experiments show the relevance of the approach on 34 sets of 70 sulcal lines and on 50 sets of 10 meshes of deep brain structures.
NASA Astrophysics Data System (ADS)
Mazzitello, Karina I.; Candia, Julián
2012-12-01
In every country, public and private agencies allocate extensive funding to collect large-scale statistical data, which in turn are studied and analyzed in order to determine local, regional, national, and international policies regarding all aspects relevant to the welfare of society. One important aspect of that process is the visualization of statistical data with embedded geographical information, which most often relies on archaic methods such as maps colored according to graded scales. In this work, we apply nonstandard visualization techniques based on physical principles. We illustrate the method with recent statistics on homicide rates in Brazil and their correlation to other publicly available data. This physics-based approach provides a novel tool that can be used by interdisciplinary teams investigating statistics and model projections in a variety of fields such as economics and gross domestic product research, public health and epidemiology, sociodemographics, political science, business and marketing, and many others.
Seeking parsimony in hydrology and water resources technology
NASA Astrophysics Data System (ADS)
Koutsoyiannis, D.
2009-04-01
The principle of parsimony, also known as the principle of simplicity, the principle of economy and Ockham's razor, advises scientists to prefer the simplest theory among those that fit the data equally well. In this, it is an epistemic principle but reflects an ontological characterization that the universe is ultimately parsimonious. Is this principle useful and can it really be reconciled with, and implemented in, our modelling approaches to complex hydrological systems, whose elements and events are extraordinarily numerous, different and unique? The answer underlying the mainstream hydrological research of the last two decades seems to be negative. Hopes were invested in the power of computers that would enable faithful and detailed representation of the diverse system elements and the hydrological processes, based merely on "first principles" and resulting in "physically-based" models that tend to approach the real-world systems in complexity. Today the record of this research endeavour does not seem positive, as it did not improve model predictive capacity or comprehension of processes. A return to parsimonious modelling again seems to be the promising route. The experience from recent research and from comparisons of parsimonious and complicated models indicates that the former can facilitate insight and comprehension, improve accuracy and predictive capacity, and increase efficiency. In addition - and despite the aspiration that "physically based" models will have lower data requirements and will even ultimately become "data-free" - parsimonious models require fewer data to achieve the same accuracy as more complicated models. Naturally, the concepts that reconcile the simplicity of parsimonious models with the complexity of hydrological systems are probability theory and statistics. 
Probability theory provides the theoretical basis for moving from a microscopic to a macroscopic view of phenomena, by mapping sets of diverse elements and events of hydrological systems to single numbers (a probability or an expected value), and statistics provides the empirical basis of summarizing data, making inference from them, and supporting decision making in water resource management. Unfortunately, the current state of the art in probability, statistics and their union, often called stochastics, is not fully satisfactory for the needs of modelling of hydrological and water resource systems. A first problem is that stochastic modelling has traditionally relied on classical statistics, which is based on the independent "coin-tossing" prototype, rather than on the study of real-world systems whose behaviour is very different from the classical prototype. A second problem is that the stochastic models (particularly the multivariate ones) are often not parsimonious themselves. Therefore, substantial advancement of stochastics is necessary in a new paradigm of parsimonious hydrological modelling. These ideas are illustrated using several examples, namely: (a) hydrological modelling of a karst system in Bosnia and Herzegovina using three different approaches ranging from parsimonious to detailed "physically-based"; (b) parsimonious modelling of a peculiar modified catchment in Greece; (c) a stochastic approach that can replace parameter-excessive ARMA-type models with a generalized algorithm that produces any shape of autocorrelation function (consistent with the accuracy provided by the data) using a couple of parameters; (d) a multivariate stochastic approach which replaces a huge number of parameters estimated from data with coefficients estimated by the principle of maximum entropy; and (e) a parsimonious approach for decision making in multi-reservoir systems using a handful of parameters instead of thousands of decision variables.
Modelling Social Learning in Monkeys
ERIC Educational Resources Information Center
Kendal, Jeremy R.
2008-01-01
The application of modelling to social learning in monkey populations has been a neglected topic. Recently, however, a number of statistical, simulation and analytical approaches have been developed to help examine social learning processes, putative traditions, the use of social learning strategies and the diffusion dynamics of socially…
Complex networks as a unified framework for descriptive analysis and predictive modeling in climate
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steinhaeuser, Karsten J K; Chawla, Nitesh; Ganguly, Auroop R
The analysis of climate data has relied heavily on hypothesis-driven statistical methods, while projections of future climate are based primarily on physics-based computational models. However, in recent years a wealth of new datasets has become available. Therefore, we take a more data-centric approach and propose a unified framework for studying climate, with an aim towards characterizing observed phenomena as well as discovering new knowledge in the climate domain. Specifically, we posit that complex networks are well-suited for both descriptive analysis and predictive modeling tasks. We show that the structural properties of climate networks have useful interpretation within the domain. Further, we extract clusters from these networks and demonstrate their predictive power as climate indices. Our experimental results establish that the network clusters are statistically significantly better predictors than clusters derived using a more traditional clustering approach. Using complex networks as data representation thus enables the unique opportunity for descriptive and predictive modeling to inform each other.
A self-consistency approach to improve microwave rainfall rate estimation from space
NASA Technical Reports Server (NTRS)
Kummerow, Christian; Mack, Robert A.; Hakkarinen, Ida M.
1989-01-01
A multichannel statistical approach is used to retrieve rainfall rates from the brightness temperature T(B) observed by passive microwave radiometers flown on a high-altitude NASA aircraft. T(B) statistics are based upon data generated by a cloud radiative model. This model simulates variabilities in the underlying geophysical parameters of interest, and computes their associated T(B) in each of the available channels. By further imposing the requirement that the observed T(B) agree with the T(B) values corresponding to the retrieved parameters through the cloud radiative transfer model, the results can be made to agree quite well with coincident radar-derived rainfall rates. Some information regarding the cloud vertical structure is also obtained by such an added requirement. The applicability of this technique to satellite retrievals is also investigated. Data which might be observed by satellite-borne radiometers, including the effects of nonuniformly filled footprints, are simulated by the cloud radiative model for this purpose.
LD-SPatt: large deviations statistics for patterns on Markov chains.
Nuel, G
2004-01-01
Statistics on Markov chains are widely used for the study of patterns in biological sequences. Statistics on these models can be computed through several approaches. Gaussian approximations based on the central limit theorem (CLT) are among the most popular. Unfortunately, in order to find a pattern of interest, these methods have to deal with tail distribution events, where the CLT performs especially badly. In this paper, we propose a new approach based on large deviations theory to assess pattern statistics. We first recall theoretical results for empirical mean (level 1) as well as empirical distribution (level 2) large deviations on Markov chains. Then, we present the applications of these results, focusing on numerical issues. LD-SPatt is the name of the GPL software implementing these algorithms. We compare this approach to several existing ones in terms of complexity and reliability and show that the large deviations are more reliable than the Gaussian approximations in absolute values as well as in terms of ranking, and are at least as reliable as compound Poisson approximations. Finally, we discuss some further possible improvements and applications of this new method.
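Why a Gaussian approximation struggles in the tail can be seen in a much simpler setting than the one treated in the paper. The sketch below uses independent Bernoulli trials rather than a Markov chain, and a plain Chernoff bound rather than the LD-SPatt algorithms; both are simplifying assumptions, and the counts are made up.

```python
import math

def exact_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def gaussian_upper_tail(n, p, k):
    """CLT approximation of the same tail (no continuity correction)."""
    z = (k - n * p) / math.sqrt(n * p * (1 - p))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def large_deviation_bound(n, p, k):
    """Chernoff bound exp(-n * KL(k/n || p)), valid for p < k/n < 1."""
    a = k / n
    rate = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))
    return math.exp(-n * rate)

# A pattern expected 10 times in 100 positions but observed 40 times.
n, p, k = 100, 0.1, 40
exact = exact_upper_tail(n, p, k)
gauss = gaussian_upper_tail(n, p, k)
ld = large_deviation_bound(n, p, k)
```

For this far-tail event the Gaussian value is many orders of magnitude smaller than the exact probability, while the large-deviations bound stays within about two orders of magnitude of it.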
Balliu, Brunilda; Tsonaka, Roula; Boehringer, Stefan; Houwing-Duistermaat, Jeanine
2015-03-01
Integrative omics, the joint analysis of outcome and multiple types of omics data, such as genomics, epigenomics, and transcriptomics data, constitutes a promising approach for powerful and biologically relevant association studies. These studies often employ a case-control design, and often include nonomics covariates, such as age and gender, that may modify the underlying omics risk factors. An open question is how best to integrate multiple omics and nonomics information to maximize statistical power in case-control studies that ascertain individuals based on the phenotype. Recent work on integrative omics has used prospective approaches, modeling case-control status conditional on omics and nonomics risk factors. Compared to univariate approaches, jointly analyzing multiple risk factors with a prospective approach increases power in nonascertained cohorts. However, these prospective approaches often lose power in case-control studies. In this article, we propose a novel statistical method for integrating multiple omics and nonomics factors in case-control association studies. Our method is based on a retrospective likelihood function that models the joint distribution of omics and nonomics factors conditional on case-control status. The new method provides accurate control of Type I error rate and has increased efficiency over prospective approaches in both simulated and real data. © 2015 Wiley Periodicals, Inc.
BOOK REVIEW: Statistical Mechanics of Turbulent Flows
NASA Astrophysics Data System (ADS)
Cambon, C.
2004-10-01
This is a handbook for a computational approach to reacting flows, including background material on statistical mechanics. In this sense, the title is somewhat misleading with respect to other books dedicated to the statistical theory of turbulence (e.g. Monin and Yaglom). In the present book, emphasis is placed on modelling (engineering closures) for computational fluid dynamics. The probabilistic (pdf) approach is applied to the local scalar field, motivated first by the nonlinearity of chemical source terms which appear in the transport equations of reacting species. The probabilistic and stochastic approaches are also used for the velocity field and particle position; nevertheless they are essentially limited to Lagrangian models for a local vector, with only single-point statistics, as for the scalar. Accordingly, conventional techniques, such as single-point closures for RANS (Reynolds-averaged Navier-Stokes) and subgrid-scale models for LES (large-eddy simulations), are described and in some cases reformulated using underlying Langevin models and filtered pdfs. Even if the theoretical approach to turbulence is not discussed in general, the essentials of probabilistic and stochastic-processes methods are described, with a useful reminder concerning statistics at the molecular level. The book comprises 7 chapters. Chapter 1 briefly states the goals and contents, with a very clear synoptic scheme on page 2. Chapter 2 presents definitions and examples of pdfs and related statistical moments. Chapter 3 deals with stochastic processes, pdf transport equations, from Kramers-Moyal to Fokker-Planck (for Markov processes), and moments equations. Stochastic differential equations are introduced and their relationship to pdfs described. This chapter ends with a discussion of stochastic modelling. The equations of fluid mechanics and thermodynamics are addressed in chapter 4. 
Classical conservation equations (mass, velocity, internal energy) are derived from their counterparts at the molecular level. In addition, equations are given for multicomponent reacting systems. The chapter ends with miscellaneous topics, including DNS, (idea of) the energy cascade, and RANS. Chapter 5 is devoted to stochastic models for the large scales of turbulence. Langevin-type models for velocity (and particle position) are presented, and their various consequences for second-order single-point correlations (Reynolds stress components, Kolmogorov constant) are discussed. These models are then presented for the scalar. The chapter ends with compressible high-speed flows and various models, ranging from k-epsilon to hybrid RANS-pdf. Stochastic models for small-scale turbulence are addressed in chapter 6. These models are based on the concept of a filtered density function (FDF) for the scalar, and a more conventional SGS (sub-grid-scale model) for the velocity in LES. The final chapter, chapter 7, is entitled `The unification of turbulence models' and aims at reconciling large-scale and small-scale modelling. This book offers a timely survey of techniques in modern computational fluid mechanics for turbulent flows with reacting scalars. It should be of interest to engineers, while the discussion of the underlying tools, namely pdfs, stochastic and statistical equations should also be attractive to applied mathematicians and physicists. The book's emphasis on local pdfs and stochastic Langevin models gives a consistent structure to the book and allows the author to cover almost the whole spectrum of practical modelling in turbulent CFD. On the other hand, one might regret that non-local issues are not mentioned explicitly, or even briefly. These problems range from the presence of pressure-strain correlations in the Reynolds stress transport equations to the presence of two-point pdfs in the single-point pdf equation derived from the Navier-Stokes equations. 
(One may recall that, even without scalar transport, a general closure problem for turbulence statistics results from both non-linearity and non-locality of Navier-Stokes equations, the latter coming from, e.g., the nonlocal relationship of velocity and pressure in the quasi-incompressible case. These two aspects are often intricately linked. It is well known that non-linearity alone is not responsible for the `problem', as evidenced by 1D turbulence without pressure (`Burgulence' from the Burgers equation) and probably 3D (cosmological gas). A local description in terms of pdf for the velocity can resolve the `non-linear' problem, which instead yields an infinite hierarchy of equations in terms of moments. On the other hand, non-locality yields a hierarchy of unclosed equations, with the single-point pdf equation for velocity derived from NS incompressible equations involving a two-point pdf, and so on. The general relationship was given by Lundgren (1967, Phys. Fluids 10 (5), 969-975), with the equation for pdf at n points involving the pdf at n+1 points. The nonlocal problem appears in various statistical models which are not discussed in the book. The simplest example is full RST or ASM models, in which the closure of pressure-strain correlations is pivotal (their counterpart ought to be identified and discussed in equations (5-21) and the following ones). The book does not address more sophisticated non-local approaches, such as two-point (or spectral) non-linear closure theories and models, `rapid distortion theory' for linear regimes, not to mention scaling and intermittency based on two-point structure functions, etc. The book sometimes mixes theoretical modelling and pure empirical relationships, the empirical character coming from the lack of a nonlocal (two-point) approach.) 
In short, the book is orientated more towards applications than towards turbulence theory; it is written clearly and concisely and should be useful to a large community, interested either in the underlying stochastic formalism or in CFD applications.
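As a pointer to what a Langevin-type model for a single velocity component looks like in practice, here is a minimal sketch. It uses the generic Ornstein-Uhlenbeck form dU = -(U/T) dt + sqrt(2σ²/T) dW, integrated with the Euler-Maruyama scheme. The review does not state the book's specific model equations, so the form, the parameter names, and the values are assumptions.

```python
import math
import random

def simulate_langevin(n_steps, dt, timescale, sigma, seed=0):
    """Euler-Maruyama integration of dU = -(U/T) dt + sqrt(2 sigma^2 / T) dW.

    timescale (T) is the relaxation time and sigma^2 the stationary
    variance of the fluctuating velocity component U.
    """
    rng = random.Random(seed)
    u = 0.0
    noise_amp = math.sqrt(2.0 * sigma**2 * dt / timescale)
    path = []
    for _ in range(n_steps):
        # Drift relaxes U toward zero; the Wiener increment re-excites it.
        u += -(u / timescale) * dt + noise_amp * rng.gauss(0.0, 1.0)
        path.append(u)
    return path

# Illustrative parameters only.
path = simulate_langevin(n_steps=200_000, dt=0.01, timescale=1.0, sigma=1.0)
sample_variance = sum(u * u for u in path) / len(path)
```

Over a long run the sample variance settles near σ², which is the single-point statistic such a model is built to reproduce; by construction it carries no two-point information, which is the limitation the reviewer points out.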
A statistical approach to evaluate flood risk at the regional level: an application to Italy
NASA Astrophysics Data System (ADS)
Rossi, Mauro; Marchesini, Ivan; Salvati, Paola; Donnini, Marco; Guzzetti, Fausto; Sterlacchini, Simone; Zazzeri, Marco; Bonazzi, Alessandro; Carlesi, Andrea
2016-04-01
Floods are frequent and widespread in Italy, causing multiple fatalities and extensive damage to public and private structures every year. A pre-requisite for the development of mitigation schemes, including financial instruments such as insurance, is the ability to quantify their costs starting from the estimation of the underlying flood hazard. However, comprehensive and coherent information on flood prone areas, and estimates on the frequency and intensity of flood events, are not often available at scales appropriate for risk pooling and diversification. In Italy, River Basins Hydrogeological Plans (PAI), prepared by basin administrations, are the basic descriptive, regulatory, technical and operational tools for environmental planning in flood prone areas. Nevertheless, such plans do not cover the entire Italian territory, having significant gaps along the minor hydrographic network and in ungauged basins. Several process-based modelling approaches have been used by different basin administrations for the flood hazard assessment, resulting in an inhomogeneous hazard zonation of the territory. As a result, flood hazard assessments and expected damage estimations across the different Italian basin administrations are not always coherent. To overcome these limitations, we propose a simplified multivariate statistical approach for the regional flood hazard zonation coupled with a flood impact model. This modelling approach has been applied in different Italian basin administrations, allowing a preliminary but coherent and comparable estimation of the flood hazard and the relative impact. Model performances are evaluated comparing the predicted flood prone areas with the corresponding PAI zonation. The proposed approach will provide standardized information (following the EU Floods Directive specifications) on flood risk at a regional level which can in turn be more readily applied to assess flood economic impacts. 
Furthermore, assuming an appropriate statistical characterization of flood risk, the proposed procedure could be applied straightforwardly outside the national borders, particularly in areas with similar geo-environmental settings.
A statistical mechanics approach to autopoietic immune networks
NASA Astrophysics Data System (ADS)
Barra, Adriano; Agliari, Elena
2010-07-01
In this work we aim to bridge theoretical immunology and disordered statistical mechanics. We introduce a model for the behavior of B-cells which naturally merges the clonal selection theory and the autopoietic network theory as a whole. From the analysis of its features we recover several basic phenomena such as low-dose tolerance, dynamical memory of antigens and self/non-self discrimination.
Numerical and Qualitative Contrasts of Two Statistical Models ...
Two statistical approaches, weighted regression on time, discharge, and season and generalized additive models, have recently been used to evaluate water quality trends in estuaries. Both models have been used in similar contexts despite differences in statistical foundations and products. This study provided an empirical and qualitative comparison of both models using 29 years of data for two discrete time series of chlorophyll-a (chl-a) in the Patuxent River estuary. Empirical descriptions of each model were based on predictive performance against the observed data, ability to reproduce flow-normalized trends with simulated data, and comparisons of performance with validation datasets. Between-model differences were apparent but minor and both models had comparable abilities to remove flow effects from simulated time series. Both models similarly predicted observations for missing data with different characteristics. Trends from each model revealed distinct mainstem influences of the Chesapeake Bay with both models predicting a roughly 65% increase in chl-a over time in the lower estuary, whereas flow-normalized predictions for the upper estuary showed a more dynamic pattern, with a nearly 100% increase in chl-a in the last 10 years. Qualitative comparisons highlighted important differences in the statistical structure, available products, and characteristics of the data and desired analysis. This manuscript describes a quantitative comparison of two recently-
NASA Astrophysics Data System (ADS)
Zammit-Mangion, Andrew; Stavert, Ann; Rigby, Matthew; Ganesan, Anita; Rayner, Peter; Cressie, Noel
2017-04-01
The Orbiting Carbon Observatory-2 (OCO-2) satellite was launched on 2 July 2014, and it has been a source of atmospheric CO2 data since September 2014. The OCO-2 dataset contains a number of variables, but the one of most interest for flux inversion has been the column-averaged dry-air mole fraction (in units of ppm). These global level-2 data offer the possibility of inferring CO2 fluxes at Earth's surface and tracking those fluxes over time. However, as well as having a component of random error, the OCO-2 data have a component of systematic error that is dependent on the instrument's mode, namely land nadir, land glint, and ocean glint. Our statistical approach to CO2-flux inversion starts with constructing a statistical model for the random and systematic errors with parameters that can be estimated from the OCO-2 data and possibly in situ sources from flasks, towers, and the Total Column Carbon Observing Network (TCCON). Dimension reduction of the flux field is achieved through the use of physical basis functions, while temporal evolution of the flux is captured by modelling the basis-function coefficients as a vector autoregressive process. For computational efficiency, flux inversion uses only three months of sensitivities of mole fraction to changes in flux, computed using MOZART; any residual variation is captured through the modelling of a stochastic process that varies smoothly as a function of latitude. The second stage of our statistical approach is to simulate from the posterior distribution of the basis-function coefficients and all unknown parameters given the data using a fully Bayesian Markov chain Monte Carlo (MCMC) algorithm. Estimates and posterior variances of the flux field can then be obtained straightforwardly from this distribution. Our statistical approach is different than others, as it simultaneously makes inference (and quantifies uncertainty) on both the error components' parameters and the CO2 fluxes. 
We compare it to more classical approaches through an Observing System Simulation Experiment (OSSE) on a global scale. By changing the size of the random and systematic errors in the OSSE, we can determine the corresponding spatial and temporal resolutions at which useful flux signals could be detected from the OCO-2 data.
Variational Approach in the Theory of Liquid-Crystal State
NASA Astrophysics Data System (ADS)
Gevorkyan, E. V.
2018-03-01
The variational calculus of Leonhard Euler is the basis for modern mathematics and theoretical physics. The efficiency of the variational approach in the statistical theory of the liquid-crystal state, and more generally in condensed-state theory, is shown. In particular, the developed approach allows us to introduce effective pair interactions correctly and to optimize simple models of liquid crystals with the help of realistic intermolecular potentials.
A joint source-channel distortion model for JPEG compressed images.
Sabir, Muhammad F; Sheikh, Hamid Rahim; Heath, Robert W; Bovik, Alan C
2006-06-01
The need for efficient joint source-channel coding (JSCC) is growing as new multimedia services are introduced in commercial wireless communication systems. An important component of practical JSCC schemes is a distortion model that can predict the quality of compressed digital multimedia such as images and videos. The usual approach in the JSCC literature for quantifying the distortion due to quantization and channel errors is to estimate it for each image using the statistics of the image for a given signal-to-noise ratio (SNR). This is not an efficient approach in the design of real-time systems because of the computational complexity. A more useful and practical approach would be to design JSCC techniques that minimize average distortion for a large set of images based on some distortion model rather than carrying out per-image optimizations. However, models for estimating average distortion due to quantization and channel bit errors in a combined fashion for a large set of images are not available for practical image or video coding standards employing entropy coding and differential coding. This paper presents a statistical model for estimating the distortion introduced in progressive JPEG compressed images due to quantization and channel bit errors in a joint manner. Statistical modeling of important compression techniques such as Huffman coding, differential pulse-code modulation, and run-length coding is included in the model. Examples show that the distortion in terms of peak signal-to-noise ratio (PSNR) can be predicted within a 2-dB maximum error over a variety of compression ratios and bit-error rates. To illustrate the utility of the proposed model, we present an unequal power allocation scheme as a simple application of our model. Results show that it gives a PSNR gain of around 6.5 dB at low SNRs, as compared to equal power allocation.
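The abstract above reports distortion as peak signal-to-noise ratio (PSNR). As a reminder of what that figure measures, here is a minimal sketch of the standard PSNR definition for 8-bit samples. It is not the paper's distortion model; the function name `psnr` and the flat-list image representation are assumptions made for illustration only.

```python
import math


def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists.

    `peak` is the maximum possible sample value (255 for 8-bit images).
    Images are represented here as flat lists of pixel values, which is an
    illustrative simplification.
    """
    if len(original) != len(distorted) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the two signals.
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical pixel values: every sample is off by exactly one grey level.
reference = [10, 20, 30, 40]
decoded = [11, 19, 31, 39]
print(round(psnr(reference, decoded), 2))
```

Under this definition, a prediction error of 2 dB corresponds to a factor of about 1.58 in mean squared error, since 10^(2/10) ≈ 1.58.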
Loop series for discrete statistical models on graphs
NASA Astrophysics Data System (ADS)
Chertkov, Michael; Chernyak, Vladimir Y.
2006-06-01
In this paper we present the derivation details, logic, and motivation for the three loop calculus introduced in Chertkov and Chernyak (2006 Phys. Rev. E 73 065102(R)). Generating functions for each of the three interrelated discrete statistical models are expressed in terms of a finite series. The first term in the series corresponds to the Bethe-Peierls belief-propagation (BP) contribution; the other terms are labelled by loops on the factor graph. All loop contributions are simple rational functions of spin correlation functions calculated within the BP approach. We discuss two alternative derivations of the loop series. One approach implements a set of local auxiliary integrations over continuous fields with the BP contribution corresponding to an integrand saddle-point value. The integrals are replaced by sums in the complementary approach, briefly explained in Chertkov and Chernyak (2006 Phys. Rev. E 73 065102(R)). Local gauge symmetry transformations that clarify an important invariant feature of the BP solution are revealed in both approaches. The individual terms change under the gauge transformation while the partition function remains invariant. The requirement for all individual terms to be nonzero only for closed loops in the factor graph (as opposed to paths with loose ends) is equivalent to fixing the first term in the series to be exactly equal to the BP contribution. Further applications of the loop calculus to problems in statistical physics, computer and information sciences are discussed.
NASA Astrophysics Data System (ADS)
Barré, Anthony; Suard, Frédéric; Gérard, Mathias; Montaru, Maxime; Riu, Delphine
2014-01-01
This paper describes the statistical analysis of parameters recorded during electric battery ageing in electric vehicle use. These data permit traditional battery ageing investigation based on the evolution of capacity fade and resistance rise. The measured variables are examined in order to explain the correlation between battery ageing and operating conditions during the experiments. Such a study enables us to identify the main ageing factors. Detailed explorations of statistical dependency then reveal the factors responsible for battery ageing phenomena. Predictive battery ageing models are built from this approach. The results demonstrate and quantify a relationship between the variables and global observations of battery ageing, and also allow accurate battery ageing diagnosis through predictive models.
Effective field theory of statistical anisotropies for primordial bispectrum and gravitational waves
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rostami, Tahereh; Karami, Asieh; Firouzjahi, Hassan, E-mail: t.rostami@ipm.ir, E-mail: karami@ipm.ir, E-mail: firouz@ipm.ir
2017-06-01
We present effective field theory (EFT) studies of primordial statistical anisotropies in models of anisotropic inflation. The general action in unitary gauge is presented to calculate the leading interactions between the gauge field fluctuations, the curvature perturbations and the tensor perturbations. The anisotropies in the scalar power spectrum and bispectrum are calculated, and the dependence of these anisotropies on the EFT couplings is presented. In addition, we calculate the statistical anisotropy in the tensor power spectrum and the scalar-tensor cross correlation. Our EFT approach incorporates anisotropies generated in models with non-trivial speed for the gauge field fluctuations and sound speed for scalar perturbations, such as in DBI inflation.
Direct statistical modeling and its implications for predictive mapping in mining exploration
NASA Astrophysics Data System (ADS)
Sterligov, Boris; Gumiaux, Charles; Barbanson, Luc; Chen, Yan; Cassard, Daniel; Cherkasov, Sergey; Zolotaya, Ludmila
2010-05-01
Recent advances in geosciences make more and more multidisciplinary data available for mining exploration. This has allowed the development of methodologies for computing forecast ore maps from the statistical combination of such different input parameters, all based on inverse problem theory. Numerous statistical methods (e.g. algebraic method, weight of evidence, Siris method, etc.) with varying degrees of complexity in their development and implementation have been proposed and/or adapted for ore geology purposes. In the literature, such approaches are often presented through applications to natural examples, and the results obtained can present specificities due to local characteristics. Moreover, though crucial for statistical computations, the "minimum requirements" for input parameters (minimum number of data points, spatial distribution of objects, etc.) are often only poorly expressed. As a result, problems often arise when one has to choose between one method and another for a specific question. In this study, a direct statistical modeling approach is developed in order to i) evaluate the constraints on the input parameters and ii) test the validity of different existing inversion methods. The approach focuses particularly on the analysis of spatial relationships between the location of points and various objects (e.g. polygons and/or polylines), which is particularly well adapted to constraining the influence of intrusive bodies - such as a granite - and faults or ductile shear-zones on the spatial location of ore deposits (point objects). The method is designed so as to ensure a-dimensionality with respect to scale. In this approach, both the spatial distribution and the topology of objects (polygons and polylines) can be parametrized by the user (e.g. density of objects, length, surface, orientation, clustering). Then, the distance of points with respect to a given type of object (polygons or polylines) is given using a probability distribution. 
The location of points is computed assuming either independence or different degrees of dependency between the two probability distributions. The results show that i) the mean polygon surface, the mean polyline length, the number of objects and their clustering are critical and ii) the validity of the different tested inversion methods strongly depends on the relative importance of, and the dependency between, the parameters used. In addition, this combined approach of direct and inverse modeling offers an opportunity to test the robustness of the inferred point distribution laws with respect to the quality of the input data set.
Introduction to bioinformatics.
Can, Tolga
2014-01-01
Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data-intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: collect statistics from biological data; build a computational model; solve a computational modeling problem; and test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data are usually represented as matrices, and analysis of microarray data mostly involves statistical analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs, and graph-theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.
Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.
Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen
2015-05-01
Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.
Choice-Based Conjoint Analysis: Classification vs. Discrete Choice Models
NASA Astrophysics Data System (ADS)
Giesen, Joachim; Mueller, Klaus; Taneva, Bilyana; Zolliker, Peter
Conjoint analysis is a family of techniques that originated in psychology and later became popular in market research. The main objective of conjoint analysis is to measure an individual's or a population's preferences on a class of options that can be described by parameters and their levels. We consider preference data obtained in choice-based conjoint analysis studies, where one observes test persons' choices on small subsets of the options. There are many ways to analyze choice-based conjoint analysis data. Here we discuss the intuition behind a classification based approach, and compare this approach to one based on statistical assumptions (discrete choice models) and to a regression approach. Our comparison on real and synthetic data indicates that the classification approach outperforms the discrete choice models.
Norris, Peter M; da Silva, Arlindo M
2016-07-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
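The abstract above relies on Metropolis-type MCMC because the sampler needs no gradient and can move between regions separated from zero-probability states. The following is a minimal sketch of a random-walk Metropolis sampler, not the authors' algorithm. The one-dimensional triangular target on (0, 1), its mode of 0.7, the proposal step and the function names are all assumptions chosen only to keep the example small.

```python
import math
import random


def triangle_log_density(x, mode=0.7):
    """Log of a triangular density on (0, 1); the mode value is hypothetical."""
    if x <= 0.0 or x >= 1.0:
        return -math.inf  # zero probability outside the support
    if x < mode:
        return math.log(2.0 * x / mode)
    return math.log(2.0 * (1.0 - x) / (1.0 - mode))


def metropolis(log_density, start, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    Only density evaluations are used, never gradients. `start` must lie
    inside the support of the target.
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        candidate = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        log_p_candidate = log_density(candidate)
        delta = log_p_candidate - log_p
        # Accept with probability min(1, p(candidate) / p(x)); candidates
        # with zero density (delta = -inf) are always rejected.
        if delta >= 0.0 or rng.random() < math.exp(delta):
            x, log_p = candidate, log_p_candidate
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


samples, acceptance_rate = metropolis(triangle_log_density, 0.5, 0.3, 20000, seed=1)
print(sum(samples) / len(samples), acceptance_rate)
```

For a triangular density on (0, 1) with mode c, the exact mean is (1 + c) / 3, so the sample mean printed above should land close to 0.567 for this target.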
Norris, Peter M.; da Silva, Arlindo M.
2018-01-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC. PMID:29618847
NASA Astrophysics Data System (ADS)
Yin, Shengwen; Yu, Dejie; Yin, Hui; Lü, Hui; Xia, Baizhan
2017-09-01
Considering the epistemic uncertainties within the hybrid Finite Element/Statistical Energy Analysis (FE/SEA) model when it is used for the response analysis of built-up systems in the mid-frequency range, the hybrid Evidence Theory-based Finite Element/Statistical Energy Analysis (ETFE/SEA) model is established by introducing evidence theory. Based on the hybrid ETFE/SEA model and the sub-interval perturbation technique, the hybrid Sub-interval Perturbation and Evidence Theory-based Finite Element/Statistical Energy Analysis (SIP-ETFE/SEA) approach is proposed. In the hybrid ETFE/SEA model, the uncertainty in the SEA subsystem is modeled by a non-parametric ensemble, while the uncertainty in the FE subsystem is described by the focal element and basic probability assignment (BPA), and handled with evidence theory. Within the hybrid SIP-ETFE/SEA approach, the mid-frequency response of interest, such as the ensemble average of the energy response and the cross-spectrum response, is calculated analytically by using the conventional hybrid FE/SEA method. Inspired by probability theory, the intervals of the mean value, variance and cumulative distribution are used to describe the distribution characteristics of mid-frequency responses of built-up systems with epistemic uncertainties. In order to alleviate the computational burden of the extreme value analysis, the sub-interval perturbation technique based on the first-order Taylor series expansion is used in the ETFE/SEA model to acquire the lower and upper bounds of the mid-frequency responses over each focal element. Three numerical examples are given to illustrate the feasibility and effectiveness of the proposed method.
Model selection and assessment for multi-species occupancy models
Broms, Kristin M.; Hooten, Mevin B.; Fitzpatrick, Ryan M.
2016-01-01
While multi-species occupancy models (MSOMs) are emerging as a popular method for analyzing biodiversity data, formal checking and validation approaches for this class of models have lagged behind. Concurrent with the rise in application of MSOMs among ecologists, a quiet regime shift is occurring in Bayesian statistics where predictive model comparison approaches are experiencing a resurgence. Unlike single-species occupancy models that use integrated likelihoods, MSOMs are usually couched in a Bayesian framework and contain multiple levels. Standard model checking and selection methods are often unreliable in this setting and there is only limited guidance in the ecological literature for this class of models. We examined several different contemporary Bayesian hierarchical approaches for checking and validating MSOMs and applied these methods to a freshwater aquatic study system in Colorado, USA, to better understand the diversity and distributions of plains fishes. Our findings indicated distinct differences among model selection approaches, with cross-validation techniques performing the best in terms of prediction.
Martin, Jordan S; Suarez, Scott A
2017-08-01
Interest in quantifying consistent among-individual variation in primate behavior, also known as personality, has grown rapidly in recent decades. Although behavioral coding is the most frequently utilized method for assessing primate personality, limitations in current statistical practice prevent researchers from utilizing the full potential of their coding datasets. These limitations include the use of extensive data aggregation, not modeling biologically relevant sources of individual variance during repeatability estimation, not partitioning between-individual (co)variance prior to modeling personality structure, the misuse of principal component analysis, and an over-reliance upon exploratory statistical techniques to compare personality models across populations, species, and data collection methods. In this paper, we propose a statistical framework for primate personality research designed to address these limitations. Our framework synthesizes recently developed mixed-effects modeling approaches for quantifying behavioral variation with an information-theoretic model selection paradigm for confirmatory personality research. After detailing a multi-step analytic procedure for personality assessment and model comparison, we employ this framework to evaluate seven models of personality structure in zoo-housed bonobos (Pan paniscus). We find that differences between sexes, ages, zoos, time of observation, and social group composition contributed to significant behavioral variance. Independently of these factors, however, personality nonetheless accounted for a moderate to high proportion of variance in average behavior across observational periods. A personality structure derived from past rating research receives the strongest support relative to our model set. 
This model suggests that personality variation across the measured behavioral traits is best described by two correlated but distinct dimensions reflecting individual differences in affiliation and sociability (Agreeableness) as well as activity level, social play, and neophilia toward non-threatening stimuli (Openness). These results underscore the utility of our framework for quantifying personality in primates and facilitating greater integration between the behavioral ecological and comparative psychological approaches to personality research. © 2017 Wiley Periodicals, Inc.
A process-based standard for the Solar Energetic Particle Event Environment
NASA Astrophysics Data System (ADS)
Gabriel, Stephen
For 10 years or more, there has been a lack of consensus on what the ISO standard model for the Solar Energetic Particle Event (SEPE) environment should be. Despite many technical discussions between the world experts in this field, it has been impossible to agree on which of the several models available should be selected as the standard. Most of these discussions at the ISO WG4 meetings, conferences, etc. have centred around the differences in modelling approach between the MSU model and the several remaining models from elsewhere worldwide (mainly the USA and Europe). The topic is considered timely given the inclusion of a session on reference data sets at the Space Weather Workshop in Boulder in April 2014. The original idea of a ‘process-based’ standard was conceived by Dr Kent Tobiska as a way of getting round the problems associated with the presence of different models, which not only could have quite distinct modelling approaches but could also be based on different data sets. In essence, a process-based standard approach overcomes these issues by allowing there to be more than one model and not necessarily a single standard model; however, any such model has to be completely transparent in that the data set and the modelling techniques used have to be not only clearly and unambiguously defined but also subject to peer review. If the model meets all of these requirements then it should be acceptable as a standard model. So how does this process-based approach resolve the differences between the existing modelling approaches for the SEPE environment and remove the impasse? In a sense, it does not remove all of the differences but only some of them; however, most importantly, it will allow something which so far has been impossible without ambiguities and disagreement: a comparison of the results of the various models. 
To date, one of the problems (if not the major one) in comparing the results of the various SEPE statistical models has been caused by two things: 1) the data set and 2) the definition of an event. Because unravelling the dependencies of the outputs of different statistical models on these two parameters is extremely difficult if not impossible, comparison of the results from the different models is currently also extremely difficult and can lead to controversies, especially over which model is the correct one; hence, when it comes to using these models for engineering purposes to calculate, for example, the radiation dose for a particular mission, the user, who is in all likelihood not an expert in this field, could be given two (or even more) very different environments and find it impossible to know how to select one (or even how to compare them). What is proposed, then, is a process-based standard which, in common with nearly all of the current models, is composed of three elements: a standard data set, a standard event definition and a resulting standard event list. The standard event list is the output of this standard and can then be used with any of the existing (or indeed future) models that are based on events. This standard event list is completely traceable and transparent and represents a reference event list for the whole community. When it is coupled with different statistical models, the results, when compared, will depend only on the statistical model and not on the data set or event definition.
A Statistical Graphical Model of the California Reservoir System
NASA Astrophysics Data System (ADS)
Taeb, A.; Reager, J. T.; Turmon, M.; Chandrasekaran, V.
2017-11-01
The recent California drought has highlighted the potential vulnerability of the state's water management infrastructure to multiyear dry intervals. Due to the high complexity of the network, dynamic storage changes in California reservoirs on a state-wide scale have previously been difficult to model using either traditional statistical or physical approaches. Indeed, although there is a significant line of research on exploring models for single (or a small number of) reservoirs, these approaches are not amenable to a system-wide modeling of the California reservoir network due to the spatial and hydrological heterogeneities of the system. In this work, we develop a state-wide statistical graphical model to characterize the dependencies among a collection of 55 major California reservoirs across the state; this model is defined with respect to a graph in which the nodes index reservoirs and the edges specify the relationships or dependencies between reservoirs. We obtain and validate this model in a data-driven manner based on reservoir volumes over the period 2003-2016. A key feature of our framework is a quantification of the effects of external phenomena that influence the entire reservoir network. We further characterize the degree to which physical factors (e.g., state-wide Palmer Drought Severity Index (PDSI), average temperature, snow pack) and economic factors (e.g., consumer price index, number of agricultural workers) explain these external influences. As a consequence of this analysis, we obtain a system-wide health diagnosis of the reservoir network as a function of PDSI.
The Standard Model in the history of the Natural Sciences, Econometrics, and the social sciences
NASA Astrophysics Data System (ADS)
Fisher, W. P., Jr.
2010-07-01
In the late 18th and early 19th centuries, scientists appropriated Newton's laws of motion as a model for the conduct of any other field of investigation that would purport to be a science. This early form of a Standard Model eventually informed the basis of analogies for the mathematical expression of phenomena previously studied qualitatively, such as cohesion, affinity, heat, light, electricity, and magnetism. James Clerk Maxwell is known for his repeated use of a formalized version of this method of analogy in lectures, teaching, and the design of experiments. Economists transferring skills learned in physics made use of the Standard Model, especially after Maxwell demonstrated the value of conceiving it in abstract mathematics instead of as a concrete and literal mechanical analogy. Haavelmo's probability approach in econometrics and R. Fisher's Statistical Methods for Research Workers brought a statistical approach to bear on the Standard Model, quietly reversing the perspective of economics and the social sciences relative to that of physics. Where physicists, and Maxwell in particular, intuited scientific method as imposing stringent demands on the quality and interrelations of data, instruments, and theory in the name of inferential and comparative stability, statistical models and methods disconnected theory from data by removing the instrument as an essential component. New possibilities for reconnecting economics and the social sciences to Maxwell's sense of the method of analogy are found in Rasch's probabilistic models for measurement.
Vanniyasingam, Thuva; Daly, Caitlin; Jin, Xuejing; Zhang, Yuan; Foster, Gary; Cunningham, Charles; Thabane, Lehana
2018-06-01
This study reviews simulation studies of discrete choice experiments (DCEs) to (i) determine how survey design features affect statistical efficiency and (ii) appraise their reporting quality. Statistical efficiency was measured using relative design (D-) efficiency, D-optimality, or D-error. For this systematic survey, we searched Journal Storage (JSTOR), Science Direct, PubMed, and OVID, which included a search within EMBASE. Searches were conducted up to the year 2016 for simulation studies investigating the impact of DCE design features on statistical efficiency. Studies were screened and data were extracted independently and in duplicate. Results for each included study were summarized by design characteristic. Previously developed criteria for reporting quality of simulation studies were also adapted and applied to each included study. Of 371 potentially relevant studies, 9 were found to be eligible, with several varying in study objectives. Statistical efficiency improved when increasing the number of choice tasks or alternatives; decreasing the number of attributes and attribute levels; using an unrestricted continuous "manipulator" attribute; using model-based approaches with covariates incorporating response behaviour; using sampling approaches that incorporate previous knowledge of response behaviour; incorporating heterogeneity in a model-based design; correctly specifying Bayesian priors; minimizing parameter prior variances; and using an appropriate method to create the DCE design for the research question. The simulation studies performed well in terms of reporting quality. Improvement is needed with regard to clearly specifying study objectives, number of failures, random number generators, starting seeds, and the software used. These results identify the best approaches to structure a DCE. An investigator can manipulate design characteristics to help reduce response burden and increase statistical efficiency.
Because the studies varied in their objectives, conclusions were drawn on several design characteristics; however, the validity of each conclusion was limited. Further research should be conducted to explore all conclusions in various design settings and scenarios. Additional reviews exploring other statistical efficiency outcomes and databases could also be performed to strengthen the conclusions of this review.
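To make the notion of relative D-efficiency concrete, the sketch below computes it for a linear-model design matrix using the textbook form 100 · |X'X|^(1/p) / N, where N is the number of runs and p the number of parameters. This is a simplification chosen for illustration: the reviewed DCE studies typically base D-error on a choice-model information matrix, which is not reproduced here. The function names `determinant` and `d_efficiency` are hypothetical.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular (or numerically singular) matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def d_efficiency(design):
    """Relative D-efficiency (percent) for a linear model.

    `design` is a list of rows (runs), each with p coded columns
    (attributes coded -1/+1, plus an intercept column of ones).
    """
    n = len(design)
    p = len(design[0])
    # Information matrix X'X for the linear model.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    det = max(determinant(xtx), 0.0)  # guard against tiny negative round-off
    return 100.0 * det ** (1.0 / p) / n


# Full 2x2 factorial with intercept: orthogonal, so the design is fully efficient.
full = [[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]]
# Dropping one run breaks orthogonality and lowers the efficiency.
reduced = full[:3]
```

Comparing `d_efficiency(full)` with `d_efficiency(reduced)` shows how removing runs from an orthogonal design reduces efficiency under this linear-model simplification.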
Campbell's and Rubin's Perspectives on Causal Inference
ERIC Educational Resources Information Center
West, Stephen G.; Thoemmes, Felix
2010-01-01
Donald Campbell's approach to causal inference (D. T. Campbell, 1957; W. R. Shadish, T. D. Cook, & D. T. Campbell, 2002) is widely used in psychology and education, whereas Donald Rubin's causal model (P. W. Holland, 1986; D. B. Rubin, 1974, 2005) is widely used in economics, statistics, medicine, and public health. Campbell's approach focuses on…
Calibration of Response Data Using MIRT Models with Simple and Mixed Structures
ERIC Educational Resources Information Center
Zhang, Jinming
2012-01-01
It is common to assume during a statistical analysis of a multiscale assessment that the assessment is composed of several unidimensional subtests or that it has simple structure. Under this assumption, the unidimensional and multidimensional approaches can be used to estimate item parameters. These two approaches are equivalent in parameter…