Sample records for models statistical

  1. Students' Emergent Articulations of Statistical Models and Modeling in Making Informal Statistical Inferences

    ERIC Educational Resources Information Center

    Braham, Hana Manor; Ben-Zvi, Dani

    2017-01-01

    A fundamental aspect of statistical inference is representation of real-world data using statistical models. This article analyzes students' articulations of statistical models and modeling during their first steps in making informal statistical inferences. An integrated modeling approach (IMA) was designed and implemented to help students…

  2. Online Statistical Modeling (Regression Analysis) for Independent Responses

    NASA Astrophysics Data System (ADS)

    Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

    2017-06-01

    Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Nowadays, statistical models have been developed in various directions to model various types of data and complex relationships. Rich varieties of advanced and recent statistical modelling methods are mostly available in open source software (one of them is R). However, these advanced statistical modelling methods are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and shiny), so that the most recent and advanced statistical modelling methods are readily available, accessible and applicable on the web. We have previously made interfaces in the form of e-tutorials for several modern and advanced statistical modelling methods in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including models using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes statistical modeling easier to apply and makes it easier to compare models in order to find the most appropriate model for the data.

  3. An R2 statistic for fixed effects in the linear mixed model.

    PubMed

    Edwards, Lloyd J; Muller, Keith E; Wolfinger, Russell D; Qaqish, Bahjat F; Schabenberger, Oliver

    2008-12-20

    Statisticians most often use the linear mixed model to analyze Gaussian longitudinal data. The value and familiarity of the R(2) statistic in the linear univariate model naturally creates great interest in extending it to the linear mixed model. We define and describe how to compute a model R(2) statistic for the linear mixed model by using only a single model. The proposed R(2) statistic measures multivariate association between the repeated outcomes and the fixed effects in the linear mixed model. The R(2) statistic arises as a 1-1 function of an appropriate F statistic for testing all fixed effects (except typically the intercept) in a full model. The statistic compares the full model with a null model with all fixed effects deleted (except typically the intercept) while retaining exactly the same covariance structure. Furthermore, the R(2) statistic leads immediately to a natural definition of a partial R(2) statistic. A mixed model in which ethnicity gives a very small p-value as a longitudinal predictor of blood pressure (BP) compellingly illustrates the value of the statistic. In sharp contrast to the extreme p-value, a very small R(2) , a measure of statistical and scientific importance, indicates that ethnicity has an almost negligible association with the repeated BP outcomes for the study.
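
The one-to-one relationship between an R² measure and an F statistic can be illustrated with the familiar identity from the linear univariate model, R² = (ν₁F/ν₂) / (1 + ν₁F/ν₂). The Python sketch below assumes the mixed-model statistic has the same functional form, with ν₁ the numerator degrees of freedom for the tested fixed effects and ν₂ the denominator degrees of freedom. The abstract does not spell out the formula, so the function name, the form, and the numbers are illustrative assumptions.

```python
def r2_from_f(f_stat: float, df_num: float, df_den: float) -> float:
    """Map an F statistic to an R^2-type measure.

    Assumed form (the univariate-model identity):
        R^2 = (df_num * F / df_den) / (1 + df_num * F / df_den)
    """
    if f_stat < 0 or df_num <= 0 or df_den <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    ratio = df_num * f_stat / df_den
    return ratio / (1.0 + ratio)


# Hypothetical numbers: with many denominator degrees of freedom, an F value
# that yields a very small p-value can still correspond to a small R^2.
print(r2_from_f(f_stat=12.0, df_num=1, df_den=5000))
```

This mirrors the abstract's point that a very small p-value and a very small R² can coexist when the sample is large.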

  4. Comparing and combining process-based crop models and statistical models with some implications for climate change

    NASA Astrophysics Data System (ADS)

    Roberts, Michael J.; Braun, Noah O.; Sinclair, Thomas R.; Lobell, David B.; Schlenker, Wolfram

    2017-09-01

    We compare predictions of a simple process-based crop model (Soltani and Sinclair 2012), a simple statistical model (Schlenker and Roberts 2009), and a combination of both models to actual maize yields on a large, representative sample of farmer-managed fields in the Corn Belt region of the United States. After statistical post-model calibration, the process model (Simple Simulation Model, or SSM) predicts actual outcomes slightly better than the statistical model, but the combined model performs significantly better than either model. The SSM, statistical model and combined model all show similar relationships with precipitation, while the SSM better accounts for temporal patterns of precipitation, vapor pressure deficit and solar radiation. The statistical and combined models show a more negative impact associated with extreme heat for which the process model does not account. Due to the extreme heat effect, predicted impacts under uniform climate change scenarios are considerably more severe for the statistical and combined models than for the process-based model.

  5. Development of a statistical model for cervical cancer cell death with irreversible electroporation in vitro.

    PubMed

    Yang, Yongji; Moser, Michael A J; Zhang, Edwin; Zhang, Wenjun; Zhang, Bing

    2018-01-01

    The aim of this study was to develop a statistical model for cell death by irreversible electroporation (IRE) and to show that the statistical model is more accurate than the electric field threshold model in the literature, using cervical cancer cells in vitro. The HeLa cell line was cultured and treated with different IRE protocols in order to obtain data for modeling the statistical relationship between cell death and pulse-setting parameters. In total, 340 in vitro experiments were performed with a commercial IRE pulse system, including a pulse generator and an electric cuvette. The trypan blue staining technique was used to evaluate cell death after 4 hours of incubation following IRE treatment. The Peleg-Fermi model was used in the study to build the statistical relationship using the cell viability data obtained from the in vitro experiments. A finite element model of IRE for the electric field distribution was also built. Comparison of ablation zones between the statistical model and the electric threshold model (drawn from the finite element model) was used to show the accuracy of the proposed statistical model in the description of the ablation zone and its applicability to different pulse-setting parameters. The statistical models describing the relationships between HeLa cell death and pulse length and the number of pulses, respectively, were built. The values of the curve fitting parameters were obtained using the Peleg-Fermi model for the treatment of cervical cancer with IRE. The difference in the ablation zone between the statistical model and the electric threshold model was also illustrated to show the accuracy of the proposed statistical model in the representation of the ablation zone in IRE. This study concluded that: (1) the proposed statistical model accurately described the ablation zone of IRE with cervical cancer cells, and was more accurate compared with the electric field model; (2) the proposed statistical model was able to estimate the value of the electric field threshold for the computer simulation of IRE in the treatment of cervical cancer; and (3) the proposed statistical model was able to express the change in ablation zone with the change in pulse-setting parameters.
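
As a rough illustration of the Peleg-Fermi model named in this abstract, the sketch below uses its commonly cited form, S(E) = 1 / (1 + exp((E - Ec) / A)), where S is the surviving fraction, E the electric field strength, Ec the field at which half the cells survive, and A a parameter controlling the steepness of the curve. The parameter values are hypothetical and are not the fitted values from the study. The dependence of Ec and A on pulse length and pulse number, which the study models, is left out.

```python
import math


def peleg_fermi_survival(e_field: float, e_critical: float, steepness: float) -> float:
    """Surviving fraction S(E) = 1 / (1 + exp((E - Ec) / A))."""
    if steepness <= 0:
        raise ValueError("steepness (A) must be positive")
    x = (e_field - e_critical) / steepness
    if x > 700:  # avoid overflow in exp; survival is effectively zero here
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def cell_death_fraction(e_field: float, e_critical: float, steepness: float) -> float:
    """Fraction of cells killed, i.e. 1 - S(E)."""
    return 1.0 - peleg_fermi_survival(e_field, e_critical, steepness)


# Hypothetical parameters (V/cm), chosen only to show the shape of the curve.
for field in (500.0, 1000.0, 1500.0):
    print(field, cell_death_fraction(field, e_critical=1000.0, steepness=150.0))
```

Unlike a hard electric field threshold, this curve assigns a graded probability of cell death around Ec. A graded curve of this kind is what lets a statistical model describe an ablation zone without a sharp cutoff.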

  6. Stan: Statistical inference

    NASA Astrophysics Data System (ADS)

    Stan Development Team

    2018-01-01

    Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.

  7. A two-component rain model for the prediction of attenuation statistics

    NASA Technical Reports Server (NTRS)

    Crane, R. K.

    1982-01-01

    A two-component rain model has been developed for calculating attenuation statistics. In contrast to most other attenuation prediction models, the two-component model calculates the occurrence probability for volume cells or debris attenuation events. The model performed significantly better than the International Radio Consultative Committee model when used for predictions on earth-satellite paths. It is expected that the model will have applications in modeling the joint statistics required for space diversity system design, the statistics of interference due to rain scatter at attenuating frequencies, and the duration statistics for attenuation events.

  8. 12 CFR Appendix A to Subpart A of... - Appendix A to Subpart A of Part 327

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... pricing multipliers are derived from: • A model (the Statistical Model) that estimates the probability..., which is four basis points higher than the minimum rate. II. The Statistical Model The Statistical Model... to 1997. As a result, and as described in Table A.1, the Statistical Model is estimated using a...

  9. A global goodness-of-fit statistic for Cox regression models.

    PubMed

    Parzen, M; Lipsitz, S R

    1999-06-01

    In this paper, a global goodness-of-fit test statistic for a Cox regression model, which has an approximate chi-squared distribution when the model has been correctly specified, is proposed. Our goodness-of-fit statistic is global and has power to detect if interactions or higher order powers of covariates in the model are needed. The proposed statistic is similar to the Hosmer and Lemeshow (1980, Communications in Statistics A10, 1043-1069) goodness-of-fit statistic for binary data as well as Schoenfeld's (1980, Biometrika 67, 145-153) statistic for the Cox model. The methods are illustrated using data from a Mayo Clinic trial in primary biliary cirrhosis of the liver (Fleming and Harrington, 1991, Counting Processes and Survival Analysis), in which the outcome is the time until liver transplantation or death. There are 17 possible covariates. Two Cox proportional hazards models are fit to the data, and the proposed goodness-of-fit statistic is applied to the fitted models.

  10. Central Limit Theorem for Exponentially Quasi-local Statistics of Spin Models on Cayley Graphs

    NASA Astrophysics Data System (ADS)

    Reddy, Tulasi Ram; Vadlamani, Sreekar; Yogeshwaran, D.

    2018-04-01

    Central limit theorems for linear statistics of lattice random fields (including spin models) are usually proven under suitable mixing conditions or quasi-associativity. Many interesting examples of spin models do not satisfy mixing conditions, and on the other hand, it does not seem easy to show central limit theorem for local statistics via quasi-associativity. In this work, we prove general central limit theorems for local statistics and exponentially quasi-local statistics of spin models on discrete Cayley graphs with polynomial growth. Further, we supplement these results by proving similar central limit theorems for random fields on discrete Cayley graphs taking values in a countable space, but under the stronger assumptions of α -mixing (for local statistics) and exponential α -mixing (for exponentially quasi-local statistics). All our central limit theorems assume a suitable variance lower bound like many others in the literature. We illustrate our general central limit theorem with specific examples of lattice spin models and statistics arising in computational topology, statistical physics and random networks. Examples of clustering spin models include quasi-associated spin models with fast decaying covariances like the off-critical Ising model, level sets of Gaussian random fields with fast decaying covariances like the massive Gaussian free field and determinantal point processes with fast decaying kernels. Examples of local statistics include intrinsic volumes, face counts, component counts of random cubical complexes while exponentially quasi-local statistics include nearest neighbour distances in spin models and Betti numbers of sub-critical random cubical complexes.

  11. Visualization of the variability of 3D statistical shape models by animation.

    PubMed

    Lamecker, Hans; Seebass, Martin; Lange, Thomas; Hege, Hans-Christian; Deuflhard, Peter

    2004-01-01

    Models of the 3D shape of anatomical objects and the knowledge about their statistical variability are of great benefit in many computer assisted medical applications like image analysis, therapy or surgery planning. Statistical models of shape have successfully been applied to automate the task of image segmentation. The generation of 3D statistical shape models requires the identification of corresponding points on two shapes. This remains a difficult problem, especially for shapes of complicated topology. In order to interpret and validate variations encoded in a statistical shape model, visual inspection is of great importance. This work describes the generation and interpretation of statistical shape models of the liver and the pelvic bone.

  12. Hyperparameterization of soil moisture statistical models for North America with Ensemble Learning Models (Elm)

    NASA Astrophysics Data System (ADS)

    Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.

    2017-12-01

    Hyperparameterization of statistical models, i.e. automated model scoring and selection with methods such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There, Elm is using the NSGA-2 multiobjective optimization algorithm for optimizing statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used to automate selection of soil moisture forecast statistical models for North America.

  13. Predicting Statistical Response and Extreme Events in Uncertainty Quantification through Reduced-Order Models

    NASA Astrophysics Data System (ADS)

    Qi, D.; Majda, A.

    2017-12-01

    A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty in statistical sensitivity and intermittency in principal model directions with largest variability in high-dimensional turbulent system and turbulent transport models. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve the optimal model performance. The idea in the reduced-order method is from a self-consistent mathematical framework for general systems with quadratic nonlinearity, where crucial high-order statistics are approximated by a systematic model calibration procedure. Model efficiency is improved through additional damping and noise corrections to replace the expensive energy-conserving nonlinear interactions. Model errors due to the imperfect nonlinear approximation are corrected by tuning the model parameters using linear response theory with an information metric in a training phase before prediction. A statistical energy principle is adopted to introduce a global scaling factor in characterizing the higher-order moments in a consistent way to improve model sensitivity. Stringent models of barotropic and baroclinic turbulence are used to display the feasibility of the reduced-order methods. Principal statistical responses in mean and variance can be captured by the reduced-order models with accuracy and efficiency. Besides, the reduced-order models are also used to capture crucial passive tracer field that is advected by the baroclinic turbulent flow. 
It is demonstrated that crucial principal statistical quantities like the tracer spectrum and fat-tails in the tracer probability density functions in the most important large scales can be captured efficiently with accuracy using the reduced-order tracer model in various dynamical regimes of the flow field with distinct statistical structures.

  14. Neural Systems with Numerically Matched Input-Output Statistic: Isotonic Bivariate Statistical Modeling

    PubMed Central

    Fiori, Simone

    2007-01-01

    Bivariate statistical modeling from incomplete data is a useful statistical tool that makes it possible to discover the model underlying two data sets when the data in the two sets do not correspond in size or in ordering. Such a situation may occur when the sizes of the two data sets do not match (i.e., there are “holes” in the data) or when the data sets have been acquired independently. Also, statistical modeling is useful when the amount of available data is enough to show relevant statistical features of the phenomenon underlying the data. We propose to tackle the problem of statistical modeling via a neural (nonlinear) system that is able to match its input-output statistic to the statistic of the available data sets. A key point of the new implementation proposed here is that it is based on look-up-table (LUT) neural systems, which guarantee a computationally advantageous way of implementing neural systems. A number of numerical experiments, performed on both synthetic and real-world data sets, illustrate the features of the proposed modeling procedure. PMID:18566641

  15. Physics-based statistical model and simulation method of RF propagation in urban environments

    DOEpatents

    Pao, Hsueh-Yuan; Dvorak, Steven L.

    2010-09-14

    A physics-based statistical model and simulation/modeling method and system of electromagnetic wave propagation (wireless communication) in urban environments. In particular, the model is a computationally efficient closed-form parametric model of RF propagation in an urban environment which is extracted from a physics-based statistical wireless channel simulation method and system. The simulation divides the complex urban environment into a network of interconnected urban canyon waveguides which can be analyzed individually; calculates spectral coefficients of modal fields in the waveguides excited by the propagation using a database of statistical impedance boundary conditions which incorporates the complexity of building walls in the propagation model; determines statistical parameters of the calculated modal fields; and determines a parametric propagation model based on the statistical parameters of the calculated modal fields from which predictions of communications capability may be made.

  16. A comparison of large-scale climate signals and the North American Multi-Model Ensemble (NMME) for drought prediction in China

    NASA Astrophysics Data System (ADS)

    Xu, Lei; Chen, Nengcheng; Zhang, Xiang

    2018-02-01

    Drought is an extreme natural disaster that can lead to huge socioeconomic losses. Predicting drought months ahead is helpful for early drought warning and preparations. In this study, we developed a statistical model, two weighted dynamic models and a statistical-dynamic (hybrid) model for 1-6 month lead drought prediction in China. Specifically, the statistical component refers to climate signals weighted by support vector regression (SVR), the dynamic components consist of the ensemble mean (EM) and Bayesian model averaging (BMA) of the North American Multi-Model Ensemble (NMME) climatic models, and the hybrid part denotes a combination of statistical and dynamic components by assigning weights based on their historical performances. The results indicate that the statistical and hybrid models show better rainfall predictions than the NMME-EM and NMME-BMA models, which have good predictability only in southern China. In the 2011 China winter-spring drought event, the statistical model predicted the spatial extent and severity of drought nationwide well, although the severity was underestimated in the mid-lower reaches of Yangtze River (MLRYR) region. The NMME-EM and NMME-BMA models largely overestimated rainfall in northern and western China in the 2011 drought. In the 2013 China summer drought, the NMME-EM model forecasted the drought extent and severity in eastern China well, while the statistical and hybrid models falsely detected negative precipitation anomaly (NPA) in some areas. Model ensembles such as multiple statistical approaches, multiple dynamic models or multiple hybrid models for drought predictions were highlighted. These conclusions may be helpful for drought prediction and early drought warnings in China.

  17. Strategies for Reduced-Order Models in Uncertainty Quantification of Complex Turbulent Dynamical Systems

    NASA Astrophysics Data System (ADS)

    Qi, Di

    Turbulent dynamical systems are ubiquitous in science and engineering. Uncertainty quantification (UQ) in turbulent dynamical systems is a grand challenge where the goal is to obtain statistical estimates for key physical quantities. In the development of a proper UQ scheme for systems characterized by both a high-dimensional phase space and a large number of instabilities, significant model errors compared with the true natural signal are always unavoidable due to both the imperfect understanding of the underlying physical processes and the limited computational resources available. One central issue in contemporary research is the development of a systematic methodology for reduced order models that can recover the crucial features both with model fidelity in statistical equilibrium and with model sensitivity in response to perturbations. In the first part, we discuss a general mathematical framework to construct statistically accurate reduced-order models that have skill in capturing the statistical variability in the principal directions of a general class of complex systems with quadratic nonlinearity. A systematic hierarchy of simple statistical closure schemes, which are built through new global statistical energy conservation principles combined with statistical equilibrium fidelity, are designed and tested for UQ of these problems. Second, the capacity of imperfect low-order stochastic approximations to model extreme events in a passive scalar field advected by turbulent flows is investigated. The effects in complicated flow systems are considered including strong nonlinear and non-Gaussian interactions, and much simpler and cheaper imperfect models with model error are constructed to capture the crucial statistical features in the stationary tracer field. Several mathematical ideas are introduced to improve the prediction skill of the imperfect reduced-order models. 
Most importantly, empirical information theory and statistical linear response theory are applied in the training phase for calibrating model errors to achieve optimal imperfect model parameters; and total statistical energy dynamics are introduced to improve the model sensitivity in the prediction phase especially when strong external perturbations are exerted. The validity of reduced-order models for predicting statistical responses and intermittency is demonstrated on a series of instructive models with increasing complexity, including the stochastic triad model, the Lorenz '96 model, and models for barotropic and baroclinic turbulence. The skillful low-order modeling methods developed here should also be useful for other applications such as efficient algorithms for data assimilation.

  18. Helping Students Develop Statistical Reasoning: Implementing a Statistical Reasoning Learning Environment

    ERIC Educational Resources Information Center

    Garfield, Joan; Ben-Zvi, Dani

    2009-01-01

    This article describes a model for an interactive, introductory secondary- or tertiary-level statistics course that is designed to develop students' statistical reasoning. This model is called a "Statistical Reasoning Learning Environment" and is built on the constructivist theory of learning.

  19. Statistical field theory of futures commodity prices

    NASA Astrophysics Data System (ADS)

    Baaquie, Belal E.; Yu, Miao

    2018-02-01

    The statistical theory of commodity prices has been formulated by Baaquie (2013). Further empirical studies of single (Baaquie et al., 2015) and multiple commodity prices (Baaquie et al., 2016) have provided strong evidence in support of the primary assumptions of the statistical formulation. In this paper, the model for spot prices (Baaquie, 2013) is extended to model futures commodity prices using a statistical field theory of futures commodity prices. The futures prices are modeled as a two dimensional statistical field and a nonlinear Lagrangian is postulated. Empirical studies provide clear evidence in support of the model, with many nontrivial features of the model finding unexpected support from market data.

  20. Population activity statistics dissect subthreshold and spiking variability in V1.

    PubMed

    Bányai, Mihály; Koman, Zsombor; Orbán, Gergő

    2017-07-01

    Response variability, as measured by fluctuating responses upon repeated performance of trials, is a major component of neural responses, and its characterization is key to interpret high dimensional population recordings. Response variability and covariability display predictable changes upon changes in stimulus and cognitive or behavioral state, providing an opportunity to test the predictive power of models of neural variability. Still, there is little agreement on which model to use as a building block for population-level analyses, and models of variability are often treated as a subject of choice. We investigate two competing models, the doubly stochastic Poisson (DSP) model assuming stochasticity at spike generation, and the rectified Gaussian (RG) model tracing variability back to membrane potential variance, to analyze stimulus-dependent modulation of both single-neuron and pairwise response statistics. Using a pair of model neurons, we demonstrate that the two models predict similar single-cell statistics. However, DSP and RG models have contradicting predictions on the joint statistics of spiking responses. To test the models against data, we build a population model to simulate stimulus change-related modulations in pairwise response statistics. We use single-unit data from the primary visual cortex (V1) of monkeys to show that while model predictions for variance are qualitatively similar to experimental data, only the RG model's predictions are compatible with joint statistics. These results suggest that models using Poisson-like variability might fail to capture important properties of response statistics. We argue that membrane potential-level modeling of stochasticity provides an efficient strategy to model correlations. NEW & NOTEWORTHY Neural variability and covariability are puzzling aspects of cortical computations. 
For efficient decoding and prediction, models of information encoding in neural populations hinge on an appropriate model of variability. Our work shows that stimulus-dependent changes in pairwise but not in single-cell statistics can differentiate between two widely used models of neuronal variability. Contrasting model predictions with neuronal data provides hints on the noise sources in spiking and provides constraints on statistical models of population activity. Copyright © 2017 the American Physiological Society.

  1. Selecting Summary Statistics in Approximate Bayesian Computation for Calibrating Stochastic Models

    PubMed Central

    Burr, Tom

    2013-01-01

    Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the “go-to” option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example. PMID:24288668

  2. Selecting summary statistics in approximate Bayesian computation for calibrating stochastic models.

    PubMed

    Burr, Tom; Skurikhin, Alexei

    2013-01-01

    Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the "go-to" option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example.
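
The dependence of ABC on the chosen summary statistic can be seen in a minimal rejection-sampling sketch. The toy problem below (inferring the mean of a normal distribution, with the sample mean as the summary statistic) is an assumption made for illustration. It is not the mitochondrial DNA model from the paper, and the function names, prior, and tolerance are hypothetical.

```python
import random


def sample_mean(xs):
    """Summary statistic used in this toy example."""
    return sum(xs) / len(xs)


def abc_rejection(observed, simulate, summary, prior_sample, n_trials, tolerance, rng):
    """Keep prior draws whose simulated summary lies within `tolerance`
    of the observed summary; the kept draws approximate the posterior."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_trials):
        theta = prior_sample(rng)                 # draw a trial parameter
        s_sim = summary(simulate(theta, rng))     # run the stochastic model
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
# Stand-in for measurement data: 30 draws from Normal(3, 1).
observed = [rng.gauss(3.0, 1.0) for _ in range(30)]

accepted = abc_rejection(
    observed,
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(30)],
    summary=sample_mean,
    prior_sample=lambda r: r.uniform(0.0, 6.0),   # flat prior on the mean
    n_trials=10000,
    tolerance=0.1,
    rng=rng,
)
print(len(accepted), sample_mean(accepted))
```

Swapping `summary` for a statistic that carries little information about the parameter (for example, the sample range in this toy problem) would leave the accepted draws close to the prior, which is the failure mode the abstract describes.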

  3. A nonparametric spatial scan statistic for continuous data.

    PubMed

    Jung, Inkyung; Cho, Ho Jin

    2015-10-20

    Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compared the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.
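
A scan statistic built on the Wilcoxon rank-sum test can be sketched as follows. The abstract does not describe the window shapes or how significance is assessed, so this example makes simplifying assumptions: locations lie on a line, candidate zones are contiguous windows, the statistic is the standardized rank sum of values inside the window (no tie correction in the variance), and only high-value clusters are sought. All names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..N, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(ranks, inside):
    """Standardized Wilcoxon rank-sum statistic for the indices in `inside`."""
    n_total = len(ranks)
    n_in = len(inside)
    n_out = n_total - n_in
    w = sum(ranks[i] for i in inside)
    mean = n_in * (n_total + 1) / 2.0
    var = n_in * n_out * (n_total + 1) / 12.0
    return (w - mean) / math.sqrt(var)


def scan_1d(values, max_width):
    """Return (z, start, end) for the contiguous window [start, end)
    with the largest standardized rank sum."""
    ranks = average_ranks(values)
    n = len(values)
    best = None
    for start in range(n):
        for width in range(1, min(max_width, n - 1) + 1):
            end = start + width
            if end > n:
                break
            z = rank_sum_z(ranks, range(start, end))
            if best is None or z > best[0]:
                best = (z, start, end)
    return best


values = [1.0, 0.5, 1.2, 9.0, 8.5, 9.5, 0.8, 1.1]
print(scan_1d(values, max_width=3))
```

Because the statistic depends only on ranks, a single extreme value inside a window cannot dominate it, which is consistent with the abstract's claim of usefulness for skewed or heavy-tailed data. In practice, the significance of the best window would still need to be assessed, for instance by permutation.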

  4. An Analysis of the Navy’s Voluntary Education Program

    DTIC Science & Technology

    2007-03-01

    NAVAL ANALYSIS VOLED STUDY; 1. Data; 2. Statistical Models; 3...B. EMPLOYER FINANCED GENERAL TRAINING; 1. Data; 2. Statistical Model...1. Data; 2. Statistical Model; 3. Findings

  5. Variability-aware compact modeling and statistical circuit validation on SRAM test array

    NASA Astrophysics Data System (ADS)

    Qiao, Ying; Spanos, Costas J.

    2016-03-01

    Variability modeling at the compact transistor model level can enable statistically optimized designs in view of limitations imposed by the fabrication technology. In this work we propose a variability-aware compact model characterization methodology based on stepwise parameter selection. Transistor I-V measurements are obtained from a bit-transistor-accessible SRAM test array fabricated using a collaborating foundry's 28nm FDSOI technology. Our in-house customized Monte Carlo simulation bench can incorporate these statistical compact models, and simulation results on SRAM writability performance are very close to measurements in distribution estimation. Our proposed statistical compact model parameter extraction methodology also has the potential of predicting non-Gaussian behavior in statistical circuit performances through mixtures of Gaussian distributions.

  6. Towards Accurate Modelling of Galaxy Clustering on Small Scales: Testing the Standard ΛCDM + Halo Model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-04-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter halos. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the "accurate" regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard ΛCDM + halo model against the clustering of SDSS DR7 galaxies. Specifically, we use the projected correlation function, group multiplicity function and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir halos) matches the clustering of low luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the "standard" halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  7. Statistically Modeling Individual Students' Learning over Successive Collaborative Practice Opportunities

    ERIC Educational Resources Information Center

    Olsen, Jennifer; Aleven, Vincent; Rummel, Nikol

    2017-01-01

    Within educational data mining, many statistical models capture the learning of students working individually. However, not much work has been done to extend these statistical models of individual learning to a collaborative setting, despite the effectiveness of collaborative learning activities. We extend a widely used model (the additive factors…

  8. Differences in Performance Among Test Statistics for Assessing Phylogenomic Model Adequacy.

    PubMed

    Duchêne, David A; Duchêne, Sebastian; Ho, Simon Y W

    2018-05-18

    Statistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few variable informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.

  9. Statistical Models of At-Grade Intersection Accidents. Addendum.

    DOT National Transportation Integrated Search

    2000-03-01

    This report is an addendum to the work published in FHWA-RD-96-125 titled Statistical Models of At-Grade Intersection Accidents. The objective of both research studies was to develop statistical models of the relationship between traffic accide...

  10. A Unified Statistical Rain-Attenuation Model for Communication Link Fade Predictions and Optimal Stochastic Fade Control Design Using a Location-Dependent Rain-Statistic Database

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1990-01-01

    A static and dynamic rain-attenuation model is presented which describes the statistics of attenuation on an arbitrarily specified satellite link for any location for which there are long-term rainfall statistics. The model may be used in the design of the optimal stochastic control algorithms to mitigate the effects of attenuation and maintain link reliability. A rain-statistics database is compiled, which makes it possible to apply the model to any location in the continental U.S. with a resolution of 0.5 degrees in latitude and longitude. The model predictions are compared with experimental observations, showing good agreement.

  11. Exponential order statistic models of software reliability growth

    NASA Technical Reports Server (NTRS)

    Miller, D. R.

    1985-01-01

    Failure times of a software reliability growth process are modeled as order statistics of independent, nonidentically distributed exponential random variables. The Jelinski-Moranda, Goel-Okumoto, Littlewood, Musa-Okumoto Logarithmic, and Power Law models are all special cases of Exponential Order Statistic Models, but there are many additional examples as well. Various characterizations, properties and examples of this class of models are developed and presented.
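    As a small illustration of the construction, the sketch below draws one exponential detection time per fault and sorts them to obtain failure times. The function name and the parameter values are assumptions; using equal rates for all faults gives the Jelinski-Moranda special case, in which the first failure time is exponential with rate `n_faults * phi`.

```python
import random

def eos_failure_times(rates, rng):
    """Failure times as order statistics of independent exponentials.

    `rates` holds one detection rate per fault; the rates need not be equal.
    """
    return sorted(rng.expovariate(rate) for rate in rates)

rng = random.Random(42)
n_faults, phi = 10, 0.5  # hypothetical fault count and per-fault rate

# Equal rates for every fault: the Jelinski-Moranda special case.
one_run = eos_failure_times([phi] * n_faults, rng)

# The first failure is the minimum of n_faults exponentials, so its mean
# should be close to 1 / (n_faults * phi).
first_failures = [eos_failure_times([phi] * n_faults, rng)[0] for _ in range(4000)]
mean_first = sum(first_failures) / len(first_failures)
print(one_run)
print(mean_first, 1 / (n_faults * phi))
```

    Passing a list of unequal rates to the same function gives a nonidentically distributed member of the class.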

  12. Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set.

    PubMed

    Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira

    2016-01-01

    Several studies have shown that our visual system may construct a "summary statistical representation" over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions on how summary statistics, such as an average, are computed remain unanswered. This study investigated sampling properties of visual information used by human observers to extract two types of summary statistics of item sets, average and variance. We presented three models of ideal observers to extract the summary statistics: a global sampling model without sampling noise, global sampling model with sampling noise, and limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. Results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not necessarily require the representation of individual objects with focused attention when the sets of items are larger than 4.

  13. Statistical Power of Alternative Structural Models for Comparative Effectiveness Research: Advantages of Modeling Unreliability.

    PubMed

    Coman, Emil N; Iordache, Eugen; Dierker, Lisa; Fifield, Judith; Schensul, Jean J; Suggs, Suzanne; Barbour, Russell

    2014-05-01

    The advantages of modeling the unreliability of outcomes when evaluating the comparative effectiveness of health interventions are illustrated. Adding an action-research intervention component to a regular summer job program for youth was expected to help in preventing risk behaviors. A series of simple two-group alternative structural equation models are compared to test the effect of the intervention on one key attitudinal outcome in terms of model fit and statistical power with Monte Carlo simulations. Some models presuming parameters equal across the intervention and comparison groups were underpowered to detect the intervention effect, yet modeling the unreliability of the outcome measure increased their statistical power and helped in the detection of the hypothesized effect. Comparative Effectiveness Research (CER) could benefit from flexible multi-group alternative structural models organized in decision trees, and modeling unreliability of measures can be of tremendous help for both the fit of statistical models to the data and their statistical power.

  14. Bayesian models: A statistical primer for ecologists

    USGS Publications Warehouse

    Hobbs, N. Thompson; Hooten, Mevin B.

    2015-01-01

    Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods, in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals. This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management. Presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticians. Covers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and more. Deemphasizes computer coding in favor of basic principles. Explains how to write out properly factored statistical expressions representing Bayesian models.

  15. Towards accurate modelling of galaxy clustering on small scales: testing the standard ΛCDM + halo model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-07-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter haloes. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the `accurate' regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard Λ cold dark matter (ΛCDM) + halo model against the clustering of Sloan Digital Sky Survey (SDSS) seventh data release (DR7) galaxies. Specifically, we use the projected correlation function, group multiplicity function, and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir haloes) matches the clustering of low-luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the `standard' halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  16. Statistics of the geomagnetic secular variation for the past 5Ma

    NASA Technical Reports Server (NTRS)

    Constable, C. G.; Parker, R. L.

    1986-01-01

    A new statistical model is proposed for the geomagnetic secular variation over the past 5Ma. Unlike previous models, the model makes use of statistical characteristics of the present day geomagnetic field. The spatial power spectrum of the non-dipole field is consistent with a white source near the core-mantle boundary with Gaussian distribution. After a suitable scaling, the spherical harmonic coefficients may be regarded as statistical samples from a single giant Gaussian process; this is the model of the non-dipole field. The model can be combined with an arbitrary statistical description of the dipole and probability density functions and cumulative distribution functions can be computed for declination and inclination that would be observed at any site on Earth's surface. Global paleomagnetic data spanning the past 5Ma are used to constrain the statistics of the dipole part of the field. A simple model is found to be consistent with the available data. An advantage of specifying the model in terms of the spherical harmonic coefficients is that it is a complete statistical description of the geomagnetic field, enabling us to test specific properties for a general description. Both intensity and directional data distributions may be tested to see if they satisfy the expected model distributions.

  17. Statistics of the geomagnetic secular variation for the past 5 m.y

    NASA Technical Reports Server (NTRS)

    Constable, C. G.; Parker, R. L.

    1988-01-01

    A new statistical model is proposed for the geomagnetic secular variation over the past 5Ma. Unlike previous models, the model makes use of statistical characteristics of the present day geomagnetic field. The spatial power spectrum of the non-dipole field is consistent with a white source near the core-mantle boundary with Gaussian distribution. After a suitable scaling, the spherical harmonic coefficients may be regarded as statistical samples from a single giant Gaussian process; this is the model of the non-dipole field. The model can be combined with an arbitrary statistical description of the dipole and probability density functions and cumulative distribution functions can be computed for declination and inclination that would be observed at any site on Earth's surface. Global paleomagnetic data spanning the past 5Ma are used to constrain the statistics of the dipole part of the field. A simple model is found to be consistent with the available data. An advantage of specifying the model in terms of the spherical harmonic coefficients is that it is a complete statistical description of the geomagnetic field, enabling us to test specific properties for a general description. Both intensity and directional data distributions may be tested to see if they satisfy the expected model distributions.

  18. Testing prediction methods: Earthquake clustering versus the Poisson model

    USGS Publications Warehouse

    Michael, A.J.

    1997-01-01

    Testing earthquake prediction methods requires statistical techniques that compare observed success to random chance. One technique is to produce simulated earthquake catalogs and measure the relative success of predicting real and simulated earthquakes. The accuracy of these tests depends on the validity of the statistical model used to simulate the earthquakes. This study tests the effect of clustering in the statistical earthquake model on the results. Three simulation models were used to produce significance levels for a VLF earthquake prediction method. As the degree of simulated clustering increases, the statistical significance drops. Hence, the use of a seismicity model with insufficient clustering can lead to overly optimistic results. A successful method must pass the statistical tests with a model that fully replicates the observed clustering. However, a method can be rejected based on tests with a model that contains insufficient clustering. U.S. copyright. Published in 1997 by the American Geophysical Union.
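    The effect described here can be reproduced with a toy Monte Carlo test. Everything below is a hypothetical setup rather than the study's data or its VLF method: alarms are assumed to cover the first 20% of a normalized study period, 8 of 20 observed events are assumed to fall inside alarms, and the two catalog generators stand in for an unclustered (Poisson-like) and a strongly clustered seismicity model.

```python
import random

ALARM_END = 0.2  # assumed: alarms cover the first 20% of the normalized period

def hits(times):
    """Number of events that fall inside the alarm window."""
    return sum(1 for t in times if t < ALARM_END)

def poisson_catalog(n_events, rng):
    """Unclustered catalog: event times independent and uniform on [0, 1)."""
    return [rng.random() for _ in range(n_events)]

def clustered_catalog(n_clusters, per_cluster, rng):
    """Clustered catalog: tight bursts of events around random cluster centers."""
    times = []
    for _ in range(n_clusters):
        center = rng.random()
        times.extend((center + rng.uniform(-0.01, 0.01)) % 1.0
                     for _ in range(per_cluster))
    return times

def p_value(observed_hits, make_catalog, n_sims, rng):
    """Fraction of simulated catalogs predicted at least as well as the real one."""
    as_good = sum(1 for _ in range(n_sims)
                  if hits(make_catalog(rng)) >= observed_hits)
    return (1 + as_good) / (1 + n_sims)

rng = random.Random(1)
observed_hits = 8  # assumed: 8 of 20 real events fell inside alarms
p_poisson = p_value(observed_hits, lambda r: poisson_catalog(20, r), 2000, rng)
p_clustered = p_value(observed_hits, lambda r: clustered_catalog(4, 5, r), 2000, rng)
print(p_poisson, p_clustered)
```

    The same observed success rate looks far less significant against the clustered catalogs, because a single lucky cluster produces many hits at once.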

  19. A Statistical Test for Comparing Nonnested Covariance Structure Models.

    ERIC Educational Resources Information Center

    Levy, Roy; Hancock, Gregory R.

    While statistical procedures are well known for comparing hierarchically related (nested) covariance structure models, statistical tests for comparing nonhierarchically related (nonnested) models have proven more elusive. While isolated attempts have been made, none exists within the commonly used maximum likelihood estimation framework, thereby…

  20. A Stochastic Model of Space-Time Variability of Mesoscale Rainfall: Statistics of Spatial Averages

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Bell, Thomas L.

    2003-01-01

    A characteristic feature of rainfall statistics is that they depend on the space and time scales over which rain data are averaged. A previously developed spectral model of rain statistics that is designed to capture this property predicts power law scaling behavior for the second moment statistics of area-averaged rain rate on the averaging length scale L as L → 0. In the present work a more efficient method of estimating the model parameters is presented, and used to fit the model to the statistics of area-averaged rain rate derived from gridded radar precipitation data from TOGA COARE. Statistical properties of the data and the model predictions are compared over a wide range of averaging scales. An extension of the spectral model scaling relations to describe the dependence of the average fraction of grid boxes within an area containing nonzero rain (the "rainy area fraction") on the grid scale L is also explored.

  1. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    DTIC Science & Technology

    2012-09-30

    data collected by Paramo and Gerlotto. The data were consistent with the Anderson model in that both the data and model had a mode in the...10.1098/rsfs.2012.0027 [published, refereed] Bhatia, S., T.K. Stanton, J. Paramo, and F. Gerlotto (submitted), “Modeling statistics of fish school

  2. Modified Likelihood-Based Item Fit Statistics for the Generalized Graded Unfolding Model

    ERIC Educational Resources Information Center

    Roberts, James S.

    2008-01-01

    Orlando and Thissen (2000) developed an item fit statistic for binary item response theory (IRT) models known as S-X². This article generalizes their statistic to polytomous unfolding models. Four alternative formulations of S-X² are developed for the generalized graded unfolding model (GGUM). The GGUM is a…

  3. Rainfall Downscaling Conditional on Upper-air Atmospheric Predictors: Improved Assessment of Rainfall Statistics in a Changing Climate

    NASA Astrophysics Data System (ADS)

    Langousis, Andreas; Mamalakis, Antonis; Deidda, Roberto; Marrocu, Marino

    2015-04-01

    To improve the skill of Global Climate Models (GCMs) and Regional Climate Models (RCMs) in reproducing the statistics of rainfall at a basin level and at hydrologically relevant temporal scales (e.g. daily), two types of statistical approaches have been suggested. One is the statistical correction of climate model rainfall outputs using historical series of precipitation. The other is the use of stochastic models of rainfall to conditionally simulate precipitation series, based on large-scale atmospheric predictors produced by climate models (e.g. geopotential height, relative vorticity, divergence, mean sea level pressure). The latter approach, usually referred to as statistical rainfall downscaling, aims at reproducing the statistical character of rainfall, while accounting for the effects of large-scale atmospheric circulation (and, therefore, climate forcing) on rainfall statistics. While promising, statistical rainfall downscaling has not attracted much attention in recent years, since the suggested approaches involved complex (i.e. subjective or computationally intense) identification procedures of the local weather, in addition to demonstrating limited success in reproducing several statistical features of rainfall, such as seasonal variations, the distributions of dry and wet spell lengths, the distribution of the mean rainfall intensity inside wet periods, and the distribution of rainfall extremes. In an effort to remedy those shortcomings, Langousis and Kaleris (2014) developed a statistical framework for simulation of daily rainfall intensities conditional on upper air variables, which accurately reproduces the statistical character of rainfall at multiple time-scales.
Here, we study the relative performance of: a) quantile-quantile (Q-Q) correction of climate model rainfall products, and b) the statistical downscaling scheme of Langousis and Kaleris (2014), in reproducing the statistical structure of rainfall, as well as rainfall extremes, at a regional level. This is done for an intermediate-sized catchment in Italy, i.e. the Flumendosa catchment, using climate model rainfall and atmospheric data from the ENSEMBLES project (http://ensembleseu.metoffice.com). In doing so, we split the historical rainfall record of mean areal precipitation (MAP) in 15-year calibration and 45-year validation periods, and compare the historical rainfall statistics to those obtained from: a) Q-Q corrected climate model rainfall products, and b) synthetic rainfall series generated by the suggested downscaling scheme. To our knowledge, this is the first time that climate model rainfall and statistically downscaled precipitation are compared to catchment-averaged MAP at a daily resolution. The obtained results are promising, since the proposed downscaling scheme is more accurate and robust in reproducing a number of historical rainfall statistics, independent of the climate model used and the length of the calibration period. This is particularly the case for the yearly rainfall maxima, where direct statistical correction of climate model rainfall outputs shows increased sensitivity to the length of the calibration period and the climate model used. The robustness of the suggested downscaling scheme in modeling rainfall extremes at a daily resolution, is a notable feature that can effectively be used to assess hydrologic risk at a regional level under changing climatic conditions. 
Acknowledgments The research project is implemented within the framework of the Action «Supporting Postdoctoral Researchers» of the Operational Program "Education and Lifelong Learning" (Action's Beneficiary: General Secretariat for Research and Technology), and is co-financed by the European Social Fund (ESF) and the Greek State. CRS4 highly acknowledges the contribution of the Sardinian regional authorities.
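    For readers unfamiliar with approach (a), the following is a minimal sketch of empirical quantile-quantile correction. It is not the authors' implementation: the function name, the tiny calibration samples, and the choice of a plain empirical mapping (no interpolation, no special treatment of dry days or of values beyond the calibration range) are all assumptions.

```python
import bisect

def quantile_map(x, model_cal, obs_cal):
    """Map a model rainfall value onto the observed distribution.

    The value's empirical non-exceedance rank in the model calibration sample
    is looked up, and the observed value with the same rank is returned.
    """
    model_sorted = sorted(model_cal)
    obs_sorted = sorted(obs_cal)
    # Number of model calibration values that do not exceed x.
    count = bisect.bisect_right(model_sorted, x)
    # Matching position in the observed sample (integer ceiling, then clamped).
    k = (count * len(obs_sorted) + len(model_sorted) - 1) // len(model_sorted) - 1
    k = min(len(obs_sorted) - 1, max(0, k))
    return obs_sorted[k]

# Hypothetical calibration samples: the model underestimates rainfall by half.
model_cal = [0.0, 1.0, 2.0, 3.0, 4.0]
obs_cal = [0.0, 2.0, 4.0, 6.0, 8.0]

corrected = [quantile_map(x, model_cal, obs_cal) for x in (1.5, 2.0, 4.0)]
print(corrected)
```

    Because the mapping is learned entirely from the calibration sample, its behaviour in the upper tail depends on how many extremes that sample contains, which is consistent with the sensitivity to calibration length reported above for yearly maxima.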

  4. A New Statistic for Evaluating Item Response Theory Models for Ordinal Data. CRESST Report 839

    ERIC Educational Resources Information Center

    Cai, Li; Monroe, Scott

    2014-01-01

    We propose a new limited-information goodness of fit test statistic C₂ for ordinal IRT models. The construction of the new statistic lies formally between the M₂ statistic of Maydeu-Olivares and Joe (2006), which utilizes first and second order marginal probabilities, and the M*₂ statistic of Cai and Hansen…

  5. Investigation of Statistical Inference Methodologies Through Scale Model Propagation Experiments

    DTIC Science & Technology

    2015-09-30

    statistical inference methodologies for ocean-acoustic problems by investigating and applying statistical methods to data collected from scale-model...to begin planning experiments for statistical inference applications. APPROACH In the ocean acoustics community over the past two decades...solutions for waveguide parameters. With the introduction of statistical inference to the field of ocean acoustics came the desire to interpret marginal

  6. Numerical and Qualitative Contrasts of Two Statistical Models for Water Quality Change in Tidal Waters

    EPA Science Inventory

    Two statistical approaches, weighted regression on time, discharge, and season and generalized additive models, have recently been used to evaluate water quality trends in estuaries. Both models have been used in similar contexts despite differences in statistical foundations and...

  7. “Plateau”-related summary statistics are uninformative for comparing working memory models

    PubMed Central

    van den Berg, Ronald; Ma, Wei Ji

    2014-01-01

    Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon. Zhang and Luck (2008) and Anderson, Vogel, and Awh (2011) noticed that as more items need to be remembered, “memory noise” seems to first increase and then reach a “stable plateau.” They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided, at most, 0.15% of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99% correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. At realistic numbers of trials, plateau-related summary statistics are completely unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (2011), we found that the evidence in the summary statistics was, at most, 0.12% of the evidence in the raw data and far too weak to warrant any conclusions. These findings call into question claims about working memory that are based on summary statistics. PMID:24719235

  8. Probabilistic Graphical Model Representation in Phylogenetics

    PubMed Central

    Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.

    2014-01-01

    Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559

  9. The Development of Statistical Models for Predicting Surgical Site Infections in Japan: Toward a Statistical Model-Based Standardized Infection Ratio.

    PubMed

    Fukuda, Haruhisa; Kuroki, Manabu

    2016-03-01

    To develop and internally validate a surgical site infection (SSI) prediction model for Japan. Retrospective observational cohort study. We analyzed surveillance data submitted to the Japan Nosocomial Infections Surveillance system for patients who had undergone target surgical procedures from January 1, 2010, through December 31, 2012. Logistic regression analyses were used to develop statistical models for predicting SSIs. An SSI prediction model was constructed for each of the procedure categories by statistically selecting the appropriate risk factors from among the collected surveillance data and determining their optimal categorization. Standard bootstrapping techniques were applied to assess potential overfitting. The C-index was used to compare the predictive performances of the new statistical models with those of models based on conventional risk index variables. The study sample comprised 349,987 cases from 428 participant hospitals throughout Japan, and the overall SSI incidence was 7.0%. The C-indices of the new statistical models were significantly higher than those of the conventional risk index models in 21 (67.7%) of the 31 procedure categories (P<.05). No significant overfitting was detected. Japan-specific SSI prediction models were shown to generally have higher accuracy than conventional risk index models. These new models may have applications in assessing hospital performance and identifying high-risk patients in specific procedure categories.
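    For a binary outcome such as SSI, the C-index is the proportion of (infected, not infected) patient pairs in which the infected patient received the higher predicted risk, with ties counted as one half. The sketch below shows that calculation on made-up numbers; the function name and data are illustrative assumptions, not the study's code.

```python
def c_index(risks, outcomes):
    """Concordance index for binary outcomes (1 = event, 0 = no event)."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    # A pair is concordant when the event case has the higher predicted risk;
    # tied predictions contribute one half.
    score = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return score / (len(pos) * len(neg))

# Hypothetical predicted risks and observed outcomes for four patients.
risks = [0.9, 0.8, 0.3, 0.2]
outcomes = [1, 0, 1, 0]
print(c_index(risks, outcomes))  # 0.75
```

    A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect discrimination.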

  10. A Hierarchical Multivariate Bayesian Approach to Ensemble Model output Statistics in Atmospheric Prediction

    DTIC Science & Technology

    2017-09-01

    efficacy of statistical post-processing methods downstream of these dynamical model components with a hierarchical multivariate Bayesian approach to...Bayesian hierarchical modeling, Markov chain Monte Carlo methods, Metropolis algorithm, machine learning, atmospheric prediction 15. NUMBER OF PAGES...scale processes. However, this dissertation explores the efficacy of statistical post-processing methods downstream of these dynamical model components

  11. Stochastic or statistic? Comparing flow duration curve models in ungauged basins and changing climates

    NASA Astrophysics Data System (ADS)

    Müller, M. F.; Thompson, S. E.

    2015-09-01

    The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by a strong wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are strongly favored over statistical models.

  12. Comparing statistical and process-based flow duration curve models in ungauged basins and changing rain regimes

    NASA Astrophysics Data System (ADS)

    Müller, M. F.; Thompson, S. E.

    2016-02-01

    The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by frequent wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are favored over statistical models.
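
    The Nash-Sutcliffe coefficient cited above is one minus the ratio of the squared prediction error to the variance of the observations about their mean: 1 indicates a perfect fit and 0 indicates a prediction no better than the observed mean. The sketch below is a minimal illustration with hypothetical flow quantiles. The abstract does not say whether flows were transformed (for example, to logarithms) before scoring, so none is applied here.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency of simulated values against observations."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the prediction.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean.
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observations have zero variance")
    return 1.0 - sse / sst


# Hypothetical flows (m^3/s) at five exceedance probabilities of an FDC.
observed_fdc = [10.0, 6.0, 4.0, 2.0, 1.0]
predicted_fdc = [9.0, 6.5, 4.0, 2.5, 1.0]

nse = nash_sutcliffe(observed_fdc, predicted_fdc)
print(round(nse, 3))  # 0.971
```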

  13. Manifold parametrization of the left ventricle for a statistical modelling of its complete anatomy

    NASA Astrophysics Data System (ADS)

    Gil, D.; Garcia-Barnes, J.; Hernández-Sabate, A.; Marti, E.

    2010-03-01

    Distortion of Left Ventricle (LV) external anatomy is related to some dysfunctions, such as hypertrophy. The architecture of myocardial fibers determines LV electromechanical activation patterns as well as mechanics. Thus, their joint modelling would allow the design of specific interventions (such as pacemaker implantation and LV remodelling) and therapies (such as resynchronization). On one hand, accurate modelling of external anatomy requires either a dense sampling or a continuous infinite dimensional approach, which requires non-Euclidean statistics. On the other hand, computation of fiber models requires statistics on Riemannian spaces. Most approaches compute separate statistical models for external anatomy and fiber architecture. In this work we propose a general mathematical framework based on differential geometry concepts for computing a statistical model including both external and fiber anatomy. Our framework provides a continuous approach to external anatomy supporting standard statistics. We also provide a straightforward formula for the computation of the Riemannian fiber statistics. We have applied our methodology to the computation of a complete anatomical atlas of canine hearts from diffusion tensor studies. The orientation of fibers over the average external geometry agrees with the segmental description of orientations reported in the literature.

  14. An Examination of Statistical Power in Multigroup Dynamic Structural Equation Models

    ERIC Educational Resources Information Center

    Prindle, John J.; McArdle, John J.

    2012-01-01

    This study used statistical simulation to calculate differential statistical power in dynamic structural equation models with groups (as in McArdle & Prindle, 2008). Patterns of between-group differences were simulated to provide insight into how model parameters influence power approximations. Chi-square and root mean square error of…

  15. TinkerPlots™ Model Construction Approaches for Comparing Two Groups: Student Perspectives

    ERIC Educational Resources Information Center

    Noll, Jennifer; Kirin, Dana

    2017-01-01

    Teaching introductory statistics using curricula focused on modeling and simulation is becoming increasingly common in introductory statistics courses and touted as a more beneficial approach for fostering students' statistical thinking. Yet, surprisingly little research has been conducted to study the impact of modeling and simulation curricula…

  16. Journal of Transportation and Statistics, Vol. 3, No. 2 : special issue on the statistical analysis and modeling of automotive emissions

    DOT National Transportation Integrated Search

    2000-09-01

    This special issue of the Journal of Transportation and Statistics is devoted to the statistical analysis and modeling of automotive emissions. It contains many of the papers presented in the mini-symposium last August and also includes one additiona...

  17. Heads Up! a Calculation- & Jargon-Free Approach to Statistics

    ERIC Educational Resources Information Center

    Giese, Alan R.

    2012-01-01

    Evaluating the strength of evidence in noisy data is a critical step in scientific thinking that typically relies on statistics. Students without statistical training will benefit from heuristic models that highlight the logic of statistical analysis. The likelihood associated with various coin-tossing outcomes gives students such a model. There…

  18. Comparative evaluation of statistical and mechanistic models of Escherichia coli at beaches in southern Lake Michigan

    USGS Publications Warehouse

    Safaie, Ammar; Wendzel, Aaron; Ge, Zhongfu; Nevers, Meredith; Whitman, Richard L.; Corsi, Steven R.; Phanikumar, Mantha S.

    2016-01-01

    Statistical and mechanistic models are popular tools for predicting the levels of indicator bacteria at recreational beaches. Researchers tend to use one class of model or the other, and it is difficult to generalize statements about their relative performance due to differences in how the models are developed, tested, and used. We describe a cooperative modeling approach for freshwater beaches impacted by point sources in which insights derived from mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models which were further improved using probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and the use of unstructured-grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions resulting in more accurate estimates of beach closures.

  19. Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies

    DTIC Science & Technology

    2010-03-01

    Probabilistic Latent Semantic Indexing (PLSI) is an automated indexing information retrieval model [20]. It is based on a statistical latent class model which is...uses a statistical foundation that is more accurate in finding hidden semantic relationships [20]. The model uses factor analysis of count data, number...principle of statistical infer- ence which asserts that all of the information in a sample is contained in the likelihood function [20]. The statistical

  20. A Census of Statistics Requirements at U.S. Journalism Programs and a Model for a "Statistics for Journalism" Course

    ERIC Educational Resources Information Center

    Martin, Justin D.

    2017-01-01

    This essay presents data from a census of statistics requirements and offerings at all 4-year journalism programs in the United States (N = 369) and proposes a model of a potential course in statistics for journalism majors. The author proposes that three philosophies underlie a statistics course for journalism students. Such a course should (a)…

  1. "Plateau"-related summary statistics are uninformative for comparing working memory models.

    PubMed

    van den Berg, Ronald; Ma, Wei Ji

    2014-10-01

    Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon (Ma, Husain, & Bays, Nature Neuroscience 17, 347-356, 2014). Zhang and Luck (Nature 453, (7192), 233-235, 2008) and Anderson, Vogel, and Awh (Attention, Perception, Psychophys 74, (5), 891-910, 2011) noticed that as more items need to be remembered, "memory noise" seems to first increase and then reach a "stable plateau." They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided at most 0.15 % of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99 % correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. Therefore, at realistic numbers of trials, plateau-related summary statistics are highly unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (Attention, Perception, Psychophys 74, (5), 891-910, 2011), we found that the evidence in the summary statistics was at most 0.12 % of the evidence in the raw data and far too weak to warrant any conclusions. The evidence in the raw data, in fact, strongly favored the slotless model. These findings call into question claims about working memory that are based on summary statistics.

  2. Analyzing longitudinal data with the linear mixed models procedure in SPSS.

    PubMed

    West, Brady T

    2009-09-01

    Many applied researchers analyzing longitudinal data share a common misconception: that specialized statistical software is necessary to fit hierarchical linear models (also known as linear mixed models [LMMs], or multilevel models) to longitudinal data sets. Although several specialized statistical software programs of high quality are available that allow researchers to fit these models to longitudinal data sets (e.g., HLM), rapid advances in general purpose statistical software packages have recently enabled analysts to fit these same models when using preferred packages that also enable other more common analyses. One of these general purpose statistical packages is SPSS, which includes a very flexible and powerful procedure for fitting LMMs to longitudinal data sets with continuous outcomes. This article aims to present readers with a practical discussion of how to analyze longitudinal data using the LMMs procedure in the SPSS statistical software package.

  3. Equilibrium statistical-thermal models in high-energy physics

    NASA Astrophysics Data System (ADS)

    Tawfik, Abdel Nasser

    2014-05-01

    We review some recent highlights from the applications of statistical-thermal models to different experimental measurements and lattice QCD thermodynamics that have been made during the last decade. We start with a short review of the historical milestones on the path of constructing statistical-thermal models for heavy-ion physics. We discovered that Heinz Koppe formulated, in 1948, an almost complete recipe for the statistical-thermal models. In 1950, Enrico Fermi generalized this statistical approach, in which he started with a general cross-section formula and inserted into it the simplifying assumptions about the matrix element of the interaction process that likely reflects many features of the high-energy reactions dominated by density in the phase space of final states. In 1964, Hagedorn systematically analyzed the high-energy phenomena using all tools of statistical physics and introduced the concept of limiting temperature based on the statistical bootstrap model. It turns out that many-particle systems can quite often be studied with the help of statistical-thermal methods. The analysis of yield multiplicities in high-energy collisions gives overwhelming evidence for chemical equilibrium in the final state. The strange particles might be an exception, as they are suppressed at lower beam energies. However, their relative yields fulfill statistical equilibrium, as well. We review the equilibrium statistical-thermal models for particle production, fluctuations and collective flow in heavy-ion experiments. We also review their reproduction of the lattice QCD thermodynamics at vanishing and finite chemical potential. During the last decade, five conditions have been suggested to describe the universal behavior of the chemical freeze-out parameters. The higher order moments of multiplicity have been discussed. They offer deep insights into particle production and critical fluctuations. Therefore, we use them to describe the freeze-out parameters and suggest the location of the QCD critical endpoint. Various extensions have been proposed in order to take into consideration the possible deviations from the ideal hadron gas. We highlight various types of interactions, dissipative properties and location-dependences (spatial rapidity). Furthermore, we review three models combining hadronic with partonic phases: the quasi-particle model, the linear sigma model with Polyakov potentials, and the compressible bag model.

  4. Improvements to an earth observing statistical performance model with applications to LWIR spectral variability

    NASA Astrophysics Data System (ADS)

    Zhao, Runchen; Ientilucci, Emmett J.

    2017-05-01

    Hyperspectral remote sensing systems provide spectral data composed of hundreds of narrow spectral bands. Spectral remote sensing systems can be used to identify targets, for example, without physical interaction. Often it is of interest to characterize the spectral variability of targets or objects. The purpose of this paper is to identify and characterize the LWIR spectral variability of targets based on an improved earth observing statistical performance model, known as the Forecasting and Analysis of Spectroradiometric System Performance (FASSP) model. FASSP contains three basic modules: a scene model, a sensor model and a processing model. Instead of using mean surface reflectance only as input to the model, FASSP transfers user-defined statistical characteristics of a scene through the image chain (i.e., from source to sensor). The radiative transfer model, MODTRAN, is used to simulate the radiative transfer based on user-defined atmospheric parameters. To retrieve class emissivity and temperature statistics, or temperature / emissivity separation (TES), a LWIR atmospheric compensation method is necessary. The FASSP model has a method to transform statistics in the visible (i.e., ELM) but currently does not have a LWIR TES algorithm in place. This paper addresses the implementation of such a TES algorithm and its associated transformation of statistics.

  5. Bayesian statistics in medicine: a 25 year review.

    PubMed

    Ashby, Deborah

    2006-11-15

    This review examines the state of Bayesian thinking as Statistics in Medicine was launched in 1982, reflecting particularly on its applicability and uses in medical research. It then looks at each subsequent five-year epoch, with a focus on papers appearing in Statistics in Medicine, putting these in the context of major developments in Bayesian thinking and computation with reference to important books, landmark meetings and seminal papers. It charts the growth of Bayesian statistics as it is applied to medicine and makes predictions for the future. From sparse beginnings, where Bayesian statistics was barely mentioned, Bayesian statistics has now permeated all the major areas of medical statistics, including clinical trials, epidemiology, meta-analyses and evidence synthesis, spatial modelling, longitudinal modelling, survival modelling, molecular genetics and decision-making in respect of new technologies.

  6. The epistemology of mathematical and statistical modeling: a quiet methodological revolution.

    PubMed

    Rodgers, Joseph Lee

    2010-01-01

    A quiet methodological revolution, a modeling revolution, has occurred over the past several decades, almost without discussion. In contrast, the 20th century ended with contentious argument over the utility of null hypothesis significance testing (NHST). The NHST controversy may have been at least partially irrelevant, because in certain ways the modeling revolution obviated the NHST argument. I begin with a history of NHST and modeling and their relation to one another. Next, I define and illustrate principles involved in developing and evaluating mathematical models. Following, I discuss the difference between using statistical procedures within a rule-based framework and building mathematical models from a scientific epistemology. Only the former is treated carefully in most psychology graduate training. The pedagogical implications of this imbalance and the revised pedagogy required to account for the modeling revolution are described. To conclude, I discuss how attention to modeling implies shifting statistical practice in certain progressive ways. The epistemological basis of statistics has moved away from being a set of procedures, applied mechanistically, and moved toward building and evaluating statistical and scientific models. Copyright 2009 APA, all rights reserved.

  7. Local sensitivity analysis for inverse problems solved by singular value decomposition

    USGS Publications Warehouse

    Hill, M.C.; Nolan, B.T.

    2010-01-01

    Local sensitivity analysis provides computationally frugal ways to evaluate models commonly used for resource management, risk assessment, and so on. This includes diagnosing inverse model convergence problems caused by parameter insensitivity and(or) parameter interdependence (correlation), understanding what aspects of the model and data contribute to measures of uncertainty, and identifying new data likely to reduce model uncertainty. Here, we consider sensitivity statistics relevant to models in which the process model parameters are transformed using singular value decomposition (SVD) to create SVD parameters for model calibration. The statistics considered include the PEST identifiability statistic, and combined use of the process-model parameter statistics composite scaled sensitivities and parameter correlation coefficients (CSS and PCC). The statistics are complementary in that the identifiability statistic integrates the effects of parameter sensitivity and interdependence, while CSS and PCC provide individual measures of sensitivity and interdependence. PCC quantifies correlations between pairs or larger sets of parameters; when a set of parameters is intercorrelated, the absolute value of PCC is close to 1.00 for all pairs in the set. The number of singular vectors to include in the calculation of the identifiability statistic is somewhat subjective and influences the statistic. To demonstrate the statistics, we use the USDA’s Root Zone Water Quality Model to simulate nitrogen fate and transport in the unsaturated zone of the Merced River Basin, CA. There are 16 log-transformed process-model parameters, including water content at field capacity (WFC) and bulk density (BD) for each of five soil layers. Calibration data consisted of 1,670 observations comprising soil moisture, soil water tension, aqueous nitrate and bromide concentrations, soil nitrate concentration, and organic matter content. All 16 of the SVD parameters could be estimated by regression based on the range of singular values. Identifiability statistic results varied based on the number of SVD parameters included. Identifiability statistics calculated for four SVD parameters indicate the same three most important process-model parameters as CSS/PCC (WFC1, WFC2, and BD2), but the order differed. Additionally, the identifiability statistic showed that BD1 was almost as dominant as WFC1. The CSS/PCC analysis showed that this results from its high correlation with WFC1 (-0.94), and not its individual sensitivity. Such distinctions, combined with analysis of how high correlations and(or) sensitivities result from the constructed model, can produce important insights into, for example, the use of sensitivity analysis to design monitoring networks. In conclusion, the statistics considered identified similar important parameters. They differ because (1) CSS/PCC can be more awkward because sensitivity and interdependence are considered separately and (2) identifiability requires consideration of how many SVD parameters to include. A continuing challenge is to understand how these computationally efficient methods compare with computationally demanding global methods like Markov-Chain Monte Carlo given common nonlinear processes and the often even more nonlinear models.
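
    The parameter correlation coefficient (PCC) discussed above can be obtained from a parameter variance-covariance matrix by dividing each covariance by the product of the two parameter standard deviations. The sketch below assumes such a matrix is already available from the regression. The three-parameter matrix and the 0.90 flagging threshold are hypothetical, chosen only so that the WFC1/BD1 pair reproduces the -0.94 quoted in the abstract.

```python
import math


def parameter_correlations(cov):
    """Convert a variance-covariance matrix into a correlation (PCC) matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


def flag_correlated_pairs(names, pcc, threshold=0.90):
    """List parameter pairs whose |PCC| meets or exceeds the threshold."""
    n = len(names)
    return [
        (names[i], names[j], pcc[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if abs(pcc[i][j]) >= threshold
    ]


# Hypothetical covariance matrix for three log-transformed parameters.
names = ["WFC1", "BD1", "WFC2"]
cov = [
    [4.00, -3.76, 0.50],
    [-3.76, 4.00, -0.20],
    [0.50, -0.20, 1.00],
]

pcc = parameter_correlations(cov)
flagged = flag_correlated_pairs(names, pcc)
print(flagged)  # only the WFC1/BD1 pair, with PCC close to -0.94
```

    A pair flagged this way cannot be estimated independently from the available data, which is why a parameter with modest individual sensitivity can still appear dominant in an integrated statistic.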

  8. Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models

    DTIC Science & Technology

    2015-09-12

    AFRL-AFOSR-VA-TR-2015-0278 DERIVATIVE FREE OPTIMIZATION OF COMPLEX SYSTEMS WITH THE USE OF STATISTICAL MACHINE LEARNING MODELS Katya Scheinberg...COMPLEX SYSTEMS WITH THE USE OF STATISTICAL MACHINE LEARNING MODELS 5a.  CONTRACT NUMBER 5b.  GRANT NUMBER FA9550-11-1-0239 5c.  PROGRAM ELEMENT...developed, which has been the focus of our research. 15. SUBJECT TERMS optimization, Derivative-Free Optimization, Statistical Machine Learning 16. SECURITY

  9. 78 FR 70303 - Announcement of Requirements and Registration for the Predict the Influenza Season Challenge

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-11-25

    ... public. Mathematical and statistical models can be useful in predicting the timing and impact of the... applying any mathematical, statistical, or other approach to predictive modeling. This challenge will... Services (HHS) region level(s) in the United States by developing mathematical and statistical models that...

  10. Developing Statistical Knowledge for Teaching during Design-Based Research

    ERIC Educational Resources Information Center

    Groth, Randall E.

    2017-01-01

    Statistical knowledge for teaching is not precisely equivalent to statistics subject matter knowledge. Teachers must know how to make statistics understandable to others as well as understand the subject matter themselves. This dual demand on teachers calls for the development of viable teacher education models. This paper offers one such model,…

  11. Maximum entropy models of ecosystem functioning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertram, Jason, E-mail: jason.bertram@anu.edu.au

    2014-12-05

    Using organism-level traits to deduce community-level relationships is a fundamental problem in theoretical ecology. This problem parallels the physical one of using particle properties to deduce macroscopic thermodynamic laws, which was successfully achieved with the development of statistical physics. Drawing on this parallel, theoretical ecologists from Lotka onwards have attempted to construct statistical mechanistic theories of ecosystem functioning. Jaynes’ broader interpretation of statistical mechanics, which hinges on the entropy maximisation algorithm (MaxEnt), is of central importance here because the classical foundations of statistical physics do not have clear ecological analogues (e.g. phase space, dynamical invariants). However, models based on the information theoretic interpretation of MaxEnt are difficult to interpret ecologically. Here I give a broad discussion of statistical mechanical models of ecosystem functioning and the application of MaxEnt in these models. Emphasising the sample frequency interpretation of MaxEnt, I show that MaxEnt can be used to construct models of ecosystem functioning which are statistical mechanical in the traditional sense using a savanna plant ecology model as an example.

  12. The crossing statistic: dealing with unknown errors in the dispersion of Type Ia supernovae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shafieloo, Arman; Clifton, Timothy; Ferreira, Pedro, E-mail: arman@ewha.ac.kr, E-mail: tclifton@astro.ox.ac.uk, E-mail: p.ferreira1@physics.ox.ac.uk

    2011-08-01

    We propose a new statistic that has been designed to be used in situations where the intrinsic dispersion of a data set is not well known: The Crossing Statistic. This statistic is in general less sensitive than χ² to the intrinsic dispersion of the data, and hence allows us to make progress in distinguishing between different models using goodness of fit to the data even when the errors involved are poorly understood. The proposed statistic makes use of the shape and trends of a model's predictions in a quantifiable manner. It is applicable to a variety of circumstances, although we consider it to be especially well suited to the task of distinguishing between different cosmological models using type Ia supernovae. We show that this statistic can easily distinguish between different models in cases where the χ² statistic fails. We also show that the last mode of the Crossing Statistic is identical to χ², so that it can be considered as a generalization of χ².

  13. Security of statistical data bases: invasion of privacy through attribute correlational modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Palley, M.A.

    This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.

  14. Network Data: Statistical Theory and New Models

    DTIC Science & Technology

    2016-02-17

    SECURITY CLASSIFICATION OF: During this period of review, Bin Yu worked on many thrusts of high-dimensional statistical theory and methodologies. Her...research covered a wide range of topics in statistics including analysis and methods for spectral clustering for sparse and structured networks...2,7,8,21], sparse modeling (e.g. Lasso) [4,10,11,17,18,19], statistical guarantees for the EM algorithm [3], statistical analysis of algorithm leveraging

  15. Evaluating statistical consistency in the ocean model component of the Community Earth System Model (pyCECT v2.0)

    NASA Astrophysics Data System (ADS)

    Baker, Allison H.; Hu, Yong; Hammerling, Dorit M.; Tseng, Yu-heng; Xu, Haiying; Huang, Xiaomeng; Bryan, Frank O.; Yang, Guangwen

    2016-07-01

    The Parallel Ocean Program (POP), the ocean model component of the Community Earth System Model (CESM), is widely used in climate research. Most current work in CESM-POP focuses on improving the model's efficiency or accuracy, such as improving numerical methods, advancing parameterization, porting to new architectures, or increasing parallelism. Since ocean dynamics are chaotic in nature, achieving bit-for-bit (BFB) identical results in ocean solutions cannot be guaranteed for even tiny code modifications, and determining whether modifications are admissible (i.e., statistically consistent with the original results) is non-trivial. In recent work, an ensemble-based statistical approach was shown to work well for software verification (i.e., quality assurance) on atmospheric model data. The general idea of the ensemble-based statistical consistency testing is to use a qualitative measurement of the variability of the ensemble of simulations as a metric with which to compare future simulations and make a determination of statistical distinguishability. The capability to determine consistency without BFB results boosts model confidence and provides the flexibility needed, for example, for more aggressive code optimizations and the use of heterogeneous execution environments. Since ocean and atmosphere models have differing characteristics in term of dynamics, spatial variability, and timescales, we present a new statistical method to evaluate ocean model simulation data that requires the evaluation of ensemble means and deviations in a spatial manner. In particular, the statistical distribution from an ensemble of CESM-POP simulations is used to determine the standard score of any new model solution at each grid point. Then the percentage of points that have scores greater than a specified threshold indicates whether the new model simulation is statistically distinguishable from the ensemble simulations. Both ensemble size and composition are important. 
Our experiments indicate that the new POP ensemble consistency test (POP-ECT) tool is capable of distinguishing cases that should be statistically consistent with the ensemble from those that should not, as well as providing a simple, objective, and systematic way to detect errors in CESM-POP due to the hardware or software stack, positively contributing to quality assurance for the CESM-POP code.
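
    The per-grid-point standard-score test described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the POP-ECT tool itself: the function names, the z-score threshold of 3.0, and the allowed failing fraction of 0.1 are all assumptions chosen for the example, and each simulation is represented as a flat list of grid-point values.

```python
from statistics import mean, stdev


def fraction_exceeding(ensemble, new_run, z_threshold=3.0):
    """Fraction of grid points whose standard score exceeds z_threshold.

    ensemble: list of ensemble runs, each a list of grid-point values.
    new_run:  list of grid-point values from the simulation under test.
    z_threshold is a hypothetical value chosen for illustration.
    """
    n_points = len(new_run)
    exceed = 0
    for i in range(n_points):
        values = [run[i] for run in ensemble]
        mu = mean(values)
        sigma = stdev(values)
        if sigma == 0.0:
            # No ensemble spread at this point: any deviation counts as a failure.
            if new_run[i] != mu:
                exceed += 1
            continue
        if abs(new_run[i] - mu) / sigma > z_threshold:
            exceed += 1
    return exceed / n_points


def is_consistent(ensemble, new_run, z_threshold=3.0, max_fraction=0.1):
    """Declare the new run consistent if few enough points fail (assumed rule)."""
    return fraction_exceeding(ensemble, new_run, z_threshold) <= max_fraction


if __name__ == "__main__":
    ensemble = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [0.9, 1.9, 2.9]]
    print(is_consistent(ensemble, [1.0, 2.0, 3.0]))
    print(is_consistent(ensemble, [1.0, 2.0, 10.0]))
```

    With only three ensemble members the spread estimate is very noisy, which is one way to see why the abstract stresses that ensemble size and composition matter.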

  16. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
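
    The abstract does not spell out its Bayesian FDR rule. As a hedged sketch, one commonly used rule takes the posterior probability that each null hypothesis is true and rejects the largest set of hypotheses whose average posterior null probability stays at or below a target level. The function name, the input format, and the value of alpha below are assumptions for illustration and are not taken from the paper.

```python
def bayesian_fdr_reject(posterior_null, alpha=0.05):
    """Return sorted indices of rejected hypotheses.

    posterior_null[i] is the (assumed already computed) posterior probability
    that null hypothesis i is true. Hypotheses are rejected in order of
    increasing posterior null probability while the running mean of those
    probabilities, an estimate of the posterior expected FDR, is <= alpha.
    """
    order = sorted(range(len(posterior_null)), key=lambda i: posterior_null[i])
    rejected = []
    running_sum = 0.0
    for k, idx in enumerate(order, start=1):
        running_sum += posterior_null[idx]
        if running_sum / k <= alpha:
            rejected = order[:k]
        else:
            # The running mean is non-decreasing, so no later set can qualify.
            break
    return sorted(rejected)


if __name__ == "__main__":
    probs = [0.01, 0.9, 0.03, 0.2, 0.02]
    print(bayesian_fdr_reject(probs, alpha=0.05))
```

    Sorting first means the running mean can only grow, so the loop can stop at the first violation.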

  17. Different Manhattan project: automatic statistical model generation

    NASA Astrophysics Data System (ADS)

    Yap, Chee Keng; Biermann, Henning; Hertzmann, Aaron; Li, Chen; Meyer, Jon; Pao, Hsing-Kuo; Paxia, Salvatore

    2002-03-01

    We address the automatic generation of large geometric models. This is important in visualization for several reasons. First, many applications need access to large but interesting data models. Second, we often need such data sets with particular characteristics (e.g., urban models, park and recreation landscape). Thus we need the ability to generate models with different parameters. We propose a new approach for generating such models. It is based on a top-down propagation of statistical parameters. We illustrate the method in the generation of a statistical model of Manhattan. But the method is generally applicable in the generation of models of large geographical regions. Our work is related to the literature on generating complex natural scenes (smoke, forests, etc) based on procedural descriptions. The difference in our approach stems from three characteristics: modeling with statistical parameters, integration of ground truth (actual map data), and a library-based approach for texture mapping.

  18. Identifiability of PBPK Models with Applications to ...

    EPA Pesticide Factsheets

    Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss different types of identifiability that occur in PBPK models and give reasons why they occur. We particularly focus on how the mathematical structure of a PBPK model and lack of appropriate data can lead to statistical models in which it is impossible to estimate at least some parameters precisely. Methods are reviewed which can determine whether a purely linear PBPK model is globally identifiable. We propose a theorem which determines when identifiability at a set of finite and specific values of the mathematical PBPK model (global discrete identifiability) implies identifiability of the statistical model. However, we are unable to establish conditions that imply global discrete identifiability, and conclude that the only safe approach to analysis of PBPK models involves Bayesian analysis with truncated priors. Finally, computational issues regarding posterior simulations of PBPK models are discussed. The methodology is very general and can be applied to numerous PBPK models which can be expressed as linear time-invariant systems. A real data set of a PBPK model for exposure to dimethyl arsinic acid (DMA(V)) is presented to illustrate the proposed methodology.

  19. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    DTIC Science & Technology

    2013-09-30

    published 3-D multi-beam data. The Niwa and Anderson models were compared with 3-D multi-beam data collected by Paramo and Gerlotto. The data were...submitted, refereed] Bhatia, S., T.K. Stanton, J. Paramo , and F. Gerlotto (under revision), “Modeling statistics of fish school dimensions using 3-D

  20. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    DTIC Science & Technology

    2013-09-30

    data. The Niwa and Anderson models were compared with 3-D multi-beam data collected by Paramo and Gerlotto. The data were consistent with the...Bhatia, S., T.K. Stanton, J. Paramo , and F. Gerlotto (under revision), “Modeling statistics of fish school dimensions using 3-D data from a

  1. Statistical Compression for Climate Model Output

    NASA Astrophysics Data System (ADS)

    Hammerling, D.; Guinness, J.; Soh, Y. J.

    2017-12-01

    Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus it is important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. We decompress the data by computing conditional expectations and conditional simulations from the model given the summary statistics. Conditional expectations represent our best estimate of the original data but are subject to oversmoothing in space and time. Conditional simulations introduce realistic small-scale noise so that the decompressed fields are neither too smooth nor too rough compared with the original data. Considerable attention is paid to accurately modeling the original dataset (one year of daily mean temperature data), particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured, while allowing for fast decompression and conditional emulation on modest computers.

  2. Crash Lethality Model

    DTIC Science & Technology

    2012-06-06

    Statistical Data ........................................................................................... 45 31 Parametric Model for Rotor Wing Debris...Area .............................................................. 46 32 Skid Distance Statistical Data...results. The curve that related the BC value to the probability of skull fracture resulted in a tight confidence interval and a two tailed statistical p

  3. A Stochastic Fractional Dynamics Model of Rainfall Statistics

    NASA Astrophysics Data System (ADS)

    Kundu, Prasun; Travis, James

    2013-04-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, that allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is designed to faithfully reflect the scale dependence and is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. The main restriction is the assumption that the statistics of the precipitation field are spatially homogeneous and isotropic and stationary in time. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of the radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment. Some data sets, containing periods of non-stationary behavior that involve occasional anomalously correlated rain events, present a challenge for the model.

  4. Statistical Parameter Study of the Time Interval Distribution for Nonparalyzable, Paralyzable, and Hybrid Dead Time Models

    NASA Astrophysics Data System (ADS)

    Syam, Nur Syamsi; Maeng, Seongjin; Kim, Myo Gwang; Lim, Soo Yeon; Lee, Sang Hoon

    2018-05-01

    A large dead time of a Geiger Mueller (GM) detector may cause a large count loss in radiation measurements and consequently may cause distortion of the Poisson statistic of radiation events into a new distribution. The new distribution will have different statistical parameters compared to the original distribution. Therefore, the variance, skewness, and excess kurtosis in association with the observed count rate of the time interval distribution for well-known nonparalyzable, paralyzable, and nonparalyzable-paralyzable hybrid dead time models of a Geiger Mueller detector were studied using Monte Carlo simulation (GMSIM). These parameters were then compared with the statistical parameters of a perfect detector to observe the change in the distribution. The results show that the behaviors of the statistical parameters for the three dead time models were different. The values of the skewness and the excess kurtosis of the nonparalyzable model are equal or very close to those of the perfect detector, which are ≅2 for skewness, and ≅6 for excess kurtosis, while the statistical parameters in the paralyzable and hybrid model obtain minimum values that occur around the maximum observed count rates. The different trends of the three models resulting from the GMSIM simulation can be used to distinguish the dead time behavior of a GM counter; i.e. whether the GM counter can be described best by using the nonparalyzable, paralyzable, or hybrid model. In a future study, these statistical parameters need to be analyzed further to determine the possibility of using them to determine a dead time for each model, particularly for paralyzable and hybrid models.
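
    The comparison described in this abstract can be illustrated with a small Monte Carlo sketch. This is not the GMSIM code: the function names, the event rate, the dead time, and the sample size are assumptions made for the example. It generates Poisson event times, applies a nonparalyzable or paralyzable dead-time filter, and computes the skewness and excess kurtosis of the recorded time intervals.

```python
import random


def poisson_event_times(rate, n_events, rng):
    """Cumulative arrival times of a Poisson process with the given rate."""
    t = 0.0
    times = []
    for _ in range(n_events):
        t += rng.expovariate(rate)
        times.append(t)
    return times


def nonparalyzable_filter(times, dead_time):
    """Only recorded events start a dead period of fixed length."""
    recorded = []
    last = None
    for t in times:
        if last is None or t - last >= dead_time:
            recorded.append(t)
            last = t
    return recorded


def paralyzable_filter(times, dead_time):
    """Every event, recorded or not, restarts the dead period."""
    recorded = []
    prev = None
    for t in times:
        if prev is None or t - prev >= dead_time:
            recorded.append(t)
        prev = t
    return recorded


def intervals(times):
    """Time differences between consecutive events."""
    return [b - a for a, b in zip(times, times[1:])]


def skewness_and_excess_kurtosis(xs):
    """Moment-based sample skewness and excess kurtosis."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    times = poisson_event_times(rate=1.0, n_events=200000, rng=rng)
    for name, recorded in [
        ("perfect", times),
        ("nonparalyzable", nonparalyzable_filter(times, 0.5)),
        ("paralyzable", paralyzable_filter(times, 0.5)),
    ]:
        print(name, len(recorded), skewness_and_excess_kurtosis(intervals(recorded)))
```

    For a perfect detector the intervals are exponential, whose skewness is 2 and excess kurtosis is 6. In the nonparalyzable case each recorded interval is the dead time plus an exponential waiting time, so these shape parameters are unchanged by the shift, which is consistent with the behavior the abstract reports.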

  5. Statistical ecology comes of age.

    PubMed

    Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-12-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.

  6. Statistical ecology comes of age

    PubMed Central

    Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-01-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151

  7. Statistical aspects of carbon fiber risk assessment modeling. [fire accidents involving aircraft

    NASA Technical Reports Server (NTRS)

    Gross, D.; Miller, D. R.; Soland, R. M.

    1980-01-01

    The probabilistic and statistical aspects of the carbon fiber risk assessment modeling of fire accidents involving commercial aircraft are examined. Three major sources of uncertainty in the modeling effort are identified. These are: (1) imprecise knowledge in establishing the model; (2) parameter estimation; and (3)Monte Carlo sampling error. All three sources of uncertainty are treated and statistical procedures are utilized and/or developed to control them wherever possible.

  8. A Statistical Approach For Modeling Tropical Cyclones. Synthetic Hurricanes Generator Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pasqualini, Donatella

    This manuscript briefly describes a statistical approach to generate synthetic tropical cyclone tracks to be used in risk evaluations. The Synthetic Hurricane Generator (SynHurG) model allows modeling hurricane risk in the United States, supporting decision makers and implementations of adaptation strategies to extreme weather. In the literature there are mainly two approaches to model hurricane hazard for risk prediction: deterministic-statistical approaches, where the storm key physical parameters are calculated using complex physical climate models and the tracks are usually determined statistically from historical data; and statistical approaches, where both variables and tracks are estimated stochastically using historical records. SynHurG falls in the second category, adopting a pure stochastic approach.

  9. Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization

    NASA Astrophysics Data System (ADS)

    Eroglu, Sertac

    2014-10-01

    The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, was termed as the statistical mechanical Menzerath-Altmann model. The derived model allows interpreting the model parameters in terms of physical concepts. We also propose that many organizations presenting the Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through the properly defined structure-dependent parameter and the energy associated states.

  10. Exposure time independent summary statistics for assessment of drug dependent cell line growth inhibition.

    PubMed

    Falgreen, Steffen; Laursen, Maria Bach; Bødker, Julie Støve; Kjeldsen, Malene Krag; Schmitz, Alexander; Nyegaard, Mette; Johnsen, Hans Erik; Dybkær, Karen; Bøgsted, Martin

    2014-06-05

    In vitro generated dose-response curves of human cancer cell lines are widely used to develop new therapeutics. The curves are summarised by simplified statistics that ignore the conventionally used dose-response curves' dependency on drug exposure time and growth kinetics. This may lead to suboptimal exploitation of data and biased conclusions on the potential of the drug in question. Therefore we set out to improve the dose-response assessments by eliminating the impact of time dependency. First, a mathematical model for drug induced cell growth inhibition was formulated and used to derive novel dose-response curves and improved summary statistics that are independent of time under the proposed model. Next, a statistical analysis workflow for estimating the improved statistics was suggested consisting of 1) nonlinear regression models for estimation of cell counts and doubling times, 2) isotonic regression for modelling the suggested dose-response curves, and 3) resampling based method for assessing variation of the novel summary statistics. We document that conventionally used summary statistics for dose-response experiments depend on time so that fast growing cell lines compared to slowly growing ones are considered overly sensitive. The adequacy of the mathematical model is tested for doxorubicin and found to fit real data to an acceptable degree. Dose-response data from the NCI60 drug screen were used to illustrate the time dependency and demonstrate an adjustment correcting for it. The applicability of the workflow was illustrated by simulation and application on a doxorubicin growth inhibition screen. The simulations show that under the proposed mathematical model the suggested statistical workflow results in unbiased estimates of the time independent summary statistics. 
Variance estimates of the novel summary statistics are used to conclude that the doxorubicin screen covers a significantly diverse range of responses, ensuring it is useful for biological interpretations. Time independent summary statistics may aid the understanding of drugs' action mechanism on tumour cells and potentially renew previous drug sensitivity evaluation studies.

  11. Exposure time independent summary statistics for assessment of drug dependent cell line growth inhibition

    PubMed Central

    2014-01-01

    Background In vitro generated dose-response curves of human cancer cell lines are widely used to develop new therapeutics. The curves are summarised by simplified statistics that ignore the conventionally used dose-response curves’ dependency on drug exposure time and growth kinetics. This may lead to suboptimal exploitation of data and biased conclusions on the potential of the drug in question. Therefore we set out to improve the dose-response assessments by eliminating the impact of time dependency. Results First, a mathematical model for drug induced cell growth inhibition was formulated and used to derive novel dose-response curves and improved summary statistics that are independent of time under the proposed model. Next, a statistical analysis workflow for estimating the improved statistics was suggested consisting of 1) nonlinear regression models for estimation of cell counts and doubling times, 2) isotonic regression for modelling the suggested dose-response curves, and 3) resampling based method for assessing variation of the novel summary statistics. We document that conventionally used summary statistics for dose-response experiments depend on time so that fast growing cell lines compared to slowly growing ones are considered overly sensitive. The adequacy of the mathematical model is tested for doxorubicin and found to fit real data to an acceptable degree. Dose-response data from the NCI60 drug screen were used to illustrate the time dependency and demonstrate an adjustment correcting for it. The applicability of the workflow was illustrated by simulation and application on a doxorubicin growth inhibition screen. The simulations show that under the proposed mathematical model the suggested statistical workflow results in unbiased estimates of the time independent summary statistics. 
Variance estimates of the novel summary statistics are used to conclude that the doxorubicin screen covers a significantly diverse range of responses, ensuring it is useful for biological interpretations. Conclusion Time independent summary statistics may aid the understanding of drugs’ action mechanism on tumour cells and potentially renew previous drug sensitivity evaluation studies. PMID:24902483

  12. Rasch fit statistics and sample size considerations for polytomous data.

    PubMed

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-05-29

    Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire - 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges.

  13. Rasch fit statistics and sample size considerations for polytomous data

    PubMed Central

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-01-01

    Background Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Methods Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire – 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. Results The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. Conclusion It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges. PMID:18510722

  14. Making statistical inferences about software reliability

    NASA Technical Reports Server (NTRS)

    Miller, Douglas R.

    1988-01-01

    Failure times of software undergoing random debugging can be modelled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
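
    The order-statistics model in this abstract can be sketched as follows. Each latent bug is assumed to have its own exponential time to first failure, and the observed failure times are those draws in sorted order. The function names and the example rates are hypothetical; the residual-rate formula is the standard expectation under this model, added here for illustration rather than taken from the report.

```python
import math
import random


def simulate_failure_times(rates, rng):
    """Observed failure times: sorted draws from independent exponentials.

    rates[i] is the (hypothetical) failure rate of bug i, so the draws are
    independent but not identically distributed.
    """
    return sorted(rng.expovariate(r) for r in rates)


def expected_residual_rate(rates, t):
    """Expected program failure rate after debugging up to time t.

    Bug i is still present at time t with probability exp(-rates[i] * t),
    and contributes rates[i] to the failure rate if it is.
    """
    return sum(r * math.exp(-r * t) for r in rates)


if __name__ == "__main__":
    rates = [1.0, 0.1, 0.001]
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    print(simulate_failure_times(rates, rng))
    for t in (0.0, 10.0, 100.0):
        print(t, expected_residual_rate(rates, t))
```

    After a long debugging period the residual rate is dominated by the rarest bugs, which are exactly the ones least likely to have been observed. That is one way to read the abstract's point about the difficulty of statistically verifying very highly reliable software.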

  15. Using the Expectancy Value Model of Motivation to Understand the Relationship between Student Attitudes and Achievement in Statistics

    ERIC Educational Resources Information Center

    Hood, Michelle; Creed, Peter A.; Neumann, David L.

    2012-01-01

    We tested a model of the relationship between attitudes toward statistics and achievement based on Eccles' Expectancy Value Model (1983). Participants (n = 149; 83% female) were second-year Australian university students in a psychology statistics course (mean age = 23.36 years, SD = 7.94 years). We obtained demographic details, past performance,…

  16. A consistent framework for Horton regression statistics that leads to a modified Hack's law

    USGS Publications Warehouse

    Furey, P.R.; Troutman, B.M.

    2008-01-01

    A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ω. Data show that ω plays a statistically significant role in the modified Hack's law expression. © 2008 Elsevier B.V.

  17. The statistical average of optical properties for alumina particle cluster in aircraft plume

    NASA Astrophysics Data System (ADS)

    Li, Jingying; Bai, Lu; Wu, Zhensen; Guo, Lixin

    2018-04-01

    We establish a model for lognormal distribution of monomer radius and number of alumina particle clusters in plume. According to the Multi-Sphere T Matrix (MSTM) theory, we provide a method for finding the statistical average of optical properties for alumina particle clusters in plume, analyze the effect of different distributions and different detection wavelengths on the statistical average of optical properties for alumina particle cluster, and compare the statistical average optical properties under the alumina particle cluster model established in this study and those under three simplified alumina particle models. The calculation results show that the monomer number of alumina particle cluster and its size distribution have a considerable effect on its statistical average optical properties. The statistical average of optical properties for alumina particle cluster at common detection wavelengths exhibit obvious differences, whose differences have a great effect on modeling IR and UV radiation properties of plume. Compared with the three simplified models, the alumina particle cluster model herein features both higher extinction and scattering efficiencies. Therefore, we may find that an accurate description of the scattering properties of alumina particles in aircraft plume is of great significance in the study of plume radiation properties.

  18. Comparisons between physics-based, engineering, and statistical learning models for outdoor sound propagation.

    PubMed

    Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T

    2016-05-01

    Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations, and more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely, a Crank-Nicholson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively.

  19. Watershed Regressions for Pesticides (WARP) models for predicting stream concentrations of multiple pesticides

    USGS Publications Warehouse

    Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.

    2013-01-01

    Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38%. As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.

  20. An adaptive state of charge estimation approach for lithium-ion series-connected battery system

    NASA Astrophysics Data System (ADS)

    Peng, Simin; Zhu, Xuelai; Xing, Yinjiao; Shi, Hongbing; Cai, Xu; Pecht, Michael

    2018-07-01

    Due to the incorrect or unknown noise statistics of a battery system and its cell-to-cell variations, state of charge (SOC) estimation of a lithium-ion series-connected battery system is usually inaccurate or even divergent using model-based methods, such as the extended Kalman filter (EKF) and the unscented Kalman filter (UKF). To resolve this problem, an adaptive unscented Kalman filter (AUKF) based on a noise statistics estimator and a model parameter regulator is developed to accurately estimate the SOC of a series-connected battery system. An equivalent circuit model is first built based on the model parameter regulator that illustrates the influence of cell-to-cell variation on the battery system. A noise statistics estimator is then used to adaptively obtain the estimated noise statistics for the AUKF when its prior noise statistics are not accurate or exactly Gaussian. The accuracy and effectiveness of the SOC estimation method are validated by comparing the developed AUKF with the UKF when the model and measurement noise statistics, respectively, are inaccurate. Compared with the UKF and EKF, the developed method shows the highest SOC estimation accuracy.

  1. A scan statistic for binary outcome based on hypergeometric probability model, with an application to detecting spatial clusters of Japanese encephalitis.

    PubMed

    Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong

    2013-01-01

    As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for the binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the Hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. Similar to Kulldorff's methods, we adopt a Monte Carlo test for the test of significance. Both methods are applied for detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation on independent benchmark data indicates that the test statistic based on the Hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.

  2. Statistical Reform in School Psychology Research: A Synthesis

    ERIC Educational Resources Information Center

    Swaminathan, Hariharan; Rogers, H. Jane

    2007-01-01

    Statistical reform in school psychology research is discussed in terms of research designs, measurement issues, statistical modeling and analysis procedures, interpretation and reporting of statistical results, and finally statistics education.

  3. Nonelastic nuclear reactions and accompanying gamma radiation

    NASA Technical Reports Server (NTRS)

    Snow, R.; Rosner, H. R.; George, M. C.; Hayes, J. D.

    1971-01-01

    Several aspects of nonelastic nuclear reactions which proceed through the formation of a compound nucleus are dealt with. The full statistical model and the partial statistical model are described and computer programs based on these models are presented along with operating instructions and input and output for sample problems. A theoretical development of the expression for the reaction cross section for the hybrid case which involves a combination of the continuum aspects of the full statistical model with the discrete level aspects of the partial statistical model is presented. Cross sections for level excitation and gamma production by neutron inelastic scattering from the nuclei Al-27, Fe-56, Si-28, and Pb-208 are calculated and compared with available experimental data.

  4. Multiple commodities in statistical microeconomics: Model and market

    NASA Astrophysics Data System (ADS)

    Baaquie, Belal E.; Yu, Miao; Du, Xin

    2016-11-01

    A statistical generalization of microeconomics has been made in Baaquie (2013). In Baaquie et al. (2015), the market behavior of single commodities was analyzed and it was shown that market data provides strong support for the statistical microeconomic description of commodity prices. The case of multiple commodities is studied and a parsimonious generalization of the single commodity model is made for the multiple commodities case. Market data shows that the generalization can accurately model the simultaneous correlation functions of up to four commodities. To accurately model five or more commodities, further terms have to be included in the model. This study shows that the statistical microeconomics approach is a comprehensive and complete formulation of microeconomics, and that it is independent of the mainstream formulation of microeconomics.

  5. Statistical model specification and power: recommendations on the use of test-qualified pooling in analysis of experimental data

    PubMed Central

    Colegrave, Nick

    2017-01-01

    A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term with the error term used to test hypotheses (or estimate effect sizes). This pooling is only carried out if statistical testing on the basis of applying that data to a previous more complicated model provides motivation for this model simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test their hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, here we argue that (except in highly specialized circumstances that we can identify) the hoped-for improvement in statistical power will be small or non-existent, and there is likely to be much reduced reliability of the statistical procedures through deviation of type I error rates from nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for initial selection of statistical models in the light of this change in procedure. PMID:28330912

  6. A statistical model of operational impacts on the framework of the bridge crane

    NASA Astrophysics Data System (ADS)

    Antsev, V. Yu; Tolokonnikov, A. S.; Gorynin, A. D.; Reutov, A. A.

    2017-02-01

    The technical regulations of the Customs Union demand that a risk analysis of bridge crane operation be carried out at the design stage. A statistical model has been developed for performing random calculations of risks, allowing us to model possible operational influences on the bridge crane metal structure in their various combinations. The statistical model is implemented in a software product for the automated calculation of the risks of failure occurrence of bridge cranes.

  7. Statistical learning and probabilistic prediction in music cognition: mechanisms of stylistic enculturation.

    PubMed

    Pearce, Marcus T

    2018-05-11

    Music perception depends on internal psychological models derived through exposure to a musical culture. It is hypothesized that this musical enculturation depends on two cognitive processes: (1) statistical learning, in which listeners acquire internal cognitive models of statistical regularities present in the music to which they are exposed; and (2) probabilistic prediction based on these learned models that enables listeners to organize and process their mental representations of music. To corroborate these hypotheses, I review research that uses a computational model of probabilistic prediction based on statistical learning (the information dynamics of music (IDyOM) model) to simulate data from empirical studies of human listeners. The results show that a broad range of psychological processes involved in music perception (expectation, emotion, memory, similarity, segmentation, and meter) can be understood in terms of a single, underlying process of probabilistic prediction using learned statistical models. Furthermore, IDyOM simulations of listeners from different musical cultures demonstrate that statistical learning can plausibly predict causal effects of differential cultural exposure to musical styles, providing a quantitative model of cultural distance. Understanding the neural basis of musical enculturation will benefit from close coordination between empirical neuroimaging and computational modeling of underlying mechanisms, as outlined here. © 2018 The Authors. Annals of the New York Academy of Sciences published by Wiley Periodicals, Inc. on behalf of New York Academy of Sciences.

  8. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.

  9. Spatial Statistical Network Models for Stream and River Temperature in the Chesapeake Bay Watershed, USA

    EPA Science Inventory

    Regional temperature models are needed for characterizing and mapping stream thermal regimes, establishing reference conditions, predicting future impacts and identifying critical thermal refugia. Spatial statistical models have been developed to improve regression modeling techn...

  10. Statistical models and NMR analysis of polymer microstructure

    USDA-ARS's Scientific Manuscript database

    Statistical models can be used in conjunction with NMR spectroscopy to study polymer microstructure and polymerization mechanisms. Thus, Bernoullian, Markovian, and enantiomorphic-site models are well known. Many additional models have been formulated over the years for additional situations. Typica...

  11. Use of statistical and neural net approaches in predicting toxicity of chemicals.

    PubMed

    Basak, S C; Grunwald, G D; Gute, B D; Balasubramanian, K; Opitz, D

    2000-01-01

    Hierarchical quantitative structure-activity relationships (H-QSAR) have been developed as a new approach in constructing models for estimating physicochemical, biomedicinal, and toxicological properties of interest. This approach uses increasingly more complex molecular descriptors in a graduated approach to model building. In this study, statistical and neural network methods have been applied to the development of H-QSAR models for estimating the acute aquatic toxicity (LC50) of 69 benzene derivatives to Pimephales promelas (fathead minnow). Topostructural, topochemical, geometrical, and quantum chemical indices were used as the four levels of the hierarchical method. It is clear from both the statistical and neural network models that topostructural indices alone cannot adequately model this set of congeneric chemicals. Not surprisingly, topochemical indices greatly increase the predictive power of both statistical and neural network models. Quantum chemical indices also add significantly to the modeling of this set of acute aquatic toxicity data.

  12. Fast, Statistical Model of Surface Roughness for Ion-Solid Interaction Simulations and Efficient Code Coupling

    NASA Astrophysics Data System (ADS)

    Drobny, Jon; Curreli, Davide; Ruzic, David; Lasa, Ane; Green, David; Canik, John; Younkin, Tim; Blondel, Sophie; Wirth, Brian

    2017-10-01

    Surface roughness greatly impacts material erosion, and thus plays an important role in Plasma-Surface Interactions. Developing strategies for efficiently introducing rough surfaces into ion-solid interaction codes will be an important step towards whole-device modeling of plasma devices and future fusion reactors such as ITER. Fractal TRIDYN (F-TRIDYN) is an upgraded version of the Monte Carlo, BCA program TRIDYN developed for this purpose that includes an explicit fractal model of surface roughness and extended input and output options for file-based code coupling. Code coupling with both plasma and material codes has been achieved and allows for multi-scale, whole-device modeling of plasma experiments. These code coupling results will be presented. F-TRIDYN has been further upgraded with an alternative, statistical model of surface roughness. The statistical model is significantly faster than and compares favorably to the fractal model. Additionally, the statistical model compares well to alternative computational surface roughness models and experiments. Theoretical links between the fractal and statistical models are made, and further connections to experimental measurements of surface roughness are explored. This work was supported by the PSI-SciDAC Project funded by the U.S. Department of Energy through contract DOE-DE-SC0008658.

  13. Comparison of the predictive validity of diagnosis-based risk adjusters for clinical outcomes.

    PubMed

    Petersen, Laura A; Pietz, Kenneth; Woodard, LeChauncy D; Byrne, Margaret

    2005-01-01

    Many possible methods of risk adjustment exist, but there is a dearth of comparative data on their performance. We compared the predictive validity of 2 widely used methods (Diagnostic Cost Groups [DCGs] and Adjusted Clinical Groups [ACGs]) for 2 clinical outcomes using a large national sample of patients. We studied all patients who used Veterans Health Administration (VA) medical services in fiscal year (FY) 2001 (n = 3,069,168) and assigned both a DCG and an ACG to each. We used logistic regression analyses to compare predictive ability for death or long-term care (LTC) hospitalization for age/gender models, DCG models, and ACG models. We also assessed the effect of adding age to the DCG and ACG models. Patients in the highest DCG categories, indicating higher severity of illness, were more likely to die or to require LTC hospitalization. Surprisingly, the age/gender model predicted death slightly more accurately than the ACG model (c-statistic of 0.710 versus 0.700, respectively). The addition of age to the ACG model improved the c-statistic to 0.768. The highest c-statistic for prediction of death was obtained with a DCG/age model (0.830). The lowest c-statistics were obtained for age/gender models for LTC hospitalization (c-statistic 0.593). The c-statistic for use of ACGs to predict LTC hospitalization was 0.783, and improved to 0.792 with the addition of age. The c-statistics for use of DCGs and DCG/age to predict LTC hospitalization were 0.885 and 0.890, respectively, indicating the best prediction. We found that risk adjusters based upon diagnoses predicted an increased likelihood of death or LTC hospitalization, exhibiting good predictive validity. In this comparative analysis using VA data, DCG models were generally superior to ACG models in predicting clinical outcomes, although ACG model performance was enhanced by the addition of age.
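
    The c-statistics compared in this abstract can be read as a concordance probability: the chance that a randomly chosen patient who had the outcome received a higher predicted risk than a randomly chosen patient who did not. The following is a minimal sketch of that calculation, not the study's code; the function name and the toy risks and outcomes are hypothetical.

```python
def c_statistic(scores, outcomes):
    """Concordance (c) statistic for a binary outcome.

    Fraction of (event, non-event) pairs in which the event case has the
    higher predicted score; tied scores count as half a concordant pair.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes (1 = event occurred).
risks = [0.9, 0.8, 0.4, 0.3, 0.2]
died = [1, 0, 1, 0, 0]
print(round(c_statistic(risks, died), 3))
```

    A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. The pairwise loop is quadratic in the number of patients, so a sample of the size described in the abstract would call for a rank-based computation instead.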

  14. Results of the Verification of the Statistical Distribution Model of Microseismicity Emission Characteristics

    NASA Astrophysics Data System (ADS)

    Cianciara, Aleksander

    2016-09-01

    The paper presents the results of research aimed at verifying the hypothesis that the Weibull distribution is an appropriate statistical distribution model of microseismicity emission characteristics, namely: energy of phenomena and inter-event time. It is understood that the emission under consideration is induced by the natural rock mass fracturing. Because the recorded emission contains noise, it is subjected to appropriate filtering. The study has been conducted using the method of statistical verification of the null hypothesis that the Weibull distribution fits the empirical cumulative distribution function. As the model describing the cumulative distribution function is given in an analytical form, its verification may be performed using the Kolmogorov-Smirnov goodness-of-fit test. Interpretations by means of probabilistic methods require specifying the correct model describing the statistical distribution of data, because these methods do not use the measurement data directly but rather their statistical distributions, e.g., in the method based on hazard analysis or in the one that uses maximum value statistics.
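
    The Kolmogorov-Smirnov check described in this abstract compares the empirical cumulative distribution function of a sample with an analytical CDF. The sketch below computes the KS statistic against a Weibull CDF using only the Python standard library. It is an illustration rather than the paper's procedure: the shape and scale values, the synthetic sample, and the function names are all hypothetical.

```python
import math
import random


def weibull_cdf(x, shape, scale):
    """Analytical Weibull CDF: F(x) = 1 - exp(-(x / scale) ** shape) for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def ks_statistic(sample, cdf):
    """Largest vertical distance between the empirical CDF and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    shape, scale = 1.5, 2.0             # hypothetical Weibull parameters
    rng = random.Random(0)
    # Synthetic stand-in for inter-event times; weibullvariate(alpha, beta)
    # takes the scale first and the shape second.
    sample = [rng.weibullvariate(scale, shape) for _ in range(200)]
    d = ks_statistic(sample, lambda x: weibull_cdf(x, shape, scale))
    # Asymptotic 5% critical value, valid when the parameters are specified
    # in advance rather than estimated from the same sample.
    critical = 1.358 / math.sqrt(len(sample))
    print(f"D = {d:.4f}, critical value = {critical:.4f}")
```

    If the Weibull parameters are estimated from the same data being tested, the standard critical values are no longer valid and tend to make the test too lenient; a simulation-based critical value is one common remedy.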

  15. Statistical Surrogate Modeling of Atmospheric Dispersion Events Using Bayesian Adaptive Splines

    NASA Astrophysics Data System (ADS)

    Francom, D.; Sansó, B.; Bulaevskaya, V.; Lucas, D. D.

    2016-12-01

    Uncertainty in the inputs of complex computer models, including atmospheric dispersion and transport codes, is often assessed via statistical surrogate models. Surrogate models are computationally efficient statistical approximations of expensive computer models that enable uncertainty analysis. We introduce Bayesian adaptive spline methods for producing surrogate models that capture the major spatiotemporal patterns of the parent model, while satisfying all the necessities of flexibility, accuracy and computational feasibility. We present novel methodological and computational approaches motivated by a controlled atmospheric tracer release experiment conducted at the Diablo Canyon nuclear power plant in California. Traditional methods for building statistical surrogate models often do not scale well to experiments with large amounts of data. Our approach is well suited to experiments involving large numbers of model inputs, large numbers of simulations, and functional output for each simulation. Our approach allows us to perform global sensitivity analysis with ease. We also present an approach to calibration of simulators using field data.

  16. Modeling Soot Oxidation and Gasification with Bayesian Statistics

    DOE PAGES

    Josephson, Alexander J.; Gaffin, Neal D.; Smith, Sean T.; ...

    2017-08-22

    This paper presents a statistical method for model calibration using data collected from literature. The method is used to calibrate parameters for global models of soot consumption in combustion systems. This consumption is broken into two different submodels: first for oxidation where soot particles are attacked by certain oxidizing agents; second for gasification where soot particles are attacked by H2O or CO2 molecules. Rate data were collected from 19 studies in the literature and evaluated using Bayesian statistics to calibrate the model parameters. Bayesian statistics are valued in their ability to quantify uncertainty in modeling. The calibrated consumption model with quantified uncertainty is presented here along with a discussion of associated implications. The oxidation results are found to be consistent with previous studies. Significant variation is found in the CO2 gasification rates.

  17. Modeling Soot Oxidation and Gasification with Bayesian Statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Josephson, Alexander J.; Gaffin, Neal D.; Smith, Sean T.

    This paper presents a statistical method for model calibration using data collected from literature. The method is used to calibrate parameters for global models of soot consumption in combustion systems. This consumption is broken into two different submodels: first for oxidation where soot particles are attacked by certain oxidizing agents; second for gasification where soot particles are attacked by H2O or CO2 molecules. Rate data were collected from 19 studies in the literature and evaluated using Bayesian statistics to calibrate the model parameters. Bayesian statistics are valued in their ability to quantify uncertainty in modeling. The calibrated consumption model with quantified uncertainty is presented here along with a discussion of associated implications. The oxidation results are found to be consistent with previous studies. Significant variation is found in the CO2 gasification rates.

  18. Development and evaluation of statistical shape modeling for principal inner organs on torso CT images.

    PubMed

    Zhou, Xiangrong; Xu, Rui; Hara, Takeshi; Hirano, Yasushi; Yokoyama, Ryujiro; Kanematsu, Masayuki; Hoshi, Hiroaki; Kido, Shoji; Fujita, Hiroshi

    2014-07-01

    The shapes of the inner organs are important information for medical image analysis. Statistical shape modeling provides a way of quantifying and measuring shape variations of the inner organs in different patients. In this study, we developed a universal scheme that can be used for building the statistical shape models for different inner organs efficiently. This scheme combines the traditional point distribution modeling with a group-wise optimization method based on a measure called minimum description length to provide a practical means for 3D organ shape modeling. In experiments, the proposed scheme was applied to the building of five statistical shape models for hearts, livers, spleens, and right and left kidneys by use of 50 cases of 3D torso CT images. The performance of these models was evaluated by three measures: model compactness, model generalization, and model specificity. The experimental results showed that the constructed shape models have good "compactness" and satisfied the "generalization" performance for different organ shape representations; however, the "specificity" of these models should be improved in the future.

  19. Statistical Design Model (SDM) of satellite thermal control subsystem

    NASA Astrophysics Data System (ADS)

    Mirshams, Mehran; Zabihian, Ehsan; Aarabi Chamalishahi, Mahdi

    2016-07-01

    Thermal control is a satellite subsystem whose main task is to keep the satellite components within their survival and operating temperature ranges. The capability of the thermal control subsystem plays a key role in satisfying a satellite's operational requirements, and designing this subsystem is part of satellite design. On the other hand, owing to the lack of information provided by companies and designers, this subsystem still does not have a specific design process, although it is one of the fundamental subsystems. The aim of this paper is to identify and extract statistical design models of the spacecraft thermal control subsystem by using the SDM design method. This method analyses statistical data with a particular procedure. To implement the SDM method, a complete database is required. Therefore, we first collect spacecraft data and create a database, and then we extract statistical graphs using Microsoft Excel, from which we further extract mathematical models. The input parameters of the method are the mass, mission, and lifetime of the satellite. For this purpose, the thermal control subsystem is first introduced, and the hardware used in this subsystem and its variants is investigated. Next, different statistical models are described and briefly compared. Finally, the particular statistical model of this paper is extracted from the collected statistical data. The accuracy of the method is tested and verified using a case study: a comparison between the specifications of the thermal control subsystem of a fabricated satellite and the analysis results shows the methodology in this paper to be effective. Key Words: Thermal control subsystem design, Statistical design model (SDM), Satellite conceptual design, Thermal hardware

  20. Assessing risk factors for dental caries: a statistical modeling approach.

    PubMed

    Trottini, Mario; Bossù, Maurizio; Corridore, Denise; Ierardo, Gaetano; Luzzi, Valeria; Saccucci, Matteo; Polimeni, Antonella

    2015-01-01

    The problem of identifying potential determinants and predictors of dental caries is of key importance in caries research and it has received considerable attention in the scientific literature. From the methodological side, a broad range of statistical models is currently available to analyze dental caries indices (DMFT, dmfs, etc.). These models have been applied in several studies to investigate the impact of different risk factors on the cumulative severity of dental caries experience. However, in most of the cases (i) these studies focus on a very specific subset of risk factors; and (ii) in the statistical modeling only a few candidate models are considered and model selection is at best only marginally addressed. As a result, our understanding of the robustness of the statistical inferences with respect to the choice of the model is very limited; the richness of the set of statistical models available for analysis is only marginally exploited; and inferences could be biased due to the omission of potentially important confounding variables in the model's specification. In this paper we argue that these limitations can be overcome considering a general class of candidate models and carefully exploring the model space using standard model selection criteria and measures of global fit and predictive performance of the candidate models. Strengths and limitations of the proposed approach are illustrated with a real data set. In our illustration the model space contains more than 2.6 million models, which require inferences to be adjusted for 'optimism'.

  1. Advances in statistics

    Treesearch

    Howard Stauffer; Nadav Nur

    2005-01-01

    The papers included in the Advances in Statistics section of the Partners in Flight (PIF) 2002 Proceedings represent a small sample of statistical topics of current importance to Partners In Flight research scientists: hierarchical modeling, estimation of detection probabilities, and Bayesian applications. Sauer et al. (this volume) examines a hierarchical model...

  2. Probability density function shape sensitivity in the statistical modeling of turbulent particle dispersion

    NASA Technical Reports Server (NTRS)

    Litchford, Ron J.; Jeng, San-Mou

    1992-01-01

    The performance of a recently introduced statistical transport model for turbulent particle dispersion is studied here for rigid particles injected into a round turbulent jet. Both uniform and isosceles triangle pdfs are used. The statistical sensitivity to parcel pdf shape is demonstrated.

  3. A d-statistic for single-case designs that is equivalent to the usual between-groups d-statistic.

    PubMed

    Shadish, William R; Hedges, Larry V; Pustejovsky, James E; Boyajian, Jonathan G; Sullivan, Kristynn J; Andrade, Alma; Barrientos, Jeannette L

    2014-01-01

    We describe a standardised mean difference statistic (d) for single-case designs that is equivalent to the usual d in between-groups experiments. We show how it can be used to summarise treatment effects over cases within a study, to do power analyses in planning new studies and grant proposals, and to meta-analyse effects across studies of the same question. We discuss limitations of this d-statistic, and possible remedies to them. Even so, this d-statistic is better founded statistically than other effect size measures for single-case design, and unlike many general linear model approaches such as multilevel modelling or generalised additive models, it produces a standardised effect size that can be integrated over studies with different outcome measures. SPSS macros for both effect size computation and power analysis are available.
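
    For orientation, the "usual d in between-groups experiments" that this abstract refers to is the difference in group means divided by the pooled standard deviation. The sketch below computes only that conventional between-groups quantity; it does not reproduce the single-case estimator the authors describe. The function name and the scores are hypothetical.

```python
import math
import statistics


def between_groups_d(treatment, control):
    """Standardised mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance returns the sample variance (n - 1 denominator).
    var1 = statistics.variance(treatment)
    var2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


# Hypothetical outcome scores, for illustration only.
treated = [5.0, 6.0, 7.0]
untreated = [2.0, 3.0, 4.0]
print(between_groups_d(treated, untreated))
```

    Because d is expressed in standard deviation units, effects measured on different outcome scales can be placed on a common footing, which is the property the abstract highlights for integrating results across studies.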

  4. Applications of spatial statistical network models to stream data

    USGS Publications Warehouse

    Isaak, Daniel J.; Peterson, Erin E.; Ver Hoef, Jay M.; Wenger, Seth J.; Falke, Jeffrey A.; Torgersen, Christian E.; Sowder, Colin; Steel, E. Ashley; Fortin, Marie-Josée; Jordan, Chris E.; Ruesch, Aaron S.; Som, Nicholas; Monestiez, Pascal

    2014-01-01

    Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for terrestrial applications and are not optimized for streams. A new class of spatial statistical model, based on valid covariance structures for stream networks, can be used with many common types of stream data (e.g., water quality attributes, habitat conditions, biological surveys) through application of appropriate distributions (e.g., Gaussian, binomial, Poisson). The spatial statistical network models account for spatial autocorrelation (i.e., nonindependence) among measurements, which allows their application to databases with clustered measurement locations. Large amounts of stream data exist in many areas where spatial statistical analyses could be used to develop novel insights, improve predictions at unsampled sites, and aid in the design of efficient monitoring strategies at relatively low cost. We review the topic of spatial autocorrelation and its effects on statistical inference, demonstrate the use of spatial statistics with stream datasets relevant to common research and management questions, and discuss additional applications and development potential for spatial statistics on stream networks. Free software for implementing the spatial statistical network models has been developed that enables custom applications with many stream databases.

  5. Grain-Size Based Additivity Models for Scaling Multi-rate Uranyl Surface Complexation in Subsurface Sediments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Xiaoying; Liu, Chongxuan; Hu, Bill X.

    This study statistically analyzed a grain-size based additivity model that has been proposed to scale reaction rates and parameters from laboratory to field. The additivity model assumed that reaction properties in a sediment including surface area, reactive site concentration, reaction rate, and extent can be predicted from field-scale grain size distribution by linearly adding reaction properties for individual grain size fractions. This study focused on the statistical analysis of the additivity model with respect to reaction rate constants using multi-rate uranyl (U(VI)) surface complexation reactions in a contaminated sediment as an example. Experimental data of rate-limited U(VI) desorption in a stirred flow-cell reactor were used to estimate the statistical properties of multi-rate parameters for individual grain size fractions. The statistical properties of the rate constants for the individual grain size fractions were then used to analyze the statistical properties of the additivity model to predict rate-limited U(VI) desorption in the composite sediment, and to evaluate the relative importance of individual grain size fractions to the overall U(VI) desorption. The results indicated that the additivity model provided a good prediction of the U(VI) desorption in the composite sediment. However, the rate constants were not directly scalable using the additivity model, and U(VI) desorption in individual grain size fractions has to be simulated in order to apply the additivity model. An approximate additivity model for directly scaling rate constants was subsequently proposed and evaluated. The results showed that the approximate model provided a good prediction of the experimental results within statistical uncertainty. This study also found that a gravel size fraction (2-8mm), which is often ignored in modeling U(VI) sorption and desorption, is statistically significant to the U(VI) desorption in the sediment.
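
    The linear additivity idea in this abstract can be sketched as a weighted sum of a property over grain size fractions. Weighting by mass fraction is an assumption made here for illustration; the abstract says only that properties are added linearly according to the grain size distribution. The function name and all numbers are hypothetical.

```python
def additive_property(mass_fractions, fraction_properties):
    """Composite-sediment property as a mass-fraction-weighted sum.

    mass_fractions: share of each grain size fraction (assumed to sum to 1).
    fraction_properties: the property measured on each individual fraction.
    """
    if len(mass_fractions) != len(fraction_properties):
        raise ValueError("one property value is needed per grain size fraction")
    if abs(sum(mass_fractions) - 1.0) > 1e-9:
        raise ValueError("mass fractions must sum to 1")
    return sum(f * p for f, p in zip(mass_fractions, fraction_properties))


# Hypothetical example: three size fractions with made-up site concentrations.
fractions = [0.2, 0.5, 0.3]
site_concentrations = [10.0, 4.0, 1.0]
print(additive_property(fractions, site_concentrations))
```

    As the abstract notes, rate constants were found not to scale directly in this way, so desorption in each fraction is simulated first and the simulated results are then combined.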

  6. Statistical considerations on prognostic models for glioma

    PubMed Central

    Molinaro, Annette M.; Wrensch, Margaret R.; Jenkins, Robert B.; Eckel-Passow, Jeanette E.

    2016-01-01

    Given the lack of beneficial treatments in glioma, there is a need for prognostic models for therapeutic decision making and life planning. Recently several studies defining subtypes of glioma have been published. Here, we review the statistical considerations of how to build and validate prognostic models, explain the models presented in the current glioma literature, and discuss advantages and disadvantages of each model. The 3 statistical considerations to establishing clinically useful prognostic models are: study design, model building, and validation. Careful study design helps to ensure that the model is unbiased and generalizable to the population of interest. During model building, a discovery cohort of patients can be used to choose variables, construct models, and estimate prediction performance via internal validation. Via external validation, an independent dataset can assess how well the model performs. It is imperative that published models properly detail the study design and methods for both model building and validation. This provides readers the information necessary to assess the bias in a study, compare other published models, and determine the model's clinical usefulness. As editors, reviewers, and readers of the relevant literature, we should be cognizant of the needed statistical considerations and insist on their use. PMID:26657835

  7. Flexible statistical modelling detects clinical functional magnetic resonance imaging activation in partially compliant subjects.

    PubMed

    Waites, Anthony B; Mannfolk, Peter; Shaw, Marnie E; Olsrud, Johan; Jackson, Graeme D

    2007-02-01

    Clinical functional magnetic resonance imaging (fMRI) occasionally fails to detect significant activation, often due to variability in task performance. The present study seeks to test whether a more flexible statistical analysis can better detect activation, by accounting for variance associated with variable compliance to the task over time. Experimental results and simulated data both confirm that even at 80% compliance to the task, such a flexible model outperforms standard statistical analysis when assessed using the extent of activation (experimental data), goodness of fit (experimental data), and area under the receiver operating characteristic curve (simulated data). Furthermore, retrospective examination of 14 clinical fMRI examinations reveals that in patients where the standard statistical approach yields activation, there is a measurable gain in model performance in adopting the flexible statistical model, with little or no penalty in lost sensitivity. This indicates that a flexible model should be considered, particularly for clinical patients who may have difficulty complying fully with the study task.

  8. Combining Statistics and Physics to Improve Climate Downscaling

    NASA Astrophysics Data System (ADS)

    Gutmann, E. D.; Eidhammer, T.; Arnold, J.; Nowak, K.; Clark, M. P.

    2017-12-01

    Getting useful information from climate models is an ongoing problem that has plagued climate science and hydrologic prediction for decades. While it is possible to develop statistical corrections for climate models that mimic current climate almost perfectly, this does not necessarily guarantee that future changes are portrayed correctly. In contrast, convection permitting regional climate models (RCMs) have begun to provide an excellent representation of the regional climate system purely from first principles, providing greater confidence in their change signal. However, the computational cost of such RCMs prohibits the generation of ensembles of simulations or long time periods, thus limiting their applicability for hydrologic applications. Here we discuss a new approach combining statistical corrections with physical relationships for a modest computational cost. We have developed the Intermediate Complexity Atmospheric Research model (ICAR) to provide a climate and weather downscaling option that is based primarily on physics for a fraction of the computational requirements of a traditional regional climate model. ICAR also enables the incorporation of statistical adjustments directly within the model. We demonstrate that applying even simple corrections to precipitation while the model is running can improve the simulation of land atmosphere feedbacks in ICAR. For example, by incorporating statistical corrections earlier in the modeling chain, we permit the model physics to better represent the effect of mountain snowpack on air temperature changes.

  9. Linearised and non-linearised isotherm models optimization analysis by error functions and statistical means

    PubMed Central

    2014-01-01

    In adsorption studies, describing the sorption process and evaluating the best-fitting isotherm model are key analyses for investigating the theoretical hypothesis. Hence, numerous statistical analyses have been used extensively to assess the agreement between experimental and predicted equilibrium adsorption values. In the present study, several statistical error analyses were carried out to evaluate the fitness of the adsorption isotherm models, including the Pearson correlation, the coefficient of determination, and the chi-square test. An ANOVA test was carried out to evaluate the significance of the various error functions, and the coefficient of dispersion was also evaluated for linearised and non-linearised models. The adsorption of phenol onto natural soil (local name: Kalathur soil) was carried out in batch mode at 30 ± 2 °C. To estimate the isotherm parameters and obtain a holistic view of the analysis, linear and non-linear isotherm models were compared. The results revealed that the above-mentioned error functions and statistical functions could be used to determine the best-fitting isotherm. PMID:25018878

  10. Moment-Based Physical Models of Broadband Clutter due to Aggregations of Fish

    DTIC Science & Technology

    2013-09-30

    statistical models for signal-processing algorithm development. These in turn will help to develop a capability to statistically forecast the impact of...aggregations of fish based on higher-order statistical measures describable in terms of physical and system parameters. Environmentally, these models...processing. In this experiment, we had good ground truth on (1) and (2), and had control over (3) and (4) except for environmentally-imposed restrictions

  11. Interpretation of the results of statistical measurements. [search for basic probability model

    NASA Technical Reports Server (NTRS)

    Olshevskiy, V. V.

    1973-01-01

    For random processes, the calculated probability characteristic, and the measured statistical estimate are used in a quality functional, which defines the difference between the two functions. Based on the assumption that the statistical measurement procedure is organized so that the parameters for a selected model are optimized, it is shown that the interpretation of experimental research is a search for a basic probability model.

  12. Sample Size and Statistical Conclusions from Tests of Fit to the Rasch Model According to the Rasch Unidimensional Measurement Model (Rumm) Program in Health Outcome Measurement.

    PubMed

    Hagell, Peter; Westergren, Albert

    Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Data with simulated Rasch model fitting 25-item dichotomous scales and sample sizes ranging from N = 50 to N = 2500 were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less than or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).
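
    The Bonferroni correction mentioned in this abstract can be sketched as follows. This is a generic illustration of the correction, not the RUMM implementation: the function name is hypothetical, and the 25 item-level p-values are invented for the example. Each p-value is compared with alpha divided by the number of tests rather than with alpha itself.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Flag tests whose p-value is at or below the Bonferroni threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p-values for 25 item-fit tests (invented for illustration).
p_values = [0.001, 0.010, 0.030] + [0.50] * 22

unadjusted = [p <= 0.05 for p in p_values]   # no multiplicity control
adjusted = bonferroni_flags(p_values)        # threshold is 0.05 / 25 = 0.002

print("flagged without correction:", sum(unadjusted))
print("flagged with Bonferroni:   ", sum(adjusted))
```

    With these invented values, three items are flagged without correction and only one after it, which shows how the correction reduces the number of apparent misfits when many tests are run.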

  13. Children's Services Statistical Neighbour Benchmarking Tool. Practitioner User Guide

    ERIC Educational Resources Information Center

    National Foundation for Educational Research, 2007

    2007-01-01

    Statistical neighbour models provide one method for benchmarking progress. For each local authority (LA), these models designate a number of other LAs deemed to have similar characteristics. These designated LAs are known as statistical neighbours. Any LA may compare its performance (as measured by various indicators) against its statistical…

  14. The Statistical Interpretation of Classical Thermodynamic Heating and Expansion Processes

    ERIC Educational Resources Information Center

    Cartier, Stephen F.

    2011-01-01

    A statistical model has been developed and applied to interpret thermodynamic processes typically presented from the macroscopic, classical perspective. Through this model, students learn and apply the concepts of statistical mechanics, quantum mechanics, and classical thermodynamics in the analysis of the (i) constant volume heating, (ii)…

  15. A Model of Statistics Performance Based on Achievement Goal Theory.

    ERIC Educational Resources Information Center

    Bandalos, Deborah L.; Finney, Sara J.; Geske, Jenenne A.

    2003-01-01

    Tests a model of statistics performance based on achievement goal theory. Both learning and performance goals affected achievement indirectly through study strategies, self-efficacy, and test anxiety. Implications of these findings for teaching and learning statistics are discussed. (Contains 47 references, 3 tables, 3 figures, and 1 appendix.)…

  16. [Statistical prediction methods in violence risk assessment and its application].

    PubMed

    Liu, Yuan-Yuan; Hu, Jun-Mei; Yang, Min; Li, Xiao-Song

    2013-06-01

    Improving violence risk assessment is an urgent global problem. Statistical methods are a necessary part of risk assessment and have a marked influence on it. This study reviews prediction methods in violence risk assessment from a statistical point of view. It covers the application of logistic regression as an example of a multivariate statistical model, the decision tree model as an example of a data mining technique, and the neural network model as an example of artificial intelligence technology. The study provides data intended to contribute to further research on violence risk assessment.

  17. Non-equilibrium dog-flea model

    NASA Astrophysics Data System (ADS)

    Ackerson, Bruce J.

    2017-11-01

    We develop the open dog-flea model to serve as a check of proposed non-equilibrium theories of statistical mechanics. The model is developed in detail. Then it is applied to four recent models for non-equilibrium statistical mechanics. Comparison of the dog-flea solution with these different models allows checking claims and giving a concrete example of the theoretical models.

  18. Analysis and meta-analysis of single-case designs: an introduction.

    PubMed

    Shadish, William R

    2014-04-01

    The last 10 years have seen great progress in the analysis and meta-analysis of single-case designs (SCDs). This special issue includes five articles that provide an overview of current work on that topic, including standardized mean difference statistics, multilevel models, Bayesian statistics, and generalized additive models. Each article analyzes a common example across articles and presents syntax or macros for how to do them. These articles are followed by commentaries from single-case design researchers and journal editors. This introduction briefly describes each article and then discusses several issues that must be addressed before we can know what analyses will eventually be best to use in SCD research. These issues include modeling trend, modeling error covariances, computing standardized effect size estimates, assessing statistical power, incorporating more accurate models of outcome distributions, exploring whether Bayesian statistics can improve estimation given the small samples common in SCDs, and the need for annotated syntax and graphical user interfaces that make complex statistics accessible to SCD researchers. The article then discusses reasons why SCD researchers are likely to incorporate statistical analyses into their research more often in the future, including changing expectations and contingencies regarding SCD research from outside SCD communities, changes and diversity within SCD communities, corrections of erroneous beliefs about the relationship between SCD research and statistics, and demonstrations of how statistics can help SCD researchers better meet their goals. Copyright © 2013 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.

  19. Statistical Ensemble of Large Eddy Simulations

    NASA Technical Reports Server (NTRS)

    Carati, Daniele; Rogers, Michael M.; Wray, Alan A.; Mansour, Nagi N. (Technical Monitor)

    2001-01-01

    A statistical ensemble of large eddy simulations (LES) is run simultaneously for the same flow. The information provided by the different large scale velocity fields is used to propose an ensemble averaged version of the dynamic model. This produces local model parameters that only depend on the statistical properties of the flow. An important property of the ensemble averaged dynamic procedure is that it does not require any spatial averaging and can thus be used in fully inhomogeneous flows. Also, the ensemble of LES's provides statistics of the large scale velocity that can be used for building new models for the subgrid-scale stress tensor. The ensemble averaged dynamic procedure has been implemented with various models for three flows: decaying isotropic turbulence, forced isotropic turbulence, and the time developing plane wake. It is found that the results are almost independent of the number of LES's in the statistical ensemble provided that the ensemble contains at least 16 realizations.

  20. Comparing estimates of climate change impacts from process-based and statistical crop models

    NASA Astrophysics Data System (ADS)

    Lobell, David B.; Asseng, Senthold

    2017-01-01

    The potential impacts of climate change on crop productivity are of widespread interest to those concerned with addressing climate change and improving global food security. Two common approaches to assess these impacts are process-based simulation models, which attempt to represent key dynamic processes affecting crop yields, and statistical models, which estimate functional relationships between historical observations of weather and yields. Examples of both approaches are increasingly found in the scientific literature, although often published in different disciplinary journals. Here we compare published sensitivities to changes in temperature, precipitation, carbon dioxide (CO2), and ozone from each approach for the subset of crops, locations, and climate scenarios for which both have been applied. Despite a common perception that statistical models are more pessimistic, we find no systematic differences between the predicted sensitivities to warming from process-based and statistical models up to +2 °C, with limited evidence at higher levels of warming. For precipitation, there are many reasons why estimates could be expected to differ, but few estimates exist to develop robust comparisons, and precipitation changes are rarely the dominant factor for predicting impacts given the prominent role of temperature, CO2, and ozone changes. A common difference between process-based and statistical studies is that the former tend to include the effects of CO2 increases that accompany warming, whereas statistical models typically do not. Major needs moving forward include incorporating CO2 effects into statistical studies, improving both approaches’ treatment of ozone, and increasing the use of both methods within the same study. At the same time, those who fund or use crop model projections should understand that in the short-term, both approaches when done well are likely to provide similar estimates of warming impacts, with statistical models generally requiring fewer resources to produce robust estimates, especially when applied to crops beyond the major grains.

  1. Modeling the sound transmission between rooms coupled through partition walls by using a diffusion model.

    PubMed

    Billon, Alexis; Foy, Cédric; Picaut, Judicaël; Valeau, Vincent; Sakout, Anas

    2008-06-01

    In this paper, a modification of the diffusion model for room acoustics is proposed to account for sound transmission between two rooms, a source room and an adjacent room, which are coupled through a partition wall. A system of two diffusion equations, one for each room, together with a set of two boundary conditions, one for the partition wall and one for the other walls of a room, is obtained and numerically solved. The modified diffusion model is validated by numerical comparisons with the statistical theory for several coupled-room configurations by varying the coupling area surface, the absorption coefficient of each room, and the volume of the adjacent room. An experimental comparison is also carried out for two coupled classrooms. The modified diffusion model results agree very well with both the statistical theory and the experimental data. The diffusion model can then be used as an alternative to the statistical theory, especially when the statistical theory is not applicable, that is, when the reverberant sound field is not diffuse. Moreover, the diffusion model allows the prediction of the spatial distribution of sound energy within each coupled room, while the statistical theory gives only one sound level for each room.

  2. An order statistics approach to the halo model for galaxies

    NASA Astrophysics Data System (ADS)

    Paul, Niladri; Paranjape, Aseem; Sheth, Ravi K.

    2017-04-01

    We use the halo model to explore the implications of assuming that galaxy luminosities in groups are randomly drawn from an underlying luminosity function. We show that even the simplest of such order statistics models - one in which this luminosity function p(L) is universal - naturally produces a number of features associated with previous analyses based on the 'central plus Poisson satellites' hypothesis. These include the monotonic relation of mean central luminosity with halo mass, the lognormal distribution around this mean and the tight relation between the central and satellite mass scales. In stark contrast to observations of galaxy clustering, however, this model predicts no luminosity dependence of large-scale clustering. We then show that an extended version of this model, based on the order statistics of a halo mass dependent luminosity function p(L|m), is in much better agreement with the clustering data as well as satellite luminosities, but systematically underpredicts central luminosities. This brings into focus the idea that central galaxies constitute a distinct population that is affected by different physical processes than are the satellites. We model this physical difference as a statistical brightening of the central luminosities, over and above the order statistics prediction. The magnitude gap between the brightest and second brightest group galaxy is predicted as a by-product, and is also in good agreement with observations. We propose that this order statistics framework provides a useful language in which to compare the halo model for galaxies with more physically motivated galaxy formation models.
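
    The order statistics idea in this abstract, that the brightest group member is simply the largest of N random draws from a common distribution, can be sketched with a small simulation. The unit-rate exponential distribution used here is a stand-in chosen only because the expected maximum of N draws has a simple closed form (the N-th harmonic number); it is not the luminosity function of the paper, and the function names are hypothetical.

```python
import random


def mean_brightest(n_members, n_groups=20000, seed=0):
    """Monte Carlo mean of the largest of n_members draws from Exp(1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_groups):
        total += max(rng.expovariate(1.0) for _ in range(n_members))
    return total / n_groups


def harmonic(n):
    """Exact expected maximum of n independent Exp(1) draws: 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


# The mean of the brightest member grows monotonically with group size.
for n in (1, 5, 25):
    print(n, round(mean_brightest(n), 3), round(harmonic(n), 3))
```

    Because richer groups tend to occupy more massive halos, a monotonic growth of the expected maximum with N is the kind of behaviour that yields a monotonic relation between mean central luminosity and halo mass, even though every draw comes from the same distribution.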

  3. How to interpret the results of medical time series data analysis: Classical statistical approaches versus dynamic Bayesian network modeling.

    PubMed

    Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall

    2016-01-01

    Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis discusses also several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach is (1) much more flexible in terms of modeling effort, and (2) it offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.

  4. High-temperature behavior of a deformed Fermi gas obeying interpolating statistics.

    PubMed

    Algin, Abdullah; Senay, Mustafa

    2012-04-01

    An outstanding idea originally introduced by Greenberg is to investigate whether there is equivalence between intermediate statistics, which may be different from anyonic statistics, and q-deformed particle algebra. Also, a model to be studied for addressing such an idea could possibly provide us some new consequences about the interactions of particles as well as their internal structures. Motivated mainly by this idea, in this work, we consider a q-deformed Fermi gas model whose statistical properties enable us to effectively study interpolating statistics. Starting with a generalized Fermi-Dirac distribution function, we derive several thermostatistical functions of a gas of these deformed fermions in the thermodynamical limit. We study the high-temperature behavior of the system by analyzing the effects of q deformation on the most important thermostatistical characteristics of the system such as the entropy, specific heat, and equation of state. It is shown that such a deformed fermion model in two and three spatial dimensions exhibits the interpolating statistics in a specific interval of the model deformation parameter 0 < q < 1. In particular, for two and three spatial dimensions, it is found from the behavior of the third virial coefficient of the model that the deformation parameter q interpolates completely between attractive and repulsive systems, including the free boson and fermion cases. From the results obtained in this work, we conclude that such a model could provide much physical insight into some interacting theories of fermions, and could be useful to further study the particle systems with intermediate statistics.

  5. Progress of statistical analysis in biomedical research through the historical review of the development of the Framingham score.

    PubMed

    Ignjatović, Aleksandra; Stojanović, Miodrag; Milošević, Zoran; Anđelković Apostolović, Marija

    2017-12-02

    Developing risk models in medicine is appealing, but it is also associated with many obstacles in different aspects of predictive model development. Initially, the association of one or more biomarkers with a specific outcome was established by statistical significance, but novel and demanding questions required the development of new and more complex statistical techniques. The progress of statistical analysis in biomedical research is best observed through the history of the Framingham study and the development of the Framingham score. Evaluation of predictive models comes from a combination of the facts which are results of several metrics. Using logistic regression and Cox proportional hazards regression analysis, the calibration test, and the ROC curve analysis should be mandatory and eliminatory, and the central place should be taken by some new statistical techniques. In order to obtain complete information related to a new marker in the model, it has recently been recommended to use reclassification tables and to calculate the net reclassification index and the integrated discrimination improvement. Decision curve analysis is a novel method for evaluating the clinical usefulness of a predictive model. It may be noted that customizing and fine-tuning of the Framingham risk score initiated the development of statistical analysis. A clinically applicable predictive model should be a trade-off between all the abovementioned statistical metrics, a trade-off between calibration and discrimination, accuracy and decision-making, costs and benefits, and quality and quantity of the patient's life.
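
    As a rough illustration of the net reclassification index named in this abstract, the sketch below uses the commonly cited categorical definition: the net proportion of events moved to a higher risk category plus the net proportion of non-events moved to a lower one. The function name and all counts are hypothetical and are not drawn from the Framingham data.

```python
def net_reclassification_index(events_up, events_down, n_events,
                               nonevents_up, nonevents_down, n_nonevents):
    """Categorical NRI: net correct upward moves among events
    plus net correct downward moves among non-events."""
    event_part = (events_up - events_down) / n_events
    nonevent_part = (nonevents_down - nonevents_up) / n_nonevents
    return event_part + nonevent_part


# Hypothetical reclassification counts after adding a new marker to a model.
nri = net_reclassification_index(
    events_up=20, events_down=5, n_events=100,
    nonevents_up=10, nonevents_down=30, n_nonevents=200,
)
print(f"NRI = {nri:.2f}")
```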

  6. Canonical Statistical Model for Maximum Expected Immission of Wire Conductor in an Aperture Enclosure

    NASA Technical Reports Server (NTRS)

    Bremner, Paul G.; Vazquez, Gabriel; Christiano, Daniel J.; Trout, Dawn H.

    2016-01-01

    Prediction of the maximum expected electromagnetic pick-up of conductors inside a realistic shielding enclosure is an important canonical problem for system-level EMC design of spacecraft, launch vehicles, aircraft and automobiles. This paper introduces a simple statistical power balance model for prediction of the maximum expected current in a wire conductor inside an aperture enclosure. It calculates both the statistical mean and variance of the immission from the physical design parameters of the problem. Familiar probability density functions can then be used to predict the maximum expected immission for design purposes. The statistical power balance model requires minimal EMC design information and solves orders of magnitude faster than existing numerical models, making it ultimately viable for scaled-up, full system-level modeling. Both experimental test results and full wave simulation results are used to validate the foundational model.

  7. Directional statistics-based reflectance model for isotropic bidirectional reflectance distribution functions.

    PubMed

    Nishino, Ko; Lombardi, Stephen

    2011-01-01

    We introduce a novel parametric bidirectional reflectance distribution function (BRDF) model that can accurately encode a wide variety of real-world isotropic BRDFs with a small number of parameters. The key observation we make is that a BRDF may be viewed as a statistical distribution on a unit hemisphere. We derive a novel directional statistics distribution, which we refer to as the hemispherical exponential power distribution, and model real-world isotropic BRDFs as mixtures of it. We derive a canonical probabilistic method for estimating the parameters, including the number of components, of this novel directional statistics BRDF model. We show that the model captures the full spectrum of real-world isotropic BRDFs with high accuracy, but a small footprint. We also demonstrate the advantages of the novel BRDF model by showing its use for reflection component separation and for exploring the space of isotropic BRDFs.

  8. A Conditional Curie-Weiss Model for Stylized Multi-group Binary Choice with Social Interaction

    NASA Astrophysics Data System (ADS)

    Opoku, Alex Akwasi; Edusei, Kwame Owusu; Ansah, Richard Kwame

    2018-04-01

    This paper proposes a conditional Curie-Weiss model as a model for decision making in a stylized society made up of binary decision makers that face a particular dichotomous choice between two options. Following Brock and Durlauf (Discrete choice with social interaction I: theory, 1955), we set up both socio-economic and statistical mechanical models for the choice problem. We point out when both the socio-economic and statistical mechanical models give rise to the same self-consistent equilibrium mean choice level(s). The phase diagram of the associated statistical mechanical model and its socio-economic implications are discussed.

  9. Soft Mixer Assignment in a Hierarchical Generative Model of Natural Scene Statistics

    PubMed Central

    Schwartz, Odelia; Sejnowski, Terrence J.; Dayan, Peter

    2010-01-01

    Gaussian scale mixture models offer a top-down description of signal generation that captures key bottom-up statistical characteristics of filter responses to images. However, the pattern of dependence among the filters for this class of models is prespecified. We propose a novel extension to the gaussian scale mixture model that learns the pattern of dependence from observed inputs and thereby induces a hierarchical representation of these inputs. Specifically, we propose that inputs are generated by gaussian variables (modeling local filter structure), multiplied by a mixer variable that is assigned probabilistically to each input from a set of possible mixers. We demonstrate inference of both components of the generative model, for synthesized data and for different classes of natural images, such as a generic ensemble and faces. For natural images, the mixer variable assignments show invariances resembling those of complex cells in visual cortex; the statistics of the gaussian components of the model are in accord with the outputs of divisive normalization models. We also show how our model helps interrelate a wide range of models of image statistics and cortical processing. PMID:16999575
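
    The basic Gaussian scale mixture construction described in this abstract, Gaussian variables multiplied by a shared mixer variable, can be sketched as follows. This is only the generative step with a fixed, shared mixer; it does not implement the paper's learned probabilistic mixer assignment. The two mixer values, the sample size, and all function names are hypothetical choices for illustration.

```python
import random


def sample_pair(rng, mixers=(0.5, 2.0)):
    """Two filter responses: independent Gaussians scaled by one shared mixer."""
    v = rng.choice(mixers)  # hypothetical two-valued mixer variable
    return v * rng.gauss(0.0, 1.0), v * rng.gauss(0.0, 1.0)


def mean(xs):
    return sum(xs) / len(xs)


def kurtosis(xs):
    """Fourth standardized moment; equals 3 for a Gaussian."""
    m = mean(xs)
    var = mean([(x - m) ** 2 for x in xs])
    return mean([(x - m) ** 4 for x in xs]) / var ** 2


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    sx = mean([(x - mx) ** 2 for x in xs]) ** 0.5
    sy = mean([(y - my) ** 2 for y in ys]) ** 0.5
    return cov / (sx * sy)


rng = random.Random(0)
pairs = [sample_pair(rng) for _ in range(50000)]
x1 = [p[0] for p in pairs]
x2 = [p[1] for p in pairs]

gsm_kurtosis = kurtosis(x1)                       # heavier tails than a Gaussian
linear_corr = correlation(x1, x2)                 # near zero
energy_corr = correlation([a * a for a in x1],
                          [b * b for b in x2])    # positive: magnitudes co-vary
print(round(gsm_kurtosis, 2), round(linear_corr, 3), round(energy_corr, 3))
```

    The two responses are linearly uncorrelated, yet their squared magnitudes are positively correlated because they share a mixer. This magnitude dependence is the statistical signature of filter responses that scale mixture models are meant to capture.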

  10. Computational and Statistical Models: A Comparison for Policy Modeling of Childhood Obesity

    NASA Astrophysics Data System (ADS)

    Mabry, Patricia L.; Hammond, Ross; Ip, Edward Hak-Sing; Huang, Terry T.-K.

    As systems science methodologies have begun to emerge as a set of innovative approaches to address complex problems in behavioral, social science, and public health research, some apparent conflicts with traditional statistical methodologies for public health have arisen. Computational modeling is an approach set in context that integrates diverse sources of data to test the plausibility of working hypotheses and to elicit novel ones. Statistical models are reductionist approaches geared towards proving the null hypothesis. While these two approaches may seem contrary to each other, we propose that they are in fact complementary and can be used jointly to advance solutions to complex problems. Outputs from statistical models can be fed into computational models, and outputs from computational models can lead to further empirical data collection and statistical models. Together, this presents an iterative process that refines the models and contributes to a greater understanding of the problem and its potential solutions. The purpose of this panel is to foster communication and understanding between statistical and computational modelers. Our goal is to shed light on the differences between the approaches and convey what kinds of research inquiries each one is best for addressing and how they can serve complementary (and synergistic) roles in the research process, to mutual benefit. For each approach the panel will cover the relevant "assumptions" and how the differences in what is assumed can foster misunderstandings. The interpretations of the results from each approach will be compared and contrasted and the limitations for each approach will be delineated. We will use illustrative examples from CompMod, the Comparative Modeling Network for Childhood Obesity Policy. The panel will also incorporate interactive discussions with the audience on the issues raised here.

  11. Statistical characteristics of trajectories of diamagnetic unicellular organisms in a magnetic field.

    PubMed

    Gorobets, Yu I; Gorobets, O Yu

    2015-01-01

    The statistical model is proposed in this paper for description of orientation of trajectories of unicellular diamagnetic organisms in a magnetic field. The statistical parameter such as the effective energy is calculated on basis of this model. The resulting effective energy is the statistical characteristics of trajectories of diamagnetic microorganisms in a magnetic field connected with their metabolism. The statistical model is applicable for the case when the energy of the thermal motion of bacteria is negligible in comparison with their energy in a magnetic field and the bacteria manifest the significant "active random movement", i.e. there is the randomizing motion of the bacteria of non thermal nature, for example, movement of bacteria by means of flagellum. The energy of the randomizing active self-motion of bacteria is characterized by the new statistical parameter for biological objects. The parameter replaces the energy of the randomizing thermal motion in calculation of the statistical distribution. Copyright © 2014 Elsevier Ltd. All rights reserved.

  12. Modified Distribution-Free Goodness-of-Fit Test Statistic.

    PubMed

    Chun, So Yeon; Browne, Michael W; Shapiro, Alexander

    2018-03-01

    Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.

  13. Probability, statistics, and computational science.

    PubMed

    Beerenwinkel, Niko; Siebourg, Juliane

    2012-01-01

    In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.

  14. Peer Review of EPA's Draft BMDS Document: Exponential ...

    EPA Pesticide Factsheets

    BMDS is one of the Agency's premier tools for risk assessment; the validity and reliability of its statistical models are therefore of paramount importance. This page provides links to peer review of the BMDS applications and its models as they were developed and eventually released, documenting the rigorous review process taken to provide the best science tools available for statistical modeling.

  15. Probability of Detection (POD) as a statistical model for the validation of qualitative methods.

    PubMed

    Wehling, Paul; LaBudde, Robert A; Brunelle, Sharon L; Nelson, Maria T

    2011-01-01

    A statistical model is presented for use in validation of qualitative methods. This model, termed Probability of Detection (POD), harmonizes the statistical concepts and parameters between quantitative and qualitative method validation. POD characterizes method response with respect to concentration as a continuous variable. The POD model provides a tool for graphical representation of response curves for qualitative methods. In addition, the model allows comparisons between candidate and reference methods, and provides calculations of repeatability, reproducibility, and laboratory effects from collaborative study data. Single laboratory study and collaborative study examples are given.

  16. Statistical error model for a solar electric propulsion thrust subsystem

    NASA Technical Reports Server (NTRS)

    Bantell, M. H.

    1973-01-01

    The solar electric propulsion thrust subsystem statistical error model was developed as a tool for investigating the effects of thrust subsystem parameter uncertainties on navigation accuracy. The model is currently being used to evaluate the impact of electric engine parameter uncertainties on navigation system performance for a baseline mission to Encke's Comet in the 1980s. The data given represent the next generation in statistical error modeling for low-thrust applications. Principal improvements include the representation of thrust uncertainties and random process modeling in terms of random parametric variations in the thrust vector process for a multi-engine configuration.

  17. Two Paradoxes in Linear Regression Analysis.

    PubMed

    Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong

    2016-12-25

    Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.

  18. Autoregressive statistical pattern recognition algorithms for damage detection in civil structures

    NASA Astrophysics Data System (ADS)

    Yao, Ruigen; Pakzad, Shamim N.

    2012-08-01

    Statistical pattern recognition has recently emerged as a promising set of complementary methods to system identification for automatic structural damage assessment. Its essence is to use well-known concepts in statistics for boundary definition of different pattern classes, such as those for damaged and undamaged structures. In this paper, several statistical pattern recognition algorithms using autoregressive models, including statistical control charts and hypothesis testing, are reviewed as potentially competitive damage detection techniques. To enhance the performance of statistical methods, new feature extraction techniques using model spectra and residual autocorrelation, together with resampling-based threshold construction methods, are proposed. Subsequently, simulated acceleration data from a multi degree-of-freedom system is generated to test and compare the efficiency of the existing and proposed algorithms. Data from laboratory experiments conducted on a truss and a large-scale bridge slab model are then used to further validate the damage detection methods and demonstrate the superior performance of proposed algorithms.

  19. Statistical Methodologies to Integrate Experimental and Computational Research

    NASA Technical Reports Server (NTRS)

    Parker, P. A.; Johnson, R. T.; Montgomery, D. C.

    2008-01-01

    Development of advanced algorithms for simulating engine flow paths requires the integration of fundamental experiments with the validation of enhanced mathematical models. In this paper, we provide an overview of statistical methods to strategically and efficiently conduct experiments and computational model refinement. Moreover, the integration of experimental and computational research efforts is emphasized. With a statistical engineering perspective, scientific and engineering expertise is combined with statistical sciences to gain deeper insights into experimental phenomena and code development performance, supporting the overall research objectives. The particular statistical methods discussed are design of experiments, response surface methodology, and uncertainty analysis and planning. Their application is illustrated with a coaxial free jet experiment and a turbulence model refinement investigation. Our goal is to provide an overview, focusing on concepts rather than practice, to demonstrate the benefits of using statistical methods in research and development, thereby encouraging their broader and more systematic application.

  20. Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors

    DTIC Science & Technology

    2015-07-15

    Long-term effects on cancer survivors’ quality of life of physical training versus physical training combined with cognitive-behavioral therapy ...

  1. Full Counting Statistics for Interacting Fermions with Determinantal Quantum Monte Carlo Simulations.

    PubMed

    Humeniuk, Stephan; Büchler, Hans Peter

    2017-12-08

    We present a method for computing the full probability distribution function of quadratic observables such as particle number or magnetization for the Fermi-Hubbard model within the framework of determinantal quantum Monte Carlo calculations. Especially in cold atom experiments with single-site resolution, such a full counting statistics can be obtained from repeated projective measurements. We demonstrate that the full counting statistics can provide important information on the size of preformed pairs. Furthermore, we compute the full counting statistics of the staggered magnetization in the repulsive Hubbard model at half filling and find excellent agreement with recent experimental results. We show that current experiments are capable of probing the difference between the Hubbard model and the limiting Heisenberg model.

  2. Geographic and temporal validity of prediction models: Different approaches were useful to examine model performance

    PubMed Central

    Austin, Peter C.; van Klaveren, David; Vergouwe, Yvonne; Nieboer, Daan; Lee, Douglas S.; Steyerberg, Ewout W.

    2017-01-01

    Objective: Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods. Study Design and Setting: We illustrated different analytic methods for validation using a sample of 14,857 patients hospitalized with heart failure at 90 hospitals in two distinct time periods. Bootstrap resampling was used to assess internal validity. Meta-analytic methods were used to assess geographic transportability. Each hospital was used once as a validation sample, with the remaining hospitals used for model derivation. Hospital-specific estimates of discrimination (c-statistic) and calibration (calibration intercepts and slopes) were pooled using random effects meta-analysis methods. I2 statistics and prediction interval width quantified geographic transportability. Temporal transportability was assessed using patients from the earlier period for model derivation and patients from the later period for model validation. Results: Estimates of reproducibility, pooled hospital-specific performance, and temporal transportability were on average very similar, with c-statistics of 0.75. Between-hospital variation was moderate according to I2 statistics and prediction intervals for c-statistics. Conclusion: This study illustrates how performance of prediction models can be assessed in settings with multicenter data at different time periods. PMID:27262237
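
    The discrimination measure named in this record, the c-statistic, can be illustrated with a minimal sketch. It is not the authors' code: the function name `c_statistic` and the toy outcomes and risks are assumptions made for illustration. For a binary outcome, the c-statistic is the fraction of (event, non-event) pairs in which the event received the higher predicted risk, with ties counted as one half.

```python
def c_statistic(y_true, y_score):
    """Concordance (c) statistic for a binary outcome.

    y_true  : iterable of 0/1 outcomes (1 = event)
    y_score : iterable of predicted risks, same length
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data, not taken from the study.
outcomes = [1, 0, 1, 0, 0]
risks = [0.9, 0.2, 0.6, 0.6, 0.1]
c_example = c_statistic(outcomes, risks)
```

    In a leave-one-hospital-out design of the kind described, a function like this would be evaluated once per held-out hospital and the resulting estimates pooled; the pooling step is not shown here.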

  3. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. Part 2: Theoretical development of a dynamic model and application to rain fade durations and tolerable control delays for fade countermeasures

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1987-01-01

    A dynamic rain attenuation prediction model is developed for use in obtaining the temporal characteristics, on time scales of minutes or hours, of satellite communication link availability. Analogous to the associated static rain attenuation model, which yields yearly attenuation predictions, this dynamic model is applicable at any location in the world that is characterized by the static rain attenuation statistics peculiar to the geometry of the satellite link and the rain statistics of the location. Such statistics are calculated by employing the formalism of Part I of this report. In fact, the dynamic model presented here is an extension of the static model and reduces to the static model in the appropriate limit. By assuming that rain attenuation is dynamically described by a first-order stochastic differential equation in time and that this random attenuation process is a Markov process, an expression for the associated transition probability is obtained by solving the related forward Kolmogorov equation. This transition probability is then used to obtain such temporal rain attenuation statistics as attenuation durations and allowable attenuation margins versus control system delay.

  4. Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: a Monte Carlo study.

    PubMed

    Chou, C P; Bentler, P M; Satorra, A

    1991-11-01

    Research studying robustness of maximum likelihood (ML) statistics in covariance structure analysis has concluded that test statistics and standard errors are biased under severe non-normality. An estimation procedure known as asymptotic distribution free (ADF), making no distributional assumption, has been suggested to avoid these biases. Corrections to the normal theory statistics to yield more adequate performance have also been proposed. This study compares the performance of a scaled test statistic and robust standard errors for two models under several non-normal conditions and also compares these with the results from ML and ADF methods. Both ML and ADF test statistics performed rather well in one model and considerably worse in the other. In general, the scaled test statistic seemed to behave better than the ML test statistic and the ADF statistic performed the worst. The robust and ADF standard errors yielded more appropriate estimates of sampling variability than the ML standard errors, which were usually downward biased, in both models under most of the non-normal conditions. ML test statistics and standard errors were found to be quite robust to the violation of the normality assumption when data had either symmetric and platykurtic distributions, or non-symmetric and zero kurtotic distributions.

  5. Quantifying the impact of between-study heterogeneity in multivariate meta-analyses

    PubMed Central

    Jackson, Dan; White, Ian R; Riley, Richard D

    2012-01-01

    Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call . We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
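
    For orientation, the familiar univariate quantities that this record generalises can be sketched as follows. This covers only the standard univariate case (Cochran's Q, H2 as Q divided by its degrees of freedom, and I2 as (Q - df)/Q truncated at zero), not the multivariate statistics proposed in the paper. The function name `heterogeneity` and the example numbers are assumptions made for illustration.

```python
def heterogeneity(effects, variances):
    """Univariate Cochran's Q, H2 and I2 from study effects and
    their within-study variances (inverse-variance weights)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    weights = [1.0 / v for v in variances]
    # Fixed-effect (inverse-variance) pooled estimate.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's heterogeneity statistic.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    h2 = q / df
    # I2 is truncated at zero when Q falls below its degrees of freedom.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, h2, i2


# Hypothetical two-study example.
q_ex, h2_ex, i2_ex = heterogeneity([0.0, 2.0], [0.5, 0.5])
```

    With these two made-up studies the pooled estimate is 1.0, so Q = 4, H2 = 4 and I2 = 0.75.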

  6. Comparing geological and statistical approaches for element selection in sediment tracing research

    NASA Astrophysics Data System (ADS)

    Laceby, J. Patrick; McMahon, Joe; Evrard, Olivier; Olley, Jon

    2015-04-01

    Elevated suspended sediment loads reduce reservoir capacity and significantly increase the cost of operating water treatment infrastructure, making the management of sediment supply to reservoirs of increasing importance. Sediment fingerprinting techniques can be used to determine the relative contributions of different sources of sediment accumulating in reservoirs. The objective of this research is to compare geological and statistical approaches to element selection for sediment fingerprinting modelling. Time-integrated samplers (n=45) were used to obtain source samples from four major subcatchments flowing into the Baroon Pocket Dam in South East Queensland, Australia. The geochemistry of potential sources was compared to the geochemistry of sediment cores (n=12) sampled in the reservoir. The geochemical approach selected elements for modelling that provided expected, observed and statistical discrimination between sediment sources. Two statistical approaches selected elements for modelling with the Kruskal-Wallis H-test and Discriminatory Function Analysis (DFA). In particular, two different significance levels (0.05 & 0.35) for the DFA were included to investigate the importance of element selection on modelling results. A distribution model determined the relative contributions of different sources to sediment sampled in the Baroon Pocket Dam. Elemental discrimination was expected between one subcatchment (Obi Obi Creek) and the remaining subcatchments (Lexys, Falls and Bridge Creek). Six major elements were expected to provide discrimination. Of these six, only Fe2O3 and SiO2 provided expected, observed and statistical discrimination. Modelling results with this geological approach indicated 36% (+/- 9%) of sediment sampled in the reservoir cores were from mafic-derived sources and 64% (+/- 9%) were from felsic-derived sources. The geological and the first statistical approach (DFA0.05) differed by only 1% (σ 5%) for 5 out of 6 model groupings, with only the Lexys Creek modelling results differing significantly (35%). The statistical model with expanded elemental selection (DFA0.35) differed from the geological model by an average of 30% for all 6 models. Elemental selection for sediment fingerprinting therefore has the potential to impact modeling results. Accordingly, it is important to incorporate both robust geological and statistical approaches when selecting elements for sediment fingerprinting. For the Baroon Pocket Dam, management should focus on reducing the supply of sediments derived from felsic sources in each of the subcatchments.

  7. Teacher Effects, Value-Added Models, and Accountability

    ERIC Educational Resources Information Center

    Konstantopoulos, Spyros

    2014-01-01

    Background: In the last decade, the effects of teachers on student performance (typically manifested as state-wide standardized tests) have been re-examined using statistical models that are known as value-added models. These statistical models aim to compute the unique contribution of the teachers in promoting student achievement gains from grade…

  8. Some Statistics for Assessing Person-Fit Based on Continuous-Response Models

    ERIC Educational Resources Information Center

    Ferrando, Pere Joan

    2010-01-01

    This article proposes several statistics for assessing individual fit based on two unidimensional models for continuous responses: linear factor analysis and Samejima's continuous response model. Both models are approached using a common framework based on underlying response variables and are formulated at the individual level as fixed regression…

  9. Statistical Modeling for Radiation Hardness Assurance

    NASA Technical Reports Server (NTRS)

    Ladbury, Raymond L.

    2014-01-01

    We cover the models and statistics associated with single event effects (and total ionizing dose), why we need them, and how to use them: What models are used, what errors exist in real test data, and what the model allows us to say about the DUT will be discussed. In addition, how to use other sources of data such as historical, heritage, and similar part and how to apply experience, physics, and expert opinion to the analysis will be covered. Also included will be concepts of Bayesian statistics, data fitting, and bounding rates.

  10. Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

    ERIC Educational Resources Information Center

    Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

    2014-01-01

    This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…

  11. Interpolative modeling of GaAs FET S-parameter data bases for use in Monte Carlo simulations

    NASA Technical Reports Server (NTRS)

    Campbell, L.; Purviance, J.

    1992-01-01

    A statistical interpolation technique is presented for modeling GaAs FET S-parameter measurements for use in the statistical analysis and design of circuits. This is accomplished by interpolating among the measurements in a GaAs FET S-parameter data base in a statistically valid manner.

  12. The Importance of Statistical Modeling in Data Analysis and Inference

    ERIC Educational Resources Information Center

    Rollins, Derrick, Sr.

    2017-01-01

    Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…

  13. Evaluating Item Fit for Multidimensional Item Response Models

    ERIC Educational Resources Information Center

    Zhang, Bo; Stone, Clement A.

    2008-01-01

    This research examines the utility of the S-X² statistic proposed by Orlando and Thissen (2000) in evaluating item fit for multidimensional item response models. Monte Carlo simulation was conducted to investigate both the Type I error and statistical power of this fit statistic in analyzing two kinds of multidimensional test…

  14. Educational Statistics and School Improvement. Statistics and the Federal Role in Education.

    ERIC Educational Resources Information Center

    Hawley, Willis D.

    This paper focuses on how educational statistics might better serve the quest for educational improvement in elementary and secondary schools. A model for conceptualizing the sources and processes of school productivity is presented. The Learning Productivity Model suggests that school outcomes are the consequence of the interaction of five…

  15. Teaching Engineering Statistics with Technology, Group Learning, Contextual Projects, Simulation Models and Student Presentations

    ERIC Educational Resources Information Center

    Romeu, Jorge Luis

    2008-01-01

    This article discusses our teaching approach in graduate level Engineering Statistics. It is based on the use of modern technology, learning groups, contextual projects, simulation models, and statistical and simulation software to entice student motivation. The use of technology to facilitate group projects and presentations, and to generate,…

  16. Comparison of climate envelope models developed using expert-selected variables versus statistical selection

    USGS Publications Warehouse

    Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.

    2017-01-01

    Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods. Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method and there was low overlap in the variable sets (<40%) between the two methods. Despite these differences in variable sets (expert versus statistical), models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS)). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable selection is a useful first step, especially when there is a need to model a large number of species or expert knowledge of the species is limited. Expert input can then be used to refine models that seem unrealistic or for species that experts believe are particularly sensitive to change. It also emphasizes the importance of using multiple models to reduce uncertainty and improve map outputs for conservation planning. Where outputs overlap or show the same direction of change there is greater certainty in the predictions. Areas of disagreement can be used for learning by asking why the models do not agree, and may highlight areas where additional on-the-ground data collection could improve the models.

  17. Statistical modeling of natural backgrounds in hyperspectral LWIR data

    NASA Astrophysics Data System (ADS)

    Truslow, Eric; Manolakis, Dimitris; Cooley, Thomas; Meola, Joseph

    2016-09-01

    Hyperspectral sensors operating in the long wave infrared (LWIR) have a wealth of applications including remote material identification and rare target detection. While statistical models for modeling surface reflectance in visible and near-infrared regimes have been well studied, models for the temperature and emissivity in the LWIR have not been rigorously investigated. In this paper, we investigate modeling hyperspectral LWIR data using a statistical mixture model for the emissivity and surface temperature. Statistical models for the surface parameters can be used to simulate surface radiances and at-sensor radiance which drives the variability of measured radiance and ultimately the performance of signal processing algorithms. Thus, having models that adequately capture data variation is extremely important for studying performance trades. The purpose of this paper is twofold. First, we study the validity of this model using real hyperspectral data, and compare the relative variability of hyperspectral data in the LWIR and visible and near-infrared (VNIR) regimes. Second, we illustrate how materials that are easily distinguished in the VNIR, may be difficult to separate when imaged in the LWIR.

  18. Global Sensitivity Analysis of Environmental Systems via Multiple Indices based on Statistical Moments of Model Outputs

    NASA Astrophysics Data System (ADS)

    Guadagnini, A.; Riva, M.; Dell'Oca, A.

    2017-12-01

    We propose to ground sensitivity of uncertain parameters of environmental models on a set of indices based on the main (statistical) moments, i.e., mean, variance, skewness and kurtosis, of the probability density function (pdf) of a target model output. This enables us to perform Global Sensitivity Analysis (GSA) of a model in terms of multiple statistical moments and yields a quantification of the impact of model parameters on features driving the shape of the pdf of model output. Our GSA approach includes the possibility of being coupled with the construction of a reduced complexity model that allows approximating the full model response at a reduced computational cost. We demonstrate our approach through a variety of test cases. These include a commonly used analytical benchmark, a simplified model representing pumping in a coastal aquifer, a laboratory-scale tracer experiment, and the migration of fracturing fluid through a naturally fractured reservoir (source) to reach an overlying formation (target). Our strategy allows discriminating the relative importance of model parameters to the four statistical moments considered. We also provide an appraisal of the error associated with the evaluation of our sensitivity metrics by replacing the original system model through the selected surrogate model. Our results suggest that one might need to construct a surrogate model with increasing level of accuracy depending on the statistical moment considered in the GSA. The methodological framework we propose can assist the development of analysis techniques targeted to model calibration, design of experiment, uncertainty quantification and risk assessment.

  19. Asking Sensitive Questions: A Statistical Power Analysis of Randomized Response Models

    ERIC Educational Resources Information Center

    Ulrich, Rolf; Schroter, Hannes; Striegel, Heiko; Simon, Perikles

    2012-01-01

    This article derives the power curves for a Wald test that can be applied to randomized response models when small prevalence rates must be assessed (e.g., detecting doping behavior among elite athletes). These curves enable the assessment of the statistical power that is associated with each model (e.g., Warner's model, crosswise model, unrelated…
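
As an illustration of the kind of model being compared, the following is a minimal sketch of the standard prevalence estimator for Warner's randomized response design and the Wald statistic built from it. The function names and the example numbers are hypothetical; the article's own power derivations are not reproduced here.

```python
import math


def warner_estimate(n_yes, n, p):
    """Estimate prevalence under Warner's design.

    Each respondent answers the sensitive statement with probability p and
    its negation with probability 1 - p, so
    P(yes) = p * pi + (1 - p) * (1 - pi).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes the prevalence unidentifiable")
    lam = n_yes / n  # observed proportion of "yes" answers
    pi_hat = (lam - (1.0 - p)) / (2.0 * p - 1.0)
    # Standard error from the binomial variance of lam, rescaled by the design.
    se = math.sqrt(lam * (1.0 - lam) / n) / abs(2.0 * p - 1.0)
    return pi_hat, se


def wald_z(pi_hat, se, pi_null=0.0):
    """Wald statistic for H0: pi = pi_null."""
    return (pi_hat - pi_null) / se


# Hypothetical survey: 300 "yes" answers out of 1000, design probability 0.75.
pi_hat, se = warner_estimate(300, 1000, 0.75)
print(round(pi_hat, 3), round(se, 3), round(wald_z(pi_hat, se), 2))
```

The division by 2p - 1 shows why designs with p close to 0.5 protect privacy well but inflate the standard error, which is what drives the loss of power at small prevalence rates.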

  20. WORKSHOP ON APPLICATION OF STATISTICAL METHODS TO BIOLOGICALLY-BASED PHARMACOKINETIC MODELING FOR RISK ASSESSMENT

    EPA Science Inventory

    Biologically-based pharmacokinetic models are being increasingly used in the risk assessment of environmental chemicals. These models are based on biological, mathematical, statistical and engineering principles. Their potential uses in risk assessment include extrapolation betwe...

  1. Counts-in-cylinders in the Sloan Digital Sky Survey with Comparisons to N-body Simulations

    NASA Astrophysics Data System (ADS)

    Berrier, Heather D.; Barton, Elizabeth J.; Berrier, Joel C.; Bullock, James S.; Zentner, Andrew R.; Wechsler, Risa H.

    2011-01-01

    Environmental statistics provide a necessary means of comparing the properties of galaxies in different environments, and a vital test of models of galaxy formation within the prevailing hierarchical cosmological model. We explore counts-in-cylinders, a common statistic defined as the number of companions of a particular galaxy found within a given projected radius and redshift interval. Galaxy distributions with the same two-point correlation functions do not necessarily have the same companion count distributions. We use this statistic to examine the environments of galaxies in the Sloan Digital Sky Survey Data Release 4 (SDSS DR4). We also make preliminary comparisons to four models for the spatial distributions of galaxies, based on N-body simulations and data from SDSS DR4, to study the utility of the counts-in-cylinders statistic. There is a very large scatter between the number of companions a galaxy has and the mass of its parent dark matter halo and the halo occupation, limiting the utility of this statistic for certain kinds of environmental studies. We also show that prevalent empirical models of galaxy clustering, that match observed two- and three-point clustering statistics well, fail to reproduce some aspects of the observed distribution of counts-in-cylinders on 1, 3, and 6 $h^{-1}$ Mpc scales. All models that we explore underpredict the fraction of galaxies with few or no companions in 3 and 6 $h^{-1}$ Mpc cylinders. Roughly 7% of galaxies in the real universe are significantly more isolated within a 6 $h^{-1}$ Mpc cylinder than the galaxies in any of the models we use. Simple phenomenological models that map galaxies to dark matter halos fail to reproduce high-order clustering statistics in low-density environments.
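
The definition of the statistic given in this abstract can be sketched in a few lines. This is a toy version under stated assumptions: positions are flat-sky Cartesian coordinates, the line-of-sight coordinate is a velocity, and the function name, units and sample values are hypothetical rather than taken from the paper.

```python
import math


def counts_in_cylinders(galaxies, radius, half_depth):
    """Count companions of each galaxy inside a cylinder centred on it.

    galaxies   : list of (x, y, v) tuples; x, y are projected positions and
                 v is a line-of-sight velocity (assumed units: Mpc/h, km/s)
    radius     : projected radius of the cylinder
    half_depth : maximum allowed |v_i - v_j| along the line of sight
    """
    counts = []
    for i, (xi, yi, vi) in enumerate(galaxies):
        n = 0
        for j, (xj, yj, vj) in enumerate(galaxies):
            if i == j:
                continue  # a galaxy is not its own companion
            close_on_sky = math.hypot(xi - xj, yi - yj) <= radius
            close_in_depth = abs(vi - vj) <= half_depth
            if close_on_sky and close_in_depth:
                n += 1
        counts.append(n)
    return counts


# Four hypothetical galaxies; only the first two share a cylinder.
sample = [(0.0, 0.0, 0.0), (1.0, 0.0, 100.0), (0.0, 2.0, 5000.0), (10.0, 10.0, 0.0)]
print(counts_in_cylinders(sample, radius=3.0, half_depth=1000.0))
```

The double loop is quadratic in the number of galaxies; a survey-sized catalogue would need a spatial index, which is omitted here for clarity.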

  2. Atmospheric Tracer Inverse Modeling Using Markov Chain Monte Carlo (MCMC)

    NASA Astrophysics Data System (ADS)

    Kasibhatla, P.

    2004-12-01

    In recent years, there has been an increasing emphasis on the use of Bayesian statistical estimation techniques to characterize the temporal and spatial variability of atmospheric trace gas sources and sinks. The applications have been varied in terms of the particular species of interest, as well as in terms of the spatial and temporal resolution of the estimated fluxes. However, one common characteristic has been the use of relatively simple statistical models for describing the measurement and chemical transport model error statistics and prior source statistics. For example, multivariate normal probability distribution functions (pdfs) are commonly used to model these quantities and inverse source estimates are derived for fixed values of pdf parameters. While the advantage of this approach is that closed form analytical solutions for the a posteriori pdfs of interest are available, it is worth exploring Bayesian analysis approaches which allow for a more general treatment of error and prior source statistics. Here, we present an application of the Markov Chain Monte Carlo (MCMC) methodology to an atmospheric tracer inversion problem to demonstrate how more general statistical models for errors can be incorporated into the analysis in a relatively straightforward manner. The MCMC approach to Bayesian analysis, which has found wide application in a variety of fields, is a statistical simulation approach that involves computing moments of interest of the a posteriori pdf by efficiently sampling this pdf. The specific inverse problem that we focus on is the annual mean CO2 source/sink estimation problem considered by the TransCom3 project. TransCom3 was a collaborative effort involving various modeling groups and followed a common modeling and analysis protocol. As such, this problem provides a convenient case study to demonstrate the applicability of the MCMC methodology to atmospheric tracer source/sink estimation problems.
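
The idea of sampling a posterior with non-Gaussian errors can be sketched with a random-walk Metropolis sampler. This is a one-parameter toy problem, not the TransCom3 inversion: the Laplace error model, the prior, the data values and all function names are assumptions chosen for illustration.

```python
import math
import random


def log_posterior(source, data, noise_scale, prior_mean, prior_sd):
    """Unnormalised log posterior for a single source strength.

    Assumed model: Laplace (double-exponential) observation errors and a
    normal prior, so no closed-form posterior is available.
    """
    log_like = -sum(abs(y - source) for y in data) / noise_scale
    log_prior = -0.5 * ((source - prior_mean) / prior_sd) ** 2
    return log_like + log_prior


def metropolis(log_target, start, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    x, lp = start, log_target(start)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        samples.append(x)
    return samples


# Hypothetical observations of the source strength.
data = [1.8, 2.1, 2.4, 1.9, 2.2]
chain = metropolis(lambda s: log_posterior(s, data, 0.3, 0.0, 10.0),
                   start=0.0, step=0.3, n_samples=20000)
kept = chain[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

Moments of the posterior are estimated as plain averages over the retained samples, which is the sense in which the abstract describes MCMC as computing moments by sampling the pdf.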

  3. Scale Dependence of Statistics of Spatially Averaged Rain Rate Seen in TOGA COARE Comparison with Predictions from a Stochastic Model

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Bell, T. L.; Lau, William K. M. (Technical Monitor)

    2002-01-01

    A characteristic feature of rainfall statistics is that they in general depend on the space and time scales over which rain data are averaged. As a part of an earlier effort to determine the sampling error of satellite rain averages, a space-time model of rainfall statistics was developed to describe the statistics of gridded rain observed in GATE. The model allows one to compute the second moment statistics of space- and time-averaged rain rate which can be fitted to satellite or rain gauge data to determine the four model parameters appearing in the precipitation spectrum - an overall strength parameter, a characteristic length separating the long and short wavelength regimes and a characteristic relaxation time for decay of the autocorrelation of the instantaneous local rain rate and a certain 'fractal' power law exponent. For area-averaged instantaneous rain rate, this exponent governs the power law dependence of these statistics on the averaging length scale $L$ predicted by the model in the limit of small $L$. In particular, the variance of rain rate averaged over an $L \times L$ area exhibits a power law singularity as $L \rightarrow 0$. In the present work the model is used to investigate how the statistics of area-averaged rain rate over the tropical Western Pacific measured with ship borne radar during TOGA COARE (Tropical Ocean Global Atmosphere Coupled Ocean Atmospheric Response Experiment) and gridded on a 2 km grid depends on the size of the spatial averaging scale. Good agreement is found between the data and predictions from the model over a wide range of averaging length scales.

  4. Statistics for characterizing data on the periphery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Theiler, James P; Hush, Donald R

    2010-01-01

    We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.

  5. Two Paradoxes in Linear Regression Analysis

    PubMed Central

    FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong

    2016-01-01

    Summary Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214

  6. Customizing national models for a medical center's population to rapidly identify patients at high risk of 30-day all-cause hospital readmission following a heart failure hospitalization.

    PubMed

    Cox, Zachary L; Lai, Pikki; Lewis, Connie M; Lindenfeld, JoAnn; Collins, Sean P; Lenihan, Daniel J

    2018-05-28

    Nationally-derived models predicting 30-day readmissions following heart failure (HF) hospitalizations yield insufficient discrimination for institutional use. Develop a customized readmission risk model from Medicare-employed and institutionally-customized risk factors and compare the performance against national models in a medical center. Medicare patients age ≥ 65 years hospitalized for HF (n = 1,454) were studied in a derivation cohort and in a separate validation cohort (n = 243). All 30-day hospital readmissions were documented. The primary outcome was risk discrimination (c-statistic) compared to national models. A customized model demonstrated improved discrimination (c-statistic 0.72; 95% CI 0.69 - 0.74) compared to national models (c-statistics of 0.60 and 0.61) with a c-statistic of 0.63 in the validation cohort. Compared to national models, a customized model demonstrated superior readmission risk profiling by distinguishing a high-risk (38.3%) from a low-risk (9.4%) quartile. A customized model improved readmission risk discrimination from HF hospitalizations compared to national models. Copyright © 2018 Elsevier Inc. All rights reserved.
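
The discrimination measure reported here, the c-statistic, can be computed directly from its pairwise definition. The sketch below is a generic illustration with made-up risks and outcomes; it is not the study's model or data, and the function name is hypothetical.

```python
def c_statistic(risks, outcomes):
    """Concordance (c) statistic for a binary outcome.

    Fraction of (event, non-event) pairs in which the patient with the event
    received the higher predicted risk; tied predictions count as one half.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for r_event in events:
        for r_non in non_events:
            if r_event > r_non:
                concordant += 1.0
            elif r_event == r_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted readmission risks and observed outcomes (1 = readmitted).
print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the 0.60 to 0.72 values in the abstract should be read.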

  7. Statistical wind analysis for near-space applications

    NASA Astrophysics Data System (ADS)

    Roney, Jason A.

    2007-09-01

    Statistical wind models were developed based on the existing observational wind data for near-space altitudes between 60 000 and 100 000 ft (18-30 km) above ground level (AGL) at two locations, Akron, OH, USA, and White Sands, NM, USA. These two sites are envisioned as playing a crucial role in the first flights of high-altitude airships. The analysis shown in this paper has not been previously applied to this region of the stratosphere for such an application. Standard statistics were compiled for these data such as mean, median, maximum wind speed, and standard deviation, and the data were modeled with Weibull distributions. These statistics indicated, on a yearly average, there is a lull or a “knee” in the wind between 65 000 and 72 000 ft AGL (20-22 km). From the standard statistics, trends at both locations indicated substantial seasonal variation in the mean wind speed at these heights. The yearly and monthly statistical modeling indicated that Weibull distributions were a reasonable model for the data. Forecasts and hindcasts were done by using a Weibull model based on 2004 data and comparing the model with the 2003 and 2005 data. The 2004 distribution was also a reasonable model for these years. Lastly, the Weibull distribution and cumulative function were used to predict the 50%, 95%, and 99% winds, which are directly related to the expected power requirements of a near-space station-keeping airship. These values indicated that using only the standard deviation of the mean may underestimate the operational conditions.
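
The step from a fitted Weibull distribution to the 50%, 95% and 99% winds can be sketched by inverting the cumulative function. The shape and scale values below are hypothetical placeholders, not the parameters fitted in the paper, and the function names are illustrative.

```python
import math


def weibull_cdf(speed, shape, scale):
    """Probability that the wind speed does not exceed `speed`."""
    if speed <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((speed / scale) ** shape))


def weibull_quantile(prob, shape, scale):
    """Wind speed that is not exceeded with probability `prob` (inverse CDF)."""
    if not 0.0 <= prob < 1.0:
        raise ValueError("prob must lie in [0, 1)")
    return scale * (-math.log(1.0 - prob)) ** (1.0 / shape)


# Hypothetical fitted parameters (shape is dimensionless, scale in m/s).
shape, scale = 2.0, 12.0
for prob in (0.50, 0.95, 0.99):
    print(f"{prob:.0%} wind: {weibull_quantile(prob, shape, scale):.1f} m/s")
```

Because the upper quantiles grow with the tail of the distribution rather than with its spread alone, they can sit well above a mean-plus-standard-deviation estimate, which is consistent with the abstract's closing remark.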

  8. A Stochastic Fractional Dynamics Model of Space-time Variability of Rain

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Travis, James E.

    2013-01-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, that allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and times scales. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment.

  9. Statistical limitations in functional neuroimaging. I. Non-inferential methods and statistical models.

    PubMed Central

    Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P

    1999-01-01

    Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview of some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149

  10. Ultra-low-dose computed tomographic angiography with model-based iterative reconstruction compared with standard-dose imaging after endovascular aneurysm repair: a prospective pilot study.

    PubMed

    Naidu, Sailen G; Kriegshauser, J Scott; Paden, Robert G; He, Miao; Wu, Qing; Hara, Amy K

    2014-12-01

    An ultra-low-dose radiation protocol reconstructed with model-based iterative reconstruction was compared with our standard-dose protocol. This prospective study evaluated 20 men undergoing surveillance-enhanced computed tomography after endovascular aneurysm repair. All patients underwent standard-dose and ultra-low-dose venous phase imaging; images were compared after reconstruction with filtered back projection, adaptive statistical iterative reconstruction, and model-based iterative reconstruction. Objective measures of aortic contrast attenuation and image noise were averaged. Images were subjectively assessed (1 = worst, 5 = best) for diagnostic confidence, image noise, and vessel sharpness. Aneurysm sac diameter and endoleak detection were compared. Quantitative image noise was 26% less with ultra-low-dose model-based iterative reconstruction than with standard-dose adaptive statistical iterative reconstruction and 58% less than with ultra-low-dose adaptive statistical iterative reconstruction. Average subjective noise scores were not different between ultra-low-dose model-based iterative reconstruction and standard-dose adaptive statistical iterative reconstruction (3.8 vs. 4.0, P = .25). Subjective scores for diagnostic confidence were better with standard-dose adaptive statistical iterative reconstruction than with ultra-low-dose model-based iterative reconstruction (4.4 vs. 4.0, P = .002). Vessel sharpness was decreased with ultra-low-dose model-based iterative reconstruction compared with standard-dose adaptive statistical iterative reconstruction (3.3 vs. 4.1, P < .0001). Ultra-low-dose model-based iterative reconstruction and standard-dose adaptive statistical iterative reconstruction aneurysm sac diameters were not significantly different (4.9 vs. 4.9 cm); concordance for the presence of endoleak was 100% (P < .001). 
    Compared with a standard-dose technique, an ultra-low-dose model-based iterative reconstruction protocol provides comparable image quality and diagnostic assessment at a 73% lower radiation dose.

  11. Testing statistical self-similarity in the topology of river networks

    USGS Publications Warehouse

    Troutman, Brent M.; Mantilla, Ricardo; Gupta, Vijay K.

    2010-01-01

    Recent work has demonstrated that the topological properties of real river networks deviate significantly from predictions of Shreve's random model. At the same time the property of mean self-similarity postulated by Tokunaga's model is well supported by data. Recently, a new class of network model called random self-similar networks (RSN) that combines self-similarity and randomness has been introduced to replicate important topological features observed in real river networks. We investigate if the hypothesis of statistical self-similarity in the RSN model is supported by data on a set of 30 basins located across the continental United States that encompass a wide range of hydroclimatic variability. We demonstrate that the generators of the RSN model obey a geometric distribution, and self-similarity holds in a statistical sense in 26 of these 30 basins. The parameters describing the distribution of interior and exterior generators are tested to be statistically different and the difference is shown to produce the well-known Hack's law. The inter-basin variability of RSN parameters is found to be statistically significant. We also test generator dependence on two climatic indices, mean annual precipitation and radiative index of dryness. Some indication of climatic influence on the generators is detected, but this influence is not statistically significant with the sample size available. Finally, two key applications of the RSN model to hydrology and geomorphology are briefly discussed.

  12. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions.

    PubMed

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y; Chen, Wei

    2016-02-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. © 2016 WILEY PERIODICALS, INC.

  13. Gene-based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions

    PubMed Central

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E.; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y.; Chen, Wei

    2015-01-01

    Summary Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, we develop here Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT) which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. PMID:26782979

  14. Statistical framework for evaluation of climate model simulations by use of climate proxy data from the last millennium - Part 1: Theory

    NASA Astrophysics Data System (ADS)

    Sundberg, R.; Moberg, A.; Hind, A.

    2012-08-01

    A statistical framework for comparing the output of ensemble simulations from global climate models with networks of climate proxy and instrumental records has been developed, focusing on near-surface temperatures for the last millennium. This framework includes the formulation of a joint statistical model for proxy data, instrumental data and simulation data, which is used to optimize a quadratic distance measure for ranking climate model simulations. An essential underlying assumption is that the simulations and the proxy/instrumental series have a shared component of variability that is due to temporal changes in external forcing, such as volcanic aerosol load, solar irradiance or greenhouse gas concentrations. Two statistical tests have been formulated. Firstly, a preliminary test establishes whether a significant temporal correlation exists between instrumental/proxy and simulation data. Secondly, the distance measure is expressed in the form of a test statistic of whether a forced simulation is closer to the instrumental/proxy series than unforced simulations. The proposed framework allows any number of proxy locations to be used jointly, with different seasons, record lengths and statistical precision. The goal is to objectively rank several competing climate model simulations (e.g. with alternative model parameterizations or alternative forcing histories) by means of their goodness of fit to the unobservable true past climate variations, as estimated from noisy proxy data and instrumental observations.

  15. Probability distributions of molecular observables computed from Markov models. II. Uncertainties in observables and their time-evolution

    NASA Astrophysics Data System (ADS)

    Chodera, John D.; Noé, Frank

    2010-09-01

    Discrete-state Markov (or master equation) models provide a useful simplified representation for characterizing the long-time statistical evolution of biomolecules in a manner that allows direct comparison with experiments as well as the elucidation of mechanistic pathways for an inherently stochastic process. A vital part of meaningful comparison with experiment is the characterization of the statistical uncertainty in the predicted experimental measurement, which may take the form of an equilibrium measurement of some spectroscopic signal, the time-evolution of this signal following a perturbation, or the observation of some statistic (such as the correlation function) of the equilibrium dynamics of a single molecule. Without meaningful error bars (which arise from both approximation and statistical error), there is no way to determine whether the deviations between model and experiment are statistically meaningful. Previous work has demonstrated that a Bayesian method that enforces microscopic reversibility can be used to characterize the statistical component of correlated uncertainties in state-to-state transition probabilities (and functions thereof) for a model inferred from molecular simulation data. Here, we extend this approach to include the uncertainty in observables that are functions of molecular conformation (such as surrogate spectroscopic signals) characterizing each state, permitting the full statistical uncertainty in computed spectroscopic experiments to be assessed. We test the approach in a simple model system to demonstrate that the computed uncertainties provide a useful indicator of statistical variation, and then apply it to the computation of the fluorescence autocorrelation function measured for a dye-labeled peptide previously studied by both experiment and simulation.

  16. Monte Carlo based statistical power analysis for mediation models: methods and software.

    PubMed

    Zhang, Zhiyong

    2014-12-01

    The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.

  17. Statistical validation of normal tissue complication probability models.

    PubMed

    Xu, Cheng-Jian; van der Schaaf, Arjen; Van't Veld, Aart A; Langendijk, Johannes A; Schilstra, Cornelis

    2012-09-01

    To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use. Copyright © 2012 Elsevier Inc. All rights reserved.
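
The logic of a permutation test for model performance can be sketched as follows. This simplified version permutes outcome labels against fixed predictions and does not refit a model or run the double cross-validation described in the abstract; the data and function names are hypothetical.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve from its pairwise (Mann-Whitney) definition."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_permutations=999, seed=0):
    """One-sided p-value for the observed AUC under label permutation."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between scores and outcomes
        if auc(scores, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (1 + at_least_as_good) / (1 + n_permutations)


# Hypothetical predicted complication probabilities and observed outcomes.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
observed_auc, p_value = permutation_p_value(scores, labels)
print(observed_auc, p_value)
```

In a full validation the model would be rebuilt on each permuted data set, so that the null distribution reflects the whole modelling procedure rather than a single set of predictions.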

  18. Effect of Internet-Based Cognitive Apprenticeship Model (i-CAM) on Statistics Learning among Postgraduate Students.

    PubMed

    Saadati, Farzaneh; Ahmad Tarmizi, Rohani; Mohd Ayub, Ahmad Fauzi; Abu Bakar, Kamariah

    2015-01-01

    Because students' ability to use statistics, which is mathematical in nature, is one of the concerns of educators, embedding within an e-learning system the pedagogical characteristics of learning is 'value added' because it facilitates the conventional method of learning mathematics. Many researchers emphasize the effectiveness of cognitive apprenticeship in learning and problem solving in the workplace. In a cognitive apprenticeship learning model, skills are learned within a community of practitioners through observation of modelling and then practice plus coaching. This study utilized an internet-based Cognitive Apprenticeship Model (i-CAM) in three phases and evaluated its effectiveness for improving statistics problem-solving performance among postgraduate students. The results showed that, when compared to the conventional mathematics learning model, the i-CAM could significantly promote students' problem-solving performance at the end of each phase. In addition, the combination of the differences in students' test scores was considered to be statistically significant after controlling for the pre-test scores. The findings conveyed in this paper confirmed the considerable value of i-CAM in the improvement of statistics learning for non-specialized postgraduate students.

  19. Statistically accurate low-order models for uncertainty quantification in turbulent dynamical systems.

    PubMed

    Sapsis, Themistoklis P; Majda, Andrew J

    2013-08-20

    A framework for low-order predictive statistical modeling and uncertainty quantification in turbulent dynamical systems is developed here. These reduced-order, modified quasilinear Gaussian (ROMQG) algorithms apply to turbulent dynamical systems in which there is significant linear instability or linear nonnormal dynamics in the unperturbed system and energy-conserving nonlinear interactions that transfer energy from the unstable modes to the stable modes where dissipation occurs, resulting in a statistical steady state; such turbulent dynamical systems are ubiquitous in geophysical and engineering turbulence. The ROMQG method involves constructing a low-order, nonlinear, dynamical system for the mean and covariance statistics in the reduced subspace that has the unperturbed statistics as a stable fixed point and optimally incorporates the indirect effect of non-Gaussian third-order statistics for the unperturbed system in a systematic calibration stage. This calibration procedure is achieved through information involving only the mean and covariance statistics for the unperturbed equilibrium. The performance of the ROMQG algorithm is assessed on two stringent test cases: the 40-mode Lorenz 96 model mimicking midlatitude atmospheric turbulence and two-layer baroclinic models for high-latitude ocean turbulence with over 125,000 degrees of freedom. In the Lorenz 96 model, the ROMQG algorithm with just a single mode captures the transient response to random or deterministic forcing. For the baroclinic ocean turbulence models, the inexpensive ROMQG algorithm with 252 modes, less than 0.2% of the total, captures the nonlinear response of the energy, the heat flux, and even the one-dimensional energy and heat flux spectra.

  20. Nonparametric estimation and testing of fixed effects panel data models

    PubMed Central

    Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi

    2009-01-01

    In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335

  1. A cloud and radiation model-based algorithm for rainfall retrieval from SSM/I multispectral microwave measurements

    NASA Technical Reports Server (NTRS)

    Xiang, Xuwu; Smith, Eric A.; Tripoli, Gregory J.

    1992-01-01

    A hybrid statistical-physical retrieval scheme is explored which combines a statistical approach with an approach based on the development of cloud-radiation models designed to simulate precipitating atmospheres. The algorithm employs the detailed microphysical information from a cloud model as input to a radiative transfer model which generates a cloud-radiation model database. Statistical procedures are then invoked to objectively generate an initial guess composite profile data set from the database. The retrieval algorithm has been tested for a tropical typhoon case using Special Sensor Microwave/Imager (SSM/I) data and has shown satisfactory results.

  2. Vortex dynamics and Lagrangian statistics in a model for active turbulence.

    PubMed

    James, Martin; Wilczek, Michael

    2018-02-14

    Cellular suspensions such as dense bacterial flows exhibit a turbulence-like phase under certain conditions. We study this phenomenon of "active turbulence" statistically by using numerical tools. Following Wensink et al. (Proc. Natl. Acad. Sci. U.S.A. 109, 14308 (2012)), we model active turbulence by means of a generalized Navier-Stokes equation. Two-point velocity statistics of active turbulence, both in the Eulerian and the Lagrangian frame, is explored. We characterize the scale-dependent features of two-point statistics in this system. Furthermore, we extend this statistical study with measurements of vortex dynamics in this system. Our observations suggest that the large-scale statistics of active turbulence is close to Gaussian with sub-Gaussian tails.

  3. Identifiability of PBPK Models with Applications to Dimethylarsinic Acid Exposure

    EPA Science Inventory

    Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss diff...

  4. Improved analyses using function datasets and statistical modeling

    Treesearch

    John S. Hogland; Nathaniel M. Anderson

    2014-01-01

    Raster modeling is an integral component of spatial analysis. However, conventional raster modeling techniques can require a substantial amount of processing time and storage space and have limited statistical functionality and machine learning algorithms. To address this issue, we developed a new modeling framework using C# and ArcObjects and integrated that framework...

  5. The Development of the Children's Services Statistical Neighbour Benchmarking Model. Final Report

    ERIC Educational Resources Information Center

    Benton, Tom; Chamberlain, Tamsin; Wilson, Rebekah; Teeman, David

    2007-01-01

    In April 2006, the Department for Education and Skills (DfES) commissioned the National Foundation for Educational Research (NFER) to conduct an independent external review in order to develop a single "statistical neighbour" model. This single model aimed to combine the key elements of the different models currently available and be…

  6. Investigating Students' Acceptance of a Statistics Learning Platform Using Technology Acceptance Model

    ERIC Educational Resources Information Center

    Song, Yanjie; Kong, Siu-Cheung

    2017-01-01

    The study aims at investigating university students' acceptance of a statistics learning platform to support the learning of statistics in a blended learning context. Three kinds of digital resources, which are simulations, online videos, and online quizzes, were provided on the platform. Premised on the technology acceptance model, we adopted a…

  7. Computational Modeling of Statistical Learning: Effects of Transitional Probability versus Frequency and Links to Word Learning

    ERIC Educational Resources Information Center

    Mirman, Daniel; Estes, Katharine Graf; Magnuson, James S.

    2010-01-01

    Statistical learning mechanisms play an important role in theories of language acquisition and processing. Recurrent neural network models have provided important insights into how these mechanisms might operate. We examined whether such networks capture two key findings in human statistical learning. In Simulation 1, a simple recurrent network…

  8. Statistical power of intervention analyses: simulation and empirical application to treated lumber prices

    Treesearch

    Jeffrey P. Prestemon

    2009-01-01

    Timber product markets are subject to large shocks deriving from natural disturbances and policy shifts. Statistical modeling of shocks is often done to assess their economic importance. In this article, I simulate the statistical power of univariate and bivariate methods of shock detection using time series intervention models. Simulations show that bivariate methods...

  9. A Mediation Model to Explain the Role of Mathematics Skills and Probabilistic Reasoning on Statistics Achievement

    ERIC Educational Resources Information Center

    Primi, Caterina; Donati, Maria Anna; Chiesi, Francesca

    2016-01-01

    Among the wide range of factors related to the acquisition of statistical knowledge, competence in basic mathematics, including basic probability, has received much attention. In this study, a mediation model was estimated to derive the total, direct, and indirect effects of mathematical competence on statistics achievement taking into account…

  10. Factors Influencing the Behavioural Intention to Use Statistical Software: The Perspective of the Slovenian Students of Social Sciences

    ERIC Educational Resources Information Center

    Brezavšcek, Alenka; Šparl, Petra; Žnidaršic, Anja

    2017-01-01

    The aim of the paper is to investigate the main factors influencing the adoption and continuous utilization of statistical software among university social sciences students in Slovenia. Based on the Technology Acceptance Model (TAM), a conceptual model was derived where five external variables were taken into account: statistical software…

  11. Predicting lettuce canopy photosynthesis with statistical and neural network models

    NASA Technical Reports Server (NTRS)

    Frick, J.; Precetti, C.; Mitchell, C. A.

    1998-01-01

    An artificial neural network (NN) and a statistical regression model were developed to predict canopy photosynthetic rates (Pn) for 'Waldman's Green' leaf lettuce (Lactuca sativa L.). All data used to develop and test the models were collected for crop stands grown hydroponically and under controlled-environment conditions. In the NN and regression models, canopy Pn was predicted as a function of three independent variables: shootzone CO2 concentration (600 to 1500 micromoles mol-1), photosynthetic photon flux (PPF) (600 to 1100 micromoles m-2 s-1), and canopy age (10 to 20 days after planting). The models were used to determine the combinations of CO2 and PPF setpoints required each day to maintain maximum canopy Pn. The statistical model (a third-order polynomial) predicted Pn more accurately than the simple NN (a three-layer, fully connected net). Over an 11-day validation period, average percent difference between predicted and actual Pn was 12.3% and 24.6% for the statistical and NN models, respectively. Both models lost considerable accuracy when used to determine relatively long-range Pn predictions (≥ 6 days into the future).

  12. Statistical label fusion with hierarchical performance models

    PubMed Central

    Asman, Andrew J.; Dagley, Alexander S.; Landman, Bennett A.

    2014-01-01

    Label fusion is a critical step in many image segmentation frameworks (e.g., multi-atlas segmentation) as it provides a mechanism for generalizing a collection of labeled examples into a single estimate of the underlying segmentation. In the multi-label case, typical label fusion algorithms treat all labels equally – fully neglecting the known, yet complex, anatomical relationships exhibited in the data. To address this problem, we propose a generalized statistical fusion framework using hierarchical models of rater performance. Building on the seminal work in statistical fusion, we reformulate the traditional rater performance model from a multi-tiered hierarchical perspective. This new approach provides a natural framework for leveraging known anatomical relationships and accurately modeling the types of errors that raters (or atlases) make within a hierarchically consistent formulation. Herein, we describe several contributions. First, we derive a theoretical advancement to the statistical fusion framework that enables the simultaneous estimation of multiple (hierarchical) performance models within the statistical fusion context. Second, we demonstrate that the proposed hierarchical formulation is highly amenable to the state-of-the-art advancements that have been made to the statistical fusion framework. Lastly, in an empirical whole-brain segmentation task we demonstrate substantial qualitative and significant quantitative improvement in overall segmentation accuracy. PMID:24817809

  13. Toward statistical modeling of saccadic eye-movement and visual saliency.

    PubMed

    Sun, Xiaoshuai; Yao, Hongxun; Ji, Rongrong; Liu, Xian-Ming

    2014-11-01

    In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. These observations inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGC using projection pursuit, and generates eye movements by selecting the location with maximum SGC response. Besides human saccadic behavior simulation, we also demonstrated our superior effectiveness and robustness over state-of-the-art methods by carrying out dense experiments on synthetic patterns and human eye fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences and the effects of scale and blur, are explored in this paper. Based on extensive qualitative and quantitative experimental results, we show the promising potential of statistical approaches for human behavior research.

  14. A Weibull statistics-based lignocellulose saccharification model and a built-in parameter accurately predict lignocellulose hydrolysis performance.

    PubMed

    Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu

    2015-09-01

    Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to the Weibull distribution, we discovered that Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined as the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior and we can use the λ parameter to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment were discussed. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
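    The abstract does not give the model equation. As a hedged sketch, assume the usual Weibull cumulative form for the converted fraction, y(t) = 1 - exp(-(t/λ)^n). Under that assumption λ is the time at which conversion reaches 1 - 1/e (about 63%) whatever the value of n, which is one way to see why a single λ can summarize overall performance. The function name and the numbers below are hypothetical.

```python
import math


def weibull_conversion(t, lam, n):
    """Assumed Weibull cumulative form: fraction converted after time t."""
    return 1.0 - math.exp(-((t / lam) ** n))


# At t == lam the converted fraction is 1 - exp(-1), independent of the shape n.
at_characteristic_time = [weibull_conversion(24.0, 24.0, n) for n in (0.5, 1.0, 2.0)]

# With the same shape, a smaller lam means more conversion at any given time.
fast_system = weibull_conversion(12.0, 10.0, 1.0)
slow_system = weibull_conversion(12.0, 20.0, 1.0)
```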

  15. Statistical Models for the Analysis and Design of Digital Polymerase Chain Reaction (dPCR) Experiments.

    PubMed

    Dorazio, Robert M; Hunter, Margaret E

    2015-11-03

    Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary, log-log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model's parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.
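    For the simplest case (one sample, no covariates) the link-plus-offset structure described above can be worked through by hand. The sketch assumes that target molecules are Poisson-distributed across partitions, so a partition of volume v is positive with probability p = 1 - exp(-λv); then cloglog(p) = log(λ) + log(v), and log(v) plays the role of the offset. The function names, partition volume, and counts are hypothetical and are not taken from the paper.

```python
import math


def cloglog(p):
    """Complementary log-log link: log(-log(1 - p)), defined for 0 < p < 1."""
    return math.log(-math.log(1.0 - p))


def estimate_concentration(positives, partitions, volume):
    """Intercept-only estimate: log(conc) = cloglog(positives / partitions) - log(volume).

    Requires 0 < positives < partitions; otherwise the link is undefined.
    """
    fraction_positive = positives / partitions
    return math.exp(cloglog(fraction_positive) - math.log(volume))


# Round trip with made-up numbers: 1000 copies per microliter, 0.85 nL partitions.
true_concentration = 1000.0
partition_volume = 0.00085  # microliters
n_partitions = 20000
expected_positives = n_partitions * (1.0 - math.exp(-true_concentration * partition_volume))
recovered = estimate_concentration(expected_positives, n_partitions, partition_volume)
```

    Covariates would enter as additional terms on the log(λ) scale, which is what fitting the model as a binomial generalized linear model in standard software provides.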

  16. Statistical Downscaling and Bias Correction of Climate Model Outputs for Climate Change Impact Assessment in the U.S. Northeast

    NASA Technical Reports Server (NTRS)

    Ahmed, Kazi Farzan; Wang, Guiling; Silander, John; Wilson, Adam M.; Allen, Jenica M.; Horton, Radley; Anyah, Richard

    2013-01-01

    Statistical downscaling can be used to efficiently downscale a large number of General Circulation Model (GCM) outputs to a fine temporal and spatial scale. To facilitate regional impact assessments, this study statistically downscales (to 1/8° spatial resolution) and corrects the bias of daily maximum and minimum temperature and daily precipitation data from six GCMs and four Regional Climate Models (RCMs) for the northeast United States (US) using the Statistical Downscaling and Bias Correction (SDBC) approach. Based on these downscaled data from multiple models, five extreme indices were analyzed for the future climate to quantify future changes of climate extremes. For a subset of models and indices, results based on raw and bias corrected model outputs for the present-day climate were compared with observations, which demonstrated that bias correction is important not only for GCM outputs, but also for RCM outputs. For future climate, bias correction led to a higher level of agreements among the models in predicting the magnitude and capturing the spatial pattern of the extreme climate indices. We found that the incorporation of dynamical downscaling as an intermediate step does not lead to considerable differences in the results of statistical downscaling for the study domain.

  17. Implications of the methodological choices for hydrologic portrayals of climate change over the contiguous United States: Statistically downscaled forcing data and hydrologic models

    USGS Publications Warehouse

    Mizukami, Naoki; Clark, Martyn P.; Gutmann, Ethan D.; Mendoza, Pablo A.; Newman, Andrew J.; Nijssen, Bart; Livneh, Ben; Hay, Lauren E.; Arnold, Jeffrey R.; Brekke, Levi D.

    2016-01-01

    Continental-domain assessments of climate change impacts on water resources typically rely on statistically downscaled climate model outputs to force hydrologic models at a finer spatial resolution. This study examines the effects of four statistical downscaling methods [bias-corrected constructed analog (BCCA), bias-corrected spatial disaggregation applied at daily (BCSDd) and monthly scales (BCSDm), and asynchronous regression (AR)] on retrospective hydrologic simulations using three hydrologic models with their default parameters (the Community Land Model, version 4.0; the Variable Infiltration Capacity model, version 4.1.2; and the Precipitation–Runoff Modeling System, version 3.0.4) over the contiguous United States (CONUS). Biases of hydrologic simulations forced by statistically downscaled climate data relative to the simulation with observation-based gridded data are presented. Each statistical downscaling method produces different meteorological portrayals including precipitation amount, wet-day frequency, and the energy input (i.e., shortwave radiation), and their interplay affects estimations of precipitation partitioning between evapotranspiration and runoff, extreme runoff, and hydrologic states (i.e., snow and soil moisture). The analyses show that BCCA underestimates annual precipitation by as much as −250 mm, leading to unreasonable hydrologic portrayals over the CONUS for all models. Although the other three statistical downscaling methods produce a comparable precipitation bias ranging from −10 to 8 mm across the CONUS, BCSDd severely overestimates the wet-day fraction by up to 0.25, leading to different precipitation partitioning compared to the simulations with other downscaled data. Overall, the choice of downscaling method contributes to less spread in runoff estimates (by a factor of 1.5–3) than the choice of hydrologic model with use of the default parameters if BCCA is excluded.

  18. Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory.

    PubMed

    Lord, Dominique; Washington, Simon P; Ivan, John N

    2005-01-01

    There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states: perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of "excess" zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows a Bernoulli trial with unequal probability of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to "excess" zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed, and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales and not an underlying dual state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
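    The low-exposure argument can be illustrated with a toy simulation. This is only a sketch in the spirit of the abstract, not a reproduction of the authors' experiment: each site accumulates crashes over independent Bernoulli trials with unequal small probabilities (Poisson trials), and the only thing that differs between the two runs is the number of trials per site. The probabilities, trial counts, and names are assumptions.

```python
import random


def simulate_site(n_trials, max_p, rng):
    """Crash count from independent Bernoulli trials with unequal probabilities."""
    crashes = 0
    for _ in range(n_trials):
        p = rng.uniform(0.0, max_p)  # each trial gets its own small crash probability
        if rng.random() < p:
            crashes += 1
    return crashes


def zero_fraction(n_sites, n_trials, max_p, seed):
    """Share of sites that record zero crashes."""
    rng = random.Random(seed)
    counts = [simulate_site(n_trials, max_p, rng) for _ in range(n_sites)]
    return sum(1 for c in counts if c == 0) / n_sites


# Same single-state crash process at every site; only exposure differs.
low_exposure = zero_fraction(n_sites=300, n_trials=100, max_p=0.004, seed=1)
high_exposure = zero_fraction(n_sites=300, n_trials=2000, max_p=0.004, seed=1)
```

    In this sketch most low-exposure sites record zero crashes while few high-exposure sites do, even though no site is "perfectly safe".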

  19. An empirical comparison of statistical tests for assessing the proportional hazards assumption of Cox's model.

    PubMed

    Ng'andu, N H

    1997-03-30

    In the analysis of survival data using the Cox proportional hazard (PH) model, it is important to verify that the explanatory variables analysed satisfy the proportional hazard assumption of the model. This paper presents results of a simulation study that compares five test statistics to check the proportional hazard assumption of Cox's model. The test statistics were evaluated under proportional hazards and the following types of departures from the proportional hazard assumption: increasing relative hazards; decreasing relative hazards; crossing hazards; diverging hazards, and non-monotonic hazards. The test statistics compared include those based on partitioning of failure time and those that do not require partitioning of failure time. The simulation results demonstrate that the time-dependent covariate test, the weighted residuals score test and the linear correlation test have equally good power for detection of non-proportionality in the varieties of non-proportional hazards studied. Using illustrative data from the literature, these test statistics performed similarly.

  20. Evaluating Structural Equation Models for Categorical Outcomes: A New Test Statistic and a Practical Challenge of Interpretation.

    PubMed

    Monroe, Scott; Cai, Li

    2015-01-01

    This research is concerned with two topics in assessing model fit for categorical data analysis. The first topic involves the application of a limited-information overall test, introduced in the item response theory literature, to structural equation modeling (SEM) of categorical outcome variables. Most popular SEM test statistics assess how well the model reproduces estimated polychoric correlations. In contrast, limited-information test statistics assess how well the underlying categorical data are reproduced. Here, the recently introduced C2 statistic of Cai and Monroe (2014) is applied. The second topic concerns how the root mean square error of approximation (RMSEA) fit index can be affected by the number of categories in the outcome variable. This relationship creates challenges for interpreting RMSEA. While the two topics initially appear unrelated, they may conveniently be studied in tandem since RMSEA is based on an overall test statistic, such as C2. The results are illustrated with an empirical application to data from a large-scale educational survey.

  1. Statistical power to detect violation of the proportional hazards assumption when using the Cox regression model.

    PubMed

    Austin, Peter C

    2018-01-01

    The use of the Cox proportional hazards regression model is widespread. A key assumption of the model is that of proportional hazards. Analysts frequently test the validity of this assumption using statistical significance testing. However, the statistical power of such assessments is frequently unknown. We used Monte Carlo simulations to estimate the statistical power of two different methods for detecting violations of this assumption. When the covariate was binary, we found that a model-based method had greater power than a method based on cumulative sums of martingale residuals. Furthermore, the parametric nature of the distribution of event times had an impact on power when the covariate was binary. Statistical power to detect a strong violation of the proportional hazards assumption was low to moderate even when the number of observed events was high. In many data sets, power to detect a violation of this assumption is likely to be low to modest.

  2. Statistical power to detect violation of the proportional hazards assumption when using the Cox regression model

    PubMed Central

    Austin, Peter C.

    2017-01-01

    The use of the Cox proportional hazards regression model is widespread. A key assumption of the model is that of proportional hazards. Analysts frequently test the validity of this assumption using statistical significance testing. However, the statistical power of such assessments is frequently unknown. We used Monte Carlo simulations to estimate the statistical power of two different methods for detecting violations of this assumption. When the covariate was binary, we found that a model-based method had greater power than a method based on cumulative sums of martingale residuals. Furthermore, the parametric nature of the distribution of event times had an impact on power when the covariate was binary. Statistical power to detect a strong violation of the proportional hazards assumption was low to moderate even when the number of observed events was high. In many data sets, power to detect a violation of this assumption is likely to be low to modest. PMID:29321694

  3. Selecting the right statistical model for analysis of insect count data by using information theoretic measures.

    PubMed

    Sileshi, G

    2006-10-01

    Researchers and regulatory agencies often make statistical inferences from insect count data using modelling approaches that assume homogeneous variance. Such models do not allow for formal appraisal of variability which in its different forms is the subject of interest in ecology. Therefore, the objectives of this paper were to (i) compare models suitable for handling variance heterogeneity and (ii) select optimal models to ensure valid statistical inferences from insect count data. The log-normal, standard Poisson, Poisson corrected for overdispersion, zero-inflated Poisson, the negative binomial distribution and zero-inflated negative binomial models were compared using six count datasets on foliage-dwelling insects and five families of soil-dwelling insects. Akaike's and Schwarz Bayesian information criteria were used for comparing the various models. Over 50% of the counts were zeros even in locally abundant species such as Ootheca bennigseni Weise, Mesoplatys ochroptera Stål and Diaecoderus spp. The Poisson model after correction for overdispersion and the standard negative binomial distribution model provided better description of the probability distribution of seven out of the 11 insects than the log-normal, standard Poisson, zero-inflated Poisson or zero-inflated negative binomial models. It is concluded that excess zeros and variance heterogeneity are common data phenomena in insect counts. If not properly modelled, these properties can invalidate the normal distribution assumptions resulting in biased estimation of ecological effects and jeopardizing the integrity of the scientific inferences. Therefore, it is recommended that statistical models appropriate for handling these data properties be selected using objective criteria to ensure efficient statistical inference.
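    The model-selection step can be sketched with the usual definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where smaller values are preferred. The counts below are made up for illustration and are not data from the paper. To keep the maximum-likelihood fits in closed form, the overdispersed alternative is a geometric distribution (a negative binomial with size fixed at 1), which is a simplification of the models actually compared.

```python
import math


def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * log_likelihood


def poisson_loglik(counts):
    """Poisson log-likelihood at its MLE (rate = sample mean)."""
    rate = sum(counts) / len(counts)
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


def geometric_loglik(counts):
    """Geometric log-likelihood, P(k) = p * (1 - p)**k, at its MLE p = 1 / (1 + mean)."""
    mean = sum(counts) / len(counts)
    p = 1.0 / (1.0 + mean)
    return sum(math.log(p) + k * math.log(1.0 - p) for k in counts)


# Hypothetical insect counts: many zeros plus a few large values.
counts = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 7, 0, 0, 3, 12, 0]

scores = {
    "poisson": (aic(poisson_loglik(counts), 1), bic(poisson_loglik(counts), 1, len(counts))),
    "geometric": (aic(geometric_loglik(counts), 1), bic(geometric_loglik(counts), 1, len(counts))),
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
```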

  4. Development of a funding, cost, and spending model for satellite projects

    NASA Technical Reports Server (NTRS)

    Johnson, Jesse P.

    1989-01-01

    The need for a predictive budget/funding model is obvious. The current models used by the Resource Analysis Office (RAO) are used to predict the total costs of satellite projects. An effort to extend the modeling capabilities from total budget analysis to total budget and budget outlays over time analysis was conducted. A statistically based, data-driven methodology was used to derive and develop the model. The budget data for the last 18 GSFC-sponsored satellite projects were analyzed and used to build a funding model which would describe the historical spending patterns. This raw data consisted of dollars spent in that specific year and their 1989 dollar equivalent. This data was converted to the standard format used by the RAO group and placed in a database. A simple statistical analysis was performed to calculate the gross statistics associated with project length and project cost and the conditional statistics on project length and project cost. The modeling approach used is derived from the theory of embedded statistics, which states that properly analyzed data will produce the underlying generating function. The process of funding large scale projects over extended periods of time is described by Life Cycle Cost Models (LCCM). The data was analyzed to find a model in the generic form of a LCCM. The model developed is based on a Weibull function whose parameters are found by both nonlinear optimization and nonlinear regression. In order to use this model it is necessary to transform the problem from a dollar/time space to a percentage of total budget/time space. This transformation is equivalent to moving to a probability space. By using the basic rules of probability, the validity of both the optimization and the regression steps is ensured. This statistically significant model is then integrated and inverted. The resulting output represents a project schedule which relates the amount of money spent to the percentage of project completion.

  5. Statistical methods for the beta-binomial model in teratology.

    PubMed Central

    Yamamoto, E; Yanagimoto, T

    1994-01-01

    The beta-binomial model is widely used for analyzing teratological data involving littermates. Recent developments in statistical analyses of teratological data are briefly reviewed with emphasis on the model. For statistical inference of the parameters in the beta-binomial distribution, separation of the likelihood introduces a likelihood inference. This leads to reducing biases of estimators and also to improving accuracy of empirical significance levels of tests. Separate inference of the parameters can be conducted in a unified way. PMID:8187716

  6. The lz(p)* Person-Fit Statistic in an Unfolding Model Context.

    PubMed

    Tendeiro, Jorge N

    2017-01-01

    Although person-fit analysis has a long-standing tradition within item response theory, it has been applied in combination with dominance response models almost exclusively. In this article, a popular log likelihood-based parametric person-fit statistic under the framework of the generalized graded unfolding model is used. Results from a simulation study indicate that the person-fit statistic performed relatively well in detecting midpoint response style patterns and not so well in detecting extreme response style patterns.

  7. Statistical methods and neural network approaches for classification of data from multiple sources

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon Atli; Swain, Philip H.

    1990-01-01

    Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A problem with using conventional multivariate statistical approaches for classification of data of multiple types is in general that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.

  8. Unified risk analysis of fatigue failure in ductile alloy components during all three stages of fatigue crack evolution process.

    PubMed

    Patankar, Ravindra

    2003-10-01

    Statistical fatigue life of a ductile alloy specimen is traditionally divided into three stages, namely, crack nucleation, small crack growth, and large crack growth. Crack nucleation and small crack growth show a wide variation and hence a big spread on cycles versus crack length graph. Relatively, large crack growth shows a lesser variation. Therefore, different models are fitted to the different stages of the fatigue evolution process, thus treating different stages as different phenomena. With these independent models, it is impossible to predict one phenomenon based on the information available about the other phenomenon. Experimentally, it is easier to carry out crack length measurements of large cracks compared to nucleating cracks and small cracks. Thus, it is easier to collect statistical data for large crack growth compared to the painstaking effort it would take to collect statistical data for crack nucleation and small crack growth. This article presents a fracture mechanics-based stochastic model of fatigue crack growth in ductile alloys that are commonly encountered in mechanical structures and machine components. The model has been validated by Ray (1998) for crack propagation by various statistical fatigue data. Based on the model, this article proposes a technique to predict statistical information of fatigue crack nucleation and small crack growth properties that uses the statistical properties of large crack growth under constant amplitude stress excitation. The statistical properties of large crack growth under constant amplitude stress excitation can be obtained via experiments.

  9. Incorporating GIS and remote sensing for census population disaggregation

    NASA Astrophysics Data System (ADS)

    Wu, Shuo-Sheng 'Derek'

    Census data are the primary source of demographic data for a variety of researches and applications. For confidentiality issues and administrative purposes, census data are usually released to the public by aggregated areal units. In the United States, the smallest census unit is census blocks. Due to data aggregation, users of census data may have problems in visualizing population distribution within census blocks and estimating population counts for areas not coinciding with census block boundaries. The main purpose of this study is to develop methodology for estimating sub-block areal populations and assessing the estimation errors. The City of Austin, Texas was used as a case study area. Based on tax parcel boundaries and parcel attributes derived from ancillary GIS and remote sensing data, detailed urban land use classes were first classified using a per-field approach. After that, statistical models by land use classes were built to infer population density from other predictor variables, including four census demographic statistics (the Hispanic percentage, the married percentage, the unemployment rate, and per capita income) and three physical variables derived from remote sensing images and building footprints vector data (a landscape heterogeneity statistics, a building pattern statistics, and a building volume statistics). In addition to statistical models, deterministic models were proposed to directly infer populations from building volumes and three housing statistics, including the average space per housing unit, the housing unit occupancy rate, and the average household size. After population models were derived or proposed, how well the models predict populations for another set of sample blocks was assessed. The results show that deterministic models were more accurate than statistical models. 
Further, by simulating the base unit for modeling from aggregating blocks, I assessed how well the deterministic models estimate sub-unit-level populations. I also assessed the aggregation effects and the rescaling effects on sub-unit estimates. Lastly, from another set of mixed-land-use sample blocks, a mixed-land-use model was derived and compared with a residential-land-use model. The results of per-field land use classification are satisfactory with a Kappa accuracy statistic of 0.747. Model assessments by land use show that population estimates for multi-family land use areas have higher errors than those for single-family land use areas, and population estimates for mixed land use areas have higher errors than those for residential land use areas. The assessments of sub-unit estimates using a simulation approach indicate that smaller areas show higher estimation errors, estimation errors do not relate to the base unit size, and rescaling improves all levels of sub-unit estimates.

  10. New powerful statistics for alignment-free sequence comparison under a pattern transfer model.

    PubMed

    Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S; Sun, Fengzhu

    2011-09-07

    Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D*2 and D(s)2 showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D*2 and D(s)2 by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. Copyright © 2011 Elsevier Ltd. All rights reserved.
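
    As background for this record, the basic D2 statistic is the sum, over all words of length k, of the product of the word's counts in the two sequences. The sketch below implements only that basic statistic, not the D2 variants or the new local statistics proposed in the paper; the function names are hypothetical and words are assumed to be overlapping k-mers.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Basic D2: sum over words w of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for words absent from seq_b, so they contribute nothing.
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

    For example, d2("ACGT", "ACGT", 2) counts the three shared words AC, CG and GT once each.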

  11. New Powerful Statistics for Alignment-free Sequence Comparison Under a Pattern Transfer Model

    PubMed Central

    Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S.; Sun, Fengzhu

    2011-01-01

    Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D2∗ and D2s showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D2∗ and D2s by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. PMID:21723298

  12. ICD-11 and DSM-5 personality trait domains capture categorical personality disorders: Finding a common ground.

    PubMed

    Bach, Bo; Sellbom, Martin; Skjernov, Mathias; Simonsen, Erik

    2018-05-01

    The five personality disorder trait domains in the proposed International Classification of Diseases, 11th edition and the Diagnostic and Statistical Manual of Mental Disorders, 5th edition are comparable in terms of Negative Affectivity, Detachment, Antagonism/Dissociality and Disinhibition. However, the International Classification of Diseases, 11th edition model includes a separate domain of Anankastia, whereas the Diagnostic and Statistical Manual of Mental Disorders, 5th edition model includes an additional domain of Psychoticism. This study examined associations of International Classification of Diseases, 11th edition and Diagnostic and Statistical Manual of Mental Disorders, 5th edition trait domains, simultaneously, with categorical personality disorders. Psychiatric outpatients (N = 226) were administered the Structured Clinical Interview for DSM-IV Axis II Personality Disorders Interview and the Personality Inventory for DSM-5. International Classification of Diseases, 11th edition and Diagnostic and Statistical Manual of Mental Disorders, 5th edition trait domain scores were obtained using pertinent scoring algorithms for the Personality Inventory for DSM-5. Associations between categorical personality disorders and trait domains were examined using correlation and multiple regression analyses. Both the International Classification of Diseases, 11th edition and the Diagnostic and Statistical Manual of Mental Disorders, 5th edition domain models showed relevant continuity with categorical personality disorders and captured a substantial amount of their information. As expected, the International Classification of Diseases, 11th edition model was superior in capturing obsessive-compulsive personality disorder, whereas the Diagnostic and Statistical Manual of Mental Disorders, 5th edition model was superior in capturing schizotypal personality disorder.
These preliminary findings suggest that little information is 'lost' in a transition to trait domain models and potentially adds to narrowing the gap between Diagnostic and Statistical Manual of Mental Disorders, 5th edition and the proposed International Classification of Diseases, 11th edition model. Accordingly, the International Classification of Diseases, 11th edition and Diagnostic and Statistical Manual of Mental Disorders, 5th edition domain models may be used to delineate one another as well as features of familiar categorical personality disorder types. A preliminary category-to-domain 'cross walk' is provided in the article.

  13. The Effect on the 8th Grade Students' Attitude towards Statistics of Project Based Learning

    ERIC Educational Resources Information Center

    Koparan, Timur; Güven, Bülent

    2014-01-01

    This study investigates the effect of the project based learning approach on 8th grade students' attitude towards statistics. With this aim, an attitude scale towards statistics was developed. Quasi-experimental research model was used in this study. Following this model in the control group the traditional method was applied to teach statistics…

  14. Secondary Statistical Modeling with the National Assessment of Adult Literacy: Implications for the Design of the Background Questionnaire. Working Paper Series.

    ERIC Educational Resources Information Center

    Kaplan, David

    This paper offers recommendations to the National Center for Education Statistics (NCES) on the development of the background questionnaire for the National Assessment of Adult Literacy (NAAL). The recommendations are from the viewpoint of a researcher interested in applying sophisticated statistical models to address important issues in adult…

  15. A Two-Tiered Model for Analyzing Library Web Site Usage Statistics, Part 1: Web Server Logs.

    ERIC Educational Resources Information Center

    Cohen, Laura B.

    2003-01-01

    Proposes a two-tiered model for analyzing web site usage statistics for academic libraries: one tier for library administrators that analyzes measures indicating library use, and a second tier for web site managers that analyzes measures aiding in server maintenance and site design. Discusses the technology of web site usage statistics, and…

  16. Performance of Bootstrapping Approaches To Model Test Statistics and Parameter Standard Error Estimation in Structural Equation Modeling.

    ERIC Educational Resources Information Center

    Nevitt, Jonathan; Hancock, Gregory R.

    2001-01-01

    Evaluated the bootstrap method under varying conditions of nonnormality, sample size, model specification, and number of bootstrap samples drawn from the resampling space. Results for the bootstrap suggest the resampling-based method may be conservative in its control over model rejections, thus having an impact on the statistical power associated…

  17. Modelling Complexity: Making Sense of Leadership Issues in 14-19 Education

    ERIC Educational Resources Information Center

    Briggs, Ann R. J.

    2008-01-01

    Modelling of statistical data is a well established analytical strategy. Statistical data can be modelled to represent, and thereby predict, the forces acting upon a structure or system. For the rapidly changing systems in the world of education, modelling enables the researcher to understand, to predict and to enable decisions to be based upon…

  18. Genomic similarity and kernel methods I: advancements by building on mathematical and statistical foundations.

    PubMed

    Schaid, Daniel J

    2010-01-01

    Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1]. Copyright © 2010 S. Karger AG, Basel.

  19. Moving in Parallel Toward a Modern Modeling Epistemology: Bayes Factors and Frequentist Modeling Methods.

    PubMed

    Rodgers, Joseph Lee

    2016-01-01

    The Bayesian-frequentist debate typically portrays these statistical perspectives as opposing views. However, both Bayesian and frequentist statisticians have expanded their epistemological basis away from a singular focus on the null hypothesis, to a broader perspective involving the development and comparison of competing statistical/mathematical models. For frequentists, statistical developments such as structural equation modeling and multilevel modeling have facilitated this transition. For Bayesians, the Bayes factor has facilitated this transition. The Bayes factor is treated in articles within this issue of Multivariate Behavioral Research. The current presentation provides brief commentary on those articles and more extended discussion of the transition toward a modern modeling epistemology. In certain respects, Bayesians and frequentists share common goals.

  20. Estimating regional plant biodiversity with GIS modelling

    Treesearch

    Louis R. Iverson; Anantha M. Prasad

    1998-01-01

    We analyzed a statewide species database together with a county-level geographic information system to build a model based on well-surveyed areas to estimate species richness in less surveyed counties. The model involved GIS (Arc/Info) and statistics (S-PLUS), including spatial statistics (S+SpatialStats).

  1. Model Error Estimation for the CPTEC Eta Model

    NASA Technical Reports Server (NTRS)

    Tippett, Michael K.; daSilva, Arlindo

    1999-01-01

    Statistical data assimilation systems require the specification of forecast and observation error statistics. Forecast error is due to model imperfections and differences between the initial condition and the actual state of the atmosphere. Practical four-dimensional variational (4D-Var) methods try to fit the forecast state to the observations and assume that the model error is negligible. Here, with a number of simplifying assumptions, a framework is developed for isolating the model error given the forecast error at two lead-times. Two definitions are proposed for the Talagrand ratio tau, the fraction of the forecast error due to model error rather than initial condition error. Data from the CPTEC Eta Model running operationally over South America are used to calculate forecast error statistics and lower bounds for tau.

  2. Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment.

    PubMed

    Berkes, Pietro; Orbán, Gergo; Lengyel, Máté; Fiser, József

    2011-01-07

    The brain maintains internal models of its environment to interpret sensory inputs and to prepare actions. Although behavioral studies have demonstrated that these internal models are optimally adapted to the statistics of the environment, the neural underpinning of this adaptation is unknown. Using a Bayesian model of sensory cortical processing, we related stimulus-evoked and spontaneous neural activities to inferences and prior expectations in an internal model and predicted that they should match if the model is statistically optimal. To test this prediction, we analyzed visual cortical activity of awake ferrets during development. Similarity between spontaneous and evoked activities increased with age and was specific to responses evoked by natural scenes. This demonstrates the progressive adaptation of internal models to the statistics of natural stimuli at the neural level.

  3. Probabilistic Mesomechanical Fatigue Model

    NASA Technical Reports Server (NTRS)

    Tryon, Robert G.

    1997-01-01

    A probabilistic mesomechanical fatigue life model is proposed to link the microstructural material heterogeneities to the statistical scatter in the macrostructural response. The macrostructure is modeled as an ensemble of microelements. Cracks nucleate within the microelements and grow from the microelements to final fracture. Variations of the microelement properties are defined using statistical parameters. A micromechanical slip band decohesion model is used to determine the crack nucleation life and size. A crack tip opening displacement model is used to determine the small crack growth life and size. Paris law is used to determine the long crack growth life. The models are combined in a Monte Carlo simulation to determine the statistical distribution of total fatigue life for the macrostructure. The modeled response is compared to trends in experimental observations from the literature.

  4. Modelling the effect of structural QSAR parameters on skin penetration using genetic programming

    NASA Astrophysics Data System (ADS)

    Chung, K. K.; Do, D. Q.

    2010-09-01

    In order to model relationships between chemical structures and biological effects in quantitative structure-activity relationship (QSAR) data, an alternative technique of artificial intelligence computing—genetic programming (GP)—was investigated and compared to the traditional method—statistical. GP, with the primary advantage of generating mathematical equations, was employed to model QSAR data and to define the most important molecular descriptions in QSAR data. The models predicted by GP agreed with the statistical results, and the most predictive models of GP were significantly improved when compared to the statistical models using ANOVA. Recently, artificial intelligence techniques have been applied widely to analyse QSAR data. With the capability of generating mathematical equations, GP can be considered as an effective and efficient method for modelling QSAR data.

  5. Estimating urban ground-level PM10 using MODIS 3km AOD product and meteorological parameters from WRF model

    NASA Astrophysics Data System (ADS)

    Ghotbi, Saba; Sotoudeheian, Saeed; Arhami, Mohammad

    2016-09-01

    Satellite remote sensing products of AOD from MODIS along with appropriate meteorological parameters were used to develop statistical models and estimate ground-level PM10. Most of previous studies obtained meteorological data from synoptic weather stations, with rather sparse spatial distribution, and used it along with 10 km AOD product to develop statistical models, applicable for PM variations in regional scale (resolution of ≥10 km). In the current study, meteorological parameters were simulated with 3 km resolution using WRF model and used along with the rather new 3 km AOD product (launched in 2014). The resulting PM statistical models were assessed for a polluted and largely variable urban area, Tehran, Iran. Despite the critical particulate pollution problem, very few PM studies were conducted in this area. The issue of rather poor direct PM-AOD associations existed, due to different factors such as variations in particles optical properties, in addition to bright background issue for satellite data, as the studied area located in the semi-arid areas of Middle East. Statistical approach of linear mixed effect (LME) was used, and three types of statistical models including single variable LME model (using AOD as independent variable) and multiple variables LME model by using meteorological data from two sources, WRF model and synoptic stations, were examined. Meteorological simulations were performed using a multiscale approach and creating an appropriate physic for the studied region, and the results showed rather good agreements with recordings of the synoptic stations. The single variable LME model was able to explain about 61%-73% of daily PM10 variations, reflecting a rather acceptable performance. Statistical models performance improved through using multivariable LME and incorporating meteorological data as auxiliary variables, particularly by using fine resolution outputs from WRF (R2 = 0.73-0.81). 
In addition, rather fine resolution for PM estimates was mapped for the studied city, and resulting concentration maps were consistent with PM recordings at the existing stations.

  6. Strengthen forensic entomology in court--the need for data exploration and the validation of a generalised additive mixed model.

    PubMed

    Baqué, Michèle; Amendt, Jens

    2013-01-01

    Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets don't take into account that immature blow flies grow in a non-linear fashion. Linear models do not supply a sufficient reliability on age estimates and may even lead to an erroneous determination of the PMI(min). According to the Daubert standard and the need for improvements in forensic science, new statistical tools like smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data--to assure their quality and to show the importance of checking it carefully prior to conducting the statistical tests--and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets by using for the first time generalised additive mixed models.

  7. Searching for hidden unexpected features in the SnIa data

    NASA Astrophysics Data System (ADS)

    Shafieloo, A.; Perivolaropoulos, L.

    2010-06-01

    It is known that the χ² statistic and likelihood analysis may not be sensitive to all the features of the data. Despite the fact that by using the χ² statistic we can measure the overall goodness of fit for a model confronted to a data set, some specific features of the data can stay undetectable. For instance, it has been pointed out that there is an unexpected brightness of the SnIa data at z > 1 in the Union compilation. We quantify this statement by constructing a new statistic, called Binned Normalized Difference (BND) statistic, which is applicable directly on the Type Ia Supernova (SnIa) distance moduli. This statistic is designed to pick up systematic brightness trends of SnIa data points with respect to a best fit cosmological model at high redshifts. According to this statistic there are 2.2%, 5.3% and 12.6% consistency between the Gold06, Union08 and Constitution09 data and spatially flat ΛCDM model when the real data is compared with many realizations of the simulated Monte Carlo datasets. The corresponding realization probability in the context of a (w0,w1) = (-1.4,2) model is more than 30% for all mentioned datasets indicating a much better consistency for this model with respect to the BND statistic. The unexpected high z brightness of SnIa can be interpreted either as a trend towards more deceleration at high z than expected in the context of ΛCDM or as a statistical fluctuation or finally as a systematic effect perhaps due to a mild SnIa evolution at high z.

  8. Comparing the Fit of Item Response Theory and Factor Analysis Models

    ERIC Educational Resources Information Center

    Maydeu-Olivares, Alberto; Cai, Li; Hernandez, Adolfo

    2011-01-01

    Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be…

  9. A stochastic fractional dynamics model of space-time variability of rain

    NASA Astrophysics Data System (ADS)

    Kundu, Prasun K.; Travis, James E.

    2013-09-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, which allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and on the Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to fit the second moment statistics of radar data at the smaller spatiotemporal scales. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well at these scales without any further adjustment.

  10. The relationship between the C-statistic of a risk-adjustment model and the accuracy of hospital report cards: a Monte Carlo Study.

    PubMed

    Austin, Peter C; Reeves, Mathew J

    2013-03-01

    Hospital report cards, in which outcomes following the provision of medical or surgical care are compared across health care providers, are being published with increasing frequency. Essential to the production of these reports is risk-adjustment, which allows investigators to account for differences in the distribution of patient illness severity across different hospitals. Logistic regression models are frequently used for risk adjustment in hospital report cards. Many applied researchers use the c-statistic (equivalent to the area under the receiver operating characteristic curve) of the logistic regression model as a measure of the credibility and accuracy of hospital report cards. To determine the relationship between the c-statistic of a risk-adjustment model and the accuracy of hospital report cards. Monte Carlo simulations were used to examine this issue. We examined the influence of 3 factors on the accuracy of hospital report cards: the c-statistic of the logistic regression model used for risk adjustment, the number of hospitals, and the number of patients treated at each hospital. The parameters used to generate the simulated datasets came from analyses of patients hospitalized with a diagnosis of acute myocardial infarction in Ontario, Canada. The c-statistic of the risk-adjustment model had, at most, a very modest impact on the accuracy of hospital report cards, whereas the number of patients treated at each hospital had a much greater impact. The c-statistic of a risk-adjustment model should not be used to assess the accuracy of a hospital report card.
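
    The c-statistic discussed in this record can be computed directly from its pairwise definition: the proportion of (event, non-event) patient pairs in which the patient with the event was assigned the higher predicted risk, with ties counted as one half. The sketch below is a minimal illustration with a hypothetical function name and made-up inputs; it is not the simulation code used in the study.

```python
def c_statistic(risks, outcomes):
    """Concordance (c) statistic for predicted risks and binary outcomes (1 = event)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))
```

    A value of 0.5 corresponds to risk predictions that discriminate no better than chance, and 1.0 to perfect separation of events from non-events.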

  11. The relationship between the c-statistic of a risk-adjustment model and the accuracy of hospital report cards: A Monte Carlo study

    PubMed Central

    Austin, Peter C.; Reeves, Mathew J.

    2015-01-01

    Background Hospital report cards, in which outcomes following the provision of medical or surgical care are compared across health care providers, are being published with increasing frequency. Essential to the production of these reports is risk-adjustment, which allows investigators to account for differences in the distribution of patient illness severity across different hospitals. Logistic regression models are frequently used for risk-adjustment in hospital report cards. Many applied researchers use the c-statistic (equivalent to the area under the receiver operating characteristic curve) of the logistic regression model as a measure of the credibility and accuracy of hospital report cards. Objectives To determine the relationship between the c-statistic of a risk-adjustment model and the accuracy of hospital report cards. Research Design Monte Carlo simulations were used to examine this issue. We examined the influence of three factors on the accuracy of hospital report cards: the c-statistic of the logistic regression model used for risk-adjustment, the number of hospitals, and the number of patients treated at each hospital. The parameters used to generate the simulated datasets came from analyses of patients hospitalized with a diagnosis of acute myocardial infarction in Ontario, Canada. Results The c-statistic of the risk-adjustment model had, at most, a very modest impact on the accuracy of hospital report cards, whereas the number of patients treated at each hospital had a much greater impact. Conclusions The c-statistic of a risk-adjustment model should not be used to assess the accuracy of a hospital report card. PMID:23295579

  12. A simple rain attenuation model for earth-space radio links operating at 10-35 GHz

    NASA Technical Reports Server (NTRS)

    Stutzman, W. L.; Yon, K. M.

    1986-01-01

    The simple attenuation model has been improved from an earlier version and now includes the effect of wave polarization. The model is for the prediction of rain attenuation statistics on earth-space communication links operating in the 10-35 GHz band. Simple calculations produce attenuation values as a function of average rain rate. These together with rain rate statistics (either measured or predicted) can be used to predict annual rain attenuation statistics. In this paper model predictions are compared to measured data from a data base of 62 experiments performed in the U.S., Europe, and Japan. Comparisons are also made to predictions from other models.

  13. New approach in the quantum statistical parton distribution

    NASA Astrophysics Data System (ADS)

    Sohaily, Sozha; Vaziri (Khamedi), Mohammad

    2017-12-01

    An attempt to find simple parton distribution functions (PDFs) based on a quantum statistical approach is presented. The PDFs described by the statistical model have very interesting physical properties which help in understanding the structure of partons. The longitudinal portion of the distribution functions is given by applying the maximum entropy principle. An interesting and simple approach to determine the statistical variables exactly without fitting and fixing parameters is surveyed. Analytic expressions of the x-dependent PDFs are obtained in the whole x region [0, 1], and the computed distributions are consistent with the experimental observations. The agreement with experimental data provides robust confirmation of the simple statistical model presented.

  14. How Statisticians Speak Risk

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Redus, K.S.

    2007-07-01

    The foundation of statistics deals with (a) how to measure and collect data and (b) how to identify models using estimates of statistical parameters derived from the data. Risk is a term used by the statistical community and those that employ statistics to express the results of a statistically based study. Statistical risk is represented as a probability that, for example, a statistical model is sufficient to describe a data set; but, risk is also interpreted as a measure of worth of one alternative when compared to another. The common thread of any risk-based problem is the combination of (a) the chance an event will occur, with (b) the value of the event. This paper presents an introduction to, and some examples of, statistical risk-based decision making from a quantitative, visual, and linguistic sense. This should help in understanding areas of radioactive waste management that can be suitably expressed using statistical risk and vice versa. (authors)

  15. SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

    PubMed

    Chu, Annie; Cui, Jenny; Dinov, Ivo D

    2009-03-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for the most up-to-date information and newly added models.

  16. Computationally efficient statistical differential equation modeling using homogenization

    USGS Publications Warehouse

    Hooten, Mevin B.; Garlick, Martha J.; Powell, James A.

    2013-01-01

    Statistical models using partial differential equations (PDEs) to describe dynamically evolving natural systems are appearing in the scientific literature with some regularity in recent years. Often such studies seek to characterize the dynamics of temporal or spatio-temporal phenomena such as invasive species, consumer-resource interactions, community evolution, and resource selection. Specifically, in the spatial setting, data are often available at varying spatial and temporal scales. Additionally, the necessary numerical integration of a PDE may be computationally infeasible over the spatial support of interest. We present an approach to impose computationally advantageous changes of support in statistical implementations of PDE models and demonstrate its utility through simulation using a form of PDE known as “ecological diffusion.” We also apply a statistical ecological diffusion model to a data set involving the spread of mountain pine beetle (Dendroctonus ponderosae) in Idaho, USA.

  17. Comparisons of non-Gaussian statistical models in DNA methylation analysis.

    PubMed

    Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-06-16

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.

  18. Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

    PubMed Central

    Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-01-01

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687

  19. Structured statistical models of inductive reasoning.

    PubMed

    Kemp, Charles; Tenenbaum, Joshua B

    2009-01-01

    Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. This article presents a Bayesian framework that attempts to meet both goals and describes 4 applications of the framework: a taxonomic model, a spatial model, a threshold model, and a causal model. Each model makes probabilistic inferences about the extensions of novel properties, but the priors for the 4 models are defined over different kinds of structures that capture different relationships between the categories in a domain. The framework therefore shows how statistical inference can operate over structured background knowledge, and the authors argue that this interaction between structure and statistics is critical for explaining the power and flexibility of human reasoning.

  20. Ultrasound image filtering using the multiplicative model

    NASA Astrophysics Data System (ADS)

    Navarrete, Hugo; Frery, Alejandro C.; Sanchez, Fermin; Anto, Joan

    2002-04-01

    Ultrasound images, as a special case of coherent images, are normally corrupted with multiplicative noise, i.e., speckle noise. Speckle noise reduction is a difficult task due to its multiplicative nature, but good statistical models of speckle formation are useful for designing adaptive speckle reduction filters. In this article a new statistical model, emerging from the Multiplicative Model framework, is presented and compared to previous models (Rayleigh, Rice and K laws). It is shown that the proposed model gives the best performance when modeling the statistics of ultrasound images. Finally, the parameters of the model can be used to quantify the extent of speckle formation; this quantification is applied to adaptive speckle reduction filter design. The effectiveness of the filter is demonstrated on typical in-vivo log-compressed B-scan images obtained by a clinical ultrasound system.

  1. A Survey of Statistical Models for Reverse Engineering Gene Regulatory Networks

    PubMed Central

    Huang, Yufei; Tienda-Luna, Isabel M.; Wang, Yufeng

    2009-01-01

    Statistical models for reverse engineering gene regulatory networks are surveyed in this article. To provide readers with a system-level view of the modeling issues in this research, a graphical modeling framework is proposed. This framework serves as the scaffolding on which the review of different models can be systematically assembled. Based on the framework, we review many existing models for many aspects of gene regulation; the pros and cons of each model are discussed. In addition, network inference algorithms are also surveyed under the graphical modeling framework by the categories of point solutions and probabilistic solutions and the connections and differences among the algorithms are provided. This survey has the potential to elucidate the development and future of reverse engineering GRNs and bring statistical signal processing closer to the core of this research. PMID:20046885

  2. A question of separation: disentangling tracer bias and gravitational non-linearity with counts-in-cells statistics

    NASA Astrophysics Data System (ADS)

    Uhlemann, C.; Feix, M.; Codis, S.; Pichon, C.; Bernardeau, F.; L'Huillier, B.; Kim, J.; Hong, S. E.; Laigle, C.; Park, C.; Shin, J.; Pogosyan, D.

    2018-02-01

    Starting from a very accurate model for density-in-cells statistics of dark matter based on large deviation theory, a bias model for the tracer density in spheres is formulated. It adopts a mean bias relation based on a quadratic bias model to relate the log-densities of dark matter to those of mass-weighted dark haloes in real and redshift space. The validity of the parametrized bias model is established using a parametrization-independent extraction of the bias function. This average bias model is then combined with the dark matter PDF, neglecting any scatter around it: it nevertheless yields an excellent model for densities-in-cells statistics of mass tracers that is parametrized in terms of the underlying dark matter variance and three bias parameters. The procedure is validated on measurements of both the one- and two-point statistics of subhalo densities in the state-of-the-art Horizon Run 4 simulation showing excellent agreement for measured dark matter variance and bias parameters. Finally, it is demonstrated that this formalism allows for a joint estimation of the non-linear dark matter variance and the bias parameters using solely the statistics of subhaloes. Having verified that galaxy counts in hydrodynamical simulations sampled on a scale of 10 Mpc $h^{-1}$ closely resemble those of subhaloes, this work provides important steps towards making theoretical predictions for density-in-cells statistics applicable to upcoming galaxy surveys like Euclid or WFIRST.

  3. Modeling Cross-Situational Word-Referent Learning: Prior Questions

    ERIC Educational Resources Information Center

    Yu, Chen; Smith, Linda B.

    2012-01-01

    Both adults and young children possess powerful statistical computation capabilities--they can infer the referent of a word from highly ambiguous contexts involving many words and many referents by aggregating cross-situational statistical information across contexts. This ability has been explained by models of hypothesis testing and by models of…

  4. Statistical models for the analysis and design of digital polymerase chain reaction (dPCR) experiments

    USGS Publications Warehouse

    Dorazio, Robert; Hunter, Margaret

    2015-01-01

    Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary, log–log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model’s parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.
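
    The model structure described in this abstract (binomial response, complementary log-log link, offset depending on partition volume) can be sketched for the simplest case of a single sample with no covariates. The sketch assumes, as is usual for dPCR but not stated in the abstract, that the number of molecules per partition is Poisson distributed, so that P(positive) = 1 - exp(-λv) and therefore cloglog(p) = log(λ) + log(v), where log(v) is the offset. With an intercept only, the maximum-likelihood estimate has a closed form and no fitting software is needed. All names and numbers below are hypothetical.

```python
import math


def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def estimate_concentration(n_positive, n_partitions, partition_volume):
    """Estimate concentration (copies per unit volume) from partition counts.

    Intercept-only binomial model with cloglog link and offset log(volume):
        cloglog(p) = log(concentration) + log(partition_volume)
    """
    if not 0 < n_positive < n_partitions:
        raise ValueError("estimate undefined when no or all partitions are positive")
    p_hat = n_positive / n_partitions          # binomial MLE of P(positive)
    log_concentration = cloglog(p_hat) - math.log(partition_volume)
    return math.exp(log_concentration)


# Hypothetical experiment: 20,000 partitions of 0.00085 microlitres, 5,000 positive.
example_volume = 0.00085
example_concentration = estimate_concentration(5000, 20000, example_volume)
```

    Covariates such as dilution level or treatment would enter as additional terms on the right-hand side of the cloglog equation, at which point a generalized linear model routine in standard statistical software would be used to fit the model.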

  5. Dynamic modelling of n-of-1 data: powerful and flexible data analytics applied to individualised studies.

    PubMed

    Vieira, Rute; McDonald, Suzanne; Araújo-Soares, Vera; Sniehotta, Falko F; Henderson, Robin

    2017-09-01

    N-of-1 studies are based on repeated observations within an individual or unit over time and are acknowledged as an important research method for generating scientific evidence about the health or behaviour of an individual. Statistical analyses of n-of-1 data require accurate modelling of the outcome while accounting for its distribution, time-related trend and error structures (e.g., autocorrelation) as well as reporting readily usable contextualised effect sizes for decision-making. A number of statistical approaches have been documented but no consensus exists on which method is most appropriate for which type of n-of-1 design. We discuss the statistical considerations for analysing n-of-1 studies and briefly review some currently used methodologies. We describe dynamic regression modelling as a flexible and powerful approach, adaptable to different types of outcomes and capable of dealing with the different challenges inherent to n-of-1 statistical modelling. Dynamic modelling borrows ideas from longitudinal and event history methodologies which explicitly incorporate the role of time and the influence of past on future. We also present an illustrative example of the use of dynamic regression on monitoring physical activity during the retirement transition. Dynamic modelling has the potential to expand researchers' access to robust and user-friendly statistical methods for individualised studies.

  6. Comparison of Artificial Neural Networks and ARIMA statistical models in simulations of target wind time series

    NASA Astrophysics Data System (ADS)

    Kolokythas, Kostantinos; Vasileios, Salamalikis; Athanassios, Argiriou; Kazantzidis, Andreas

    2015-04-01

    Wind results from complex interactions of numerous mechanisms acting at small and large scales, so better knowledge of its behavior is essential in a variety of applications, especially power production from wind turbines. The literature contains a considerable number of models, both physical and statistical, that address the simulation and prediction of wind speed. Among others, Artificial Neural Networks (ANNs) are widely used for wind forecasting and, in the great majority of cases, outperform other conventional statistical models. In this study, a number of ANNs with different architectures, created and applied to a dataset of wind time series, are compared to Auto Regressive Integrated Moving Average (ARIMA) statistical models. The data consist of mean hourly wind speeds from a wind farm in a hilly region of Greece and cover a period of one year (2013). The main goal is to evaluate the models' ability to successfully simulate the wind speed at a significant point (target). Goodness-of-fit statistics are computed to compare the different methods. In general, the ANN showed the best performance in estimating wind speed, outperforming the ARIMA models.

  7. Bureau of Labor Statistics Employment Projections: Detailed Analysis of Selected Occupations and Industries. Report to the Honorable Berkley Bedell, United States House of Representatives.

    ERIC Educational Resources Information Center

    General Accounting Office, Washington, DC.

    To compile its projections of future employment levels, the Bureau of Labor Statistics (BLS) combines the following five interlinked models in a six-step process: a labor force model, an econometric model of the U.S. economy, an industry activity model, an industry labor demand model, and an occupational labor demand model. The BLS was asked to…

  8. Statistical analysis of modeling error in structural dynamic systems

    NASA Technical Reports Server (NTRS)

    Hasselman, T. K.; Chrostowski, J. D.

    1990-01-01

    The paper presents a generic statistical model of the (total) modeling error for conventional space structures in their launch configuration. Modeling error is defined as the difference between analytical prediction and experimental measurement. It is represented by the differences between predicted and measured real eigenvalues and eigenvectors. Comparisons are made between pre-test and post-test models. Total modeling error is then subdivided into measurement error, experimental error and 'pure' modeling error, and comparisons made between measurement error and total modeling error. The generic statistical model presented in this paper is based on the first four global (primary structure) modes of four different structures belonging to the generic category of Conventional Space Structures (specifically excluding large truss-type space structures). As such, it may be used to evaluate the uncertainty of predicted mode shapes and frequencies, sinusoidal response, or the transient response of other structures belonging to the same generic category.

  9. Improving UWB-Based Localization in IoT Scenarios with Statistical Models of Distance Error.

    PubMed

    Monica, Stefania; Ferrari, Gianluigi

    2018-05-17

    Interest in the Internet of Things (IoT) is rapidly increasing, as the number of connected devices is exponentially growing. One of the application scenarios envisaged for IoT technologies involves indoor localization and context awareness. In this paper, we focus on a localization approach that relies on a particular type of communication technology, namely Ultra Wide Band (UWB). UWB technology is an attractive choice for indoor localization, owing to its high accuracy. Since localization algorithms typically rely on estimated inter-node distances, the goal of this paper is to evaluate the improvement brought by a simple (linear) statistical model of the distance error. On the basis of an extensive experimental measurement campaign, we propose a general analytical framework, based on a Least Squares (LS) method, to derive a novel statistical model for the range estimation error between a pair of UWB nodes. The proposed statistical model is then applied to improve the performance of a few illustrative localization algorithms in various realistic scenarios. The obtained experimental results show that the use of the proposed statistical model improves the accuracy of the considered localization algorithms, with a reduction of the localization error of up to 66%.
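
    The idea of a linear, least-squares model of ranging error can be sketched as follows. This is not the authors' model: it simply assumes that the measured range is a linear function of the true distance, fits slope and intercept by ordinary least squares on calibration data, and then inverts the fit to correct new measurements. The function names and the calibration values are synthetic and chosen only for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic calibration data (assumed): a 2% scale error plus a 0.15 m bias.
true_distance = [1.0, 2.0, 3.0, 4.0, 5.0]
measured_range = [1.02 * d + 0.15 for d in true_distance]

range_slope, range_bias = fit_line(true_distance, measured_range)


def correct_range(measured):
    """Invert the fitted linear error model to recover a distance estimate."""
    return (measured - range_bias) / range_slope
```

    The corrected ranges would then be passed to whatever localization algorithm is in use in place of the raw measurements.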

  10. Statistics of acoustic emissions and stress drops during granular shearing using a stick-slip fiber bundle model

    NASA Astrophysics Data System (ADS)

    Cohen, D.; Michlmayr, G.; Or, D.

    2012-04-01

    Shearing of dense granular materials appears in many engineering and Earth sciences applications. Under a constant strain rate, the shearing stress at steady state oscillates with slow rises followed by rapid drops that are linked to the build up and failure of force chains. Experiments indicate that these drops display exponential statistics. Measurements of acoustic emissions during shearing indicate that the energy liberated by failure of these force chains has power-law statistics. Representing force chains as fibers, we use a stick-slip fiber bundle model to obtain analytical solutions of the statistical distribution of stress drops and failure energy. In the model, fibers stretch, fail, and regain strength during deformation. Fibers have Weibull-distributed threshold strengths with either quenched or annealed disorder. The shapes of the distributions for drops and energy obtained from the model are similar to those measured during shearing experiments. This simple model may be useful to identify failure events linked to force chain failures. Future generalizations of the model that include different types of fiber failure may also allow identification of different types of granular failures that have distinct statistical acoustic emission signatures.

  11. Effect of Internet-Based Cognitive Apprenticeship Model (i-CAM) on Statistics Learning among Postgraduate Students

    PubMed Central

    Saadati, Farzaneh; Ahmad Tarmizi, Rohani

    2015-01-01

    Because students’ ability to use statistics, which is mathematical in nature, is one of the concerns of educators, embedding within an e-learning system the pedagogical characteristics of learning is ‘value added’ because it facilitates the conventional method of learning mathematics. Many researchers emphasize the effectiveness of cognitive apprenticeship in learning and problem solving in the workplace. In a cognitive apprenticeship learning model, skills are learned within a community of practitioners through observation of modelling and then practice plus coaching. This study utilized an internet-based Cognitive Apprenticeship Model (i-CAM) in three phases and evaluated its effectiveness for improving statistics problem-solving performance among postgraduate students. The results showed that, when compared to the conventional mathematics learning model, the i-CAM could significantly promote students’ problem-solving performance at the end of each phase. In addition, the combination of the differences in students' test scores was considered to be statistically significant after controlling for the pre-test scores. The findings conveyed in this paper confirmed the considerable value of i-CAM in the improvement of statistics learning for non-specialized postgraduate students. PMID:26132553

  12. Statistical Mechanical Foundation for the Two-State Transition in Protein Folding of Small Globular Proteins

    NASA Astrophysics Data System (ADS)

    Iguchi, Kazumoto

    We discuss the statistical mechanical foundation for the two-state transition in the protein folding of small globular proteins. In the standard arguments of protein folding, the statistical search for the ground state is carried out from astronomically many conformations in the configuration space. This leads to the famous Levinthal's paradox. To resolve the paradox, Gō first postulated that the two-state transition - all-or-none type transition - is very crucial for the protein folding of small globular proteins and used Gō's lattice model to show the two-state transition nature. Recently, many experimental results have accumulated that support the two-state transition for small globular proteins. Stimulated by such recent experiments, Zwanzig has introduced a minimal statistical mechanical model that exhibits the two-state transition. Also, Finkelstein and coworkers have discussed the solution of the paradox by considering the sequential folding of a small globular protein. On the other hand, Iguchi has recently introduced a toy model of protein folding using the Rubik's magic snake model, in which all folded structures are exactly known and mathematically represented in terms of the four types of conformations: cis-, trans-, left and right gauche-configurations between the unit polyhedrons. In this paper, we study the relationship between Gō's two-state transition, Zwanzig's statistical mechanics model and Finkelstein's sequential folding model by applying them to the Rubik's magic snake models. We show that the foundation of Gō's two-state transition model relies on the search within the equienergy surface that is labeled by the contact order of the hydrophobic condensation. This idea reproduces Zwanzig's statistical model as a special case, realizes Finkelstein's sequential folding model, and ties them together to explain the nature of the two-state transition of a small globular protein through calculation of physical quantities such as the free energy, the contact order and the specific heat. We point out the similarity between the liquid-gas transition in statistical mechanics and the two-state transition of protein folding. We also study the morphology of the Rubik's magic snake models to give a prototype model for understanding the differences between α-helix proteins and β-sheet proteins.

  13. Limited-information goodness-of-fit testing of diagnostic classification item response models.

    PubMed

    Hansen, Mark; Cai, Li; Monroe, Scott; Li, Zhen

    2016-11-01

    Despite the growing popularity of diagnostic classification models (e.g., Rupp et al., 2010, Diagnostic measurement: theory, methods, and applications, Guilford Press, New York, NY) in educational and psychological measurement, methods for testing their absolute goodness of fit to real data remain relatively underdeveloped. For tests of reasonable length and for realistic sample size, full-information test statistics such as Pearson's $X^2$ and the likelihood ratio statistic $G^2$ suffer from sparseness in the underlying contingency table from which they are computed. Recently, limited-information fit statistics such as Maydeu-Olivares and Joe's (2006, Psychometrika, 71, 713) $M_2$ have been found to be quite useful in testing the overall goodness of fit of item response theory models. In this study, we applied Maydeu-Olivares and Joe's (2006, Psychometrika, 71, 713) $M_2$ statistic to diagnostic classification models. Through a series of simulation studies, we found that $M_2$ is well calibrated across a wide range of diagnostic model structures and was sensitive to certain misspecifications of the item model (e.g., fitting disjunctive models to data generated according to a conjunctive model), errors in the Q-matrix (adding or omitting paths, omitting a latent variable), and violations of local item independence due to unmodelled testlet effects. On the other hand, $M_2$ was largely insensitive to misspecifications in the distribution of higher-order latent dimensions and to the specification of an extraneous attribute. To complement the analyses of the overall model goodness of fit using $M_2$, we investigated the utility of the Chen and Thissen (1997, J. Educ. Behav. Stat., 22, 265) local dependence statistic $X^2_{LD}$ for characterizing sources of misfit, an important aspect of model appraisal often overlooked in favour of overall statements. The $X^2_{LD}$ statistic was found to be slightly conservative (with Type I error rates consistently below the nominal level) but still useful in pinpointing the sources of misfit. Patterns of local dependence arising due to specific model misspecifications are illustrated. Finally, we used the $M_2$ and $X^2_{LD}$ statistics to evaluate a diagnostic model fit to data from the Trends in Mathematics and Science Study, drawing upon analyses previously conducted by Lee et al., (2011, IJT, 11, 144). © 2016 The British Psychological Society.

  14. Use of a statistical model of the whole femur in a large scale, multi-model study of femoral neck fracture risk.

    PubMed

    Bryan, Rebecca; Nair, Prasanth B; Taylor, Mark

    2009-09-18

    Interpatient variability is often overlooked in orthopaedic computational studies due to the substantial challenges involved in sourcing and generating large numbers of bone models. A statistical model of the whole femur incorporating both geometric and material property variation was developed as a potential solution to this problem. The statistical model was constructed using principal component analysis, applied to 21 individual computed tomography scans. To test the ability of the statistical model to generate realistic, unique, finite element (FE) femur models it was used as a source of 1000 femurs to drive a study on femoral neck fracture risk. The study simulated the impact of an oblique fall to the side, a scenario known to account for a large proportion of hip fractures in the elderly and have a lower fracture load than alternative loading approaches. FE model generation, application of subject specific loading and boundary conditions, FE processing and post processing of the solutions were completed automatically. The generated models were within the bounds of the training data used to create the statistical model with a high mesh quality, able to be used directly by the FE solver without remeshing. The results indicated that 28 of the 1000 femurs were at highest risk of fracture. Closer analysis revealed the percentage of cortical bone in the proximal femur to be a crucial differentiator between the failed and non-failed groups. The likely fracture location was indicated to be intertrochanteric. Comparison to previous computational, clinical and experimental work revealed support for these findings.

  15. Directional Statistics for Polarization Observations of Individual Pulses from Radio Pulsars

    NASA Astrophysics Data System (ADS)

    McKinnon, M. M.

    2010-10-01

    Radio polarimetry is a three-dimensional statistical problem. The three-dimensional aspect of the problem arises from the Stokes parameters Q, U, and V, which completely describe the polarization of electromagnetic radiation and conceptually define the orientation of a polarization vector in the Poincaré sphere. The statistical aspect of the problem arises from the random fluctuations in the source-intrinsic polarization and the instrumental noise. A simple model for the polarization of pulsar radio emission has been used to derive the three-dimensional statistics of radio polarimetry. The model is based upon the proposition that the observed polarization is due to the incoherent superposition of two, highly polarized, orthogonal modes. The directional statistics derived from the model follow the Bingham-Mardia and Fisher family of distributions. The model assumptions are supported by the qualitative agreement between the statistics derived from it and those measured with polarization observations of the individual pulses from pulsars. The orthogonal modes are thought to be the natural modes of radio wave propagation in the pulsar magnetosphere. The intensities of the modes become statistically independent when generalized Faraday rotation (GFR) in the magnetosphere causes the difference in their phases to be large. A stochastic version of GFR occurs when fluctuations in the phase difference are also large, and may be responsible for the more complicated polarization patterns observed in pulsar radio emission.

  16. Statistics for X-chromosome associations.

    PubMed

    Özbek, Umut; Lin, Hui-Min; Lin, Yan; Weeks, Daniel E; Chen, Wei; Shaffer, John R; Purcell, Shaun M; Feingold, Eleanor

    2018-06-13

    In a genome-wide association study (GWAS), association between genotype and phenotype at autosomal loci is generally tested by regression models. However, X-chromosome data are often excluded from published analyses of autosomes because of the difference between males and females in number of X chromosomes. Failure to analyze X-chromosome data at all is obviously less than ideal, and can lead to missed discoveries. Even when X-chromosome data are included, they are often analyzed with suboptimal statistics. Several mathematically sensible statistics for X-chromosome association have been proposed. The optimality of these statistics, however, is based on very specific simple genetic models. In addition, while previous simulation studies of these statistics have been informative, they have focused on single-marker tests and have not considered the types of error that occur even under the null hypothesis when the entire X chromosome is scanned. In this study, we comprehensively tested several X-chromosome association statistics using simulation studies that include the entire chromosome. We also considered a wide range of trait models for sex differences and phenotypic effects of X inactivation. We found that models that do not incorporate a sex effect can have large type I error in some cases. We also found that many of the best statistics perform well even when there are modest deviations, such as trait variance differences between the sexes or small sex differences in allele frequencies, from assumptions. © 2018 WILEY PERIODICALS, INC.

  17. Evaluation of high-resolution sea ice models on the basis of statistical and scaling properties of Arctic sea ice drift and deformation

    NASA Astrophysics Data System (ADS)

    Girard, L.; Weiss, J.; Molines, J. M.; Barnier, B.; Bouillon, S.

    2009-08-01

    Sea ice drift and deformation from models are evaluated on the basis of statistical and scaling properties. These properties are derived from two observation data sets: the RADARSAT Geophysical Processor System (RGPS) and buoy trajectories from the International Arctic Buoy Program (IABP). Two simulations obtained with the Louvain-la-Neuve Ice Model (LIM) coupled to a high-resolution ocean model and a simulation obtained with the Los Alamos Sea Ice Model (CICE) were analyzed. Model ice drift compares well with observations in terms of large-scale velocity field and distributions of velocity fluctuations although a significant bias on the mean ice speed is noted. On the other hand, the statistical properties of ice deformation are not well simulated by the models: (1) The distributions of strain rates are incorrect: RGPS distributions of strain rates are power law tailed, i.e., exhibit "wild randomness," whereas models distributions remain in the Gaussian attraction basin, i.e., exhibit "mild randomness." (2) The models are unable to reproduce the spatial and temporal correlations of the deformation fields: In the observations, ice deformation follows spatial and temporal scaling laws that express the heterogeneity and the intermittency of deformation. These relations do not appear in simulated ice deformation. Mean deformation in models is almost scale independent. The statistical properties of ice deformation are a signature of the ice mechanical behavior. The present work therefore suggests that the mechanical framework currently used by models is inappropriate. A different modeling framework based on elastic interactions could improve the representation of the statistical and scaling properties of ice deformation.

  18. A new approach to fracture modelling in reservoirs using deterministic, genetic and statistical models of fracture growth

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rawnsley, K.; Swaby, P.

    1996-08-01

    It is increasingly acknowledged that in order to understand and forecast the behavior of fracture influenced reservoirs we must attempt to reproduce the fracture system geometry and use this as a basis for fluid flow calculation. This article aims to present a recently developed fracture modelling prototype designed specifically for use in hydrocarbon reservoir environments. The prototype "FRAME" (FRActure Modelling Environment) aims to provide a tool which will allow the generation of realistic 3D fracture systems within a reservoir model, constrained to the known geology of the reservoir by both mechanical and statistical considerations, and which can be used as a basis for fluid flow calculation. Two newly developed modelling techniques are used. The first is an interactive tool which allows complex fault surfaces and their associated deformations to be reproduced. The second is a "genetic" model which grows fracture patterns from seeds using conceptual models of fracture development. The user defines the mechanical input and can retrieve all the statistics of the growing fractures to allow comparison to assumed statistical distributions for the reservoir fractures. Input parameters include growth rate, fracture interaction characteristics, orientation maps and density maps. More traditional statistical stochastic fracture models are also incorporated. FRAME is designed to allow the geologist to input hard or soft data including seismically defined surfaces, well fractures, outcrop models, analogue or numerical mechanical models or geological "feeling". The geologist is not restricted to "a priori" models of fracture patterns that may not correspond to the data.

  19. Statistical colour models: an automated digital image analysis method for quantification of histological biomarkers.

    PubMed

    Shu, Jie; Dolman, G E; Duan, Jiang; Qiu, Guoping; Ilyas, Mohammad

    2016-04-27

    Colour is the most important feature used in quantitative immunohistochemistry (IHC) image analysis; IHC is used to provide information relating to aetiology and to confirm malignancy. Statistical modelling is a technique widely used for colour detection in computer vision. We have developed a statistical model of colour detection applicable to detection of stain colour in digital IHC images. The model was first trained on a large number of colour pixels collected semi-automatically. To speed up the training and detection processes, we removed the luminance (Y) channel of the YCbCr colour space and chose 128 histogram bins, which is the optimal number. A maximum likelihood classifier is used to classify pixels in digital slides into positively or negatively stained pixels automatically. The model-based tool was developed within ImageJ to quantify targets identified using IHC and histochemistry. The purpose of evaluation was to compare the computer model with human evaluation. Several large datasets were prepared and obtained from human oesophageal cancer, colon cancer and liver cirrhosis with different colour stains. Experimental results have demonstrated that the model-based tool achieves more accurate results than colour deconvolution and the CMYK model in the detection of brown colour, and is comparable to colour deconvolution in the detection of pink colour. We have also demonstrated that the proposed model has little inter-dataset variation. A robust and effective statistical model is introduced in this paper. The model-based interactive tool in ImageJ, which can create a visual representation of the statistical model and detect a specified colour automatically, is easy to use and available freely at http://rsb.info.nih.gov/ij/plugins/ihc-toolbox/index.html . Testing of the tool by different users showed only minor inter-observer variations in results.
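
    The classification step described above can be sketched as follows. This is a minimal illustration, not the authors' ImageJ implementation: it assumes 128 bins per chroma channel (the abstract does not say how the 128 bins are laid out), the standard full-range BT.601 RGB to YCbCr conversion, add-one smoothing, and hypothetical function names and training pixels.

```python
from collections import Counter

BINS = 128  # assumption: 128 bins for each of the Cb and Cr channels


def rgb_to_cbcr(r, g, b):
    """Full-range BT.601 conversion; the luminance (Y) channel is discarded."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def chroma_bin(rgb):
    """Map an (R, G, B) pixel to a 2-D histogram cell in the CbCr plane."""
    width = 256 / BINS
    cb, cr = rgb_to_cbcr(*rgb)
    return tuple(min(BINS - 1, max(0, int(v // width))) for v in (cb, cr))


def train(pixels):
    """Build a chroma histogram from labelled training pixels."""
    return Counter(chroma_bin(p) for p in pixels), len(pixels)


def likelihood(model, rgb):
    """Smoothed estimate of P(colour | class) read from the histogram."""
    counts, total = model
    return (counts[chroma_bin(rgb)] + 1) / (total + BINS * BINS)


def is_stained(rgb, positive_model, negative_model):
    """Maximum likelihood decision between the two classes."""
    return likelihood(positive_model, rgb) > likelihood(negative_model, rgb)


# Hypothetical training pixels: brownish (stained) versus pale blue (unstained).
positive = train([(150, 90, 40), (145, 85, 45), (160, 95, 35)])
negative = train([(200, 200, 230), (210, 210, 235), (220, 220, 240)])
print(is_stained((150, 90, 40), positive, negative))
print(is_stained((200, 200, 230), positive, negative))
```

    Dropping Y reduces the histogram from three dimensions to two, which is one plausible reading of how removing the luminance channel speeds up training and detection.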

  20. Survey of statistical techniques used in validation studies of air pollution prediction models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bornstein, R D; Anderson, S F

    1979-03-01

    Statistical techniques used by meteorologists to validate predictions made by air pollution models are surveyed. Techniques are divided into the following three groups: graphical, tabular, and summary statistics. Some of the practical problems associated with verification are also discussed. Characteristics desired in any validation program are listed and a suggested combination of techniques that possesses many of these characteristics is presented.

  1. Statistical properties of several models of fractional random point processes

    NASA Astrophysics Data System (ADS)

    Bendjaballah, C.

    2011-08-01

    Statistical properties of several models of fractional random point processes have been analyzed from the counting and time interval statistics points of view. Based on the criterion of the reduced variance, it is seen that such processes exhibit nonclassical properties. The conditions for these processes to be treated as conditional Poisson processes are examined. Numerical simulations illustrate part of the theoretical calculations.

  2. The Use of a Context-Based Information Retrieval Technique

    DTIC Science & Technology

    2009-07-01

    provided in context. Latent Semantic Analysis (LSA) is a statistical technique for inferring contextual and structural information, and previous studies...WAIS). 10 DSTO-TR-2322 1.4.4 Latent Semantic Analysis LSA, which is also known as latent semantic indexing (LSI), uses a statistical and...1.4.6 Language Models In contrast, natural language models apply algorithms that combine statistical information with semantic information. Semantic

  3. Sub-poissonian photon statistics in the coherent state Jaynes-Cummings model in non-resonance

    NASA Astrophysics Data System (ADS)

    Zhang, Jia-tai; Fan, An-fu

    1992-03-01

    We study a model with a two-level atom (TLA) interacting non-resonantly with a single-mode quantized cavity field (QCF). The photon number probability function, the mean photon number and Mandel's fluctuation parameter are calculated. Sub-Poissonian distributions of the photon statistics are obtained for the non-resonant interaction. These statistical properties are strongly dependent on the detuning parameters.
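
    For readers unfamiliar with Mandel's fluctuation parameter, the sketch below computes the standard definition Q = (Var(n) - <n>) / <n> from a photon number distribution. It does not reproduce the Jaynes-Cummings calculation of the paper; the function names and the two test distributions (a Poisson distribution and a number state) are illustrative assumptions.

```python
import math


def mandel_q(probs):
    """Mandel's Q for a photon number distribution, probs[n] = P(n).

    Q < 0 means sub-Poissonian, Q = 0 Poissonian, Q > 0 super-Poissonian.
    """
    total = sum(probs)  # renormalise in case the list is truncated
    mean = sum(n * p for n, p in enumerate(probs)) / total
    second = sum(n * n * p for n, p in enumerate(probs)) / total
    variance = second - mean * mean
    return (variance - mean) / mean


def poisson(mean, n_max):
    """Poisson photon statistics, as for a coherent state, truncated at n_max."""
    return [math.exp(-mean) * mean ** n / math.factorial(n)
            for n in range(n_max + 1)]


def number_state(n_photons):
    """Distribution with exactly n_photons photons (zero variance)."""
    probs = [0.0] * (n_photons + 1)
    probs[n_photons] = 1.0
    return probs


q_coherent = mandel_q(poisson(4.0, 80))  # close to 0
q_number = mandel_q(number_state(4))     # the minimum possible value, -1
print(q_coherent, q_number)
```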

  4. Cultural Diversity and Best Practices in the Teaching and Learning of Statistics: A Faculty Perspective from A Historically Black College/University (HBCU)

    ERIC Educational Resources Information Center

    Whaley, Arthur L.

    2017-01-01

    The literature on the teaching and learning of statistics tends not to address issues of cultural diversity. Twenty-nine students enrolled in a statistics course at a historically Black college/university (HBCU) were the focus of this pilot study. Using structural equation modeling (SEM), the study tested models of the effects of writing…

  5. Statistical modelling of software reliability

    NASA Technical Reports Server (NTRS)

    Miller, Douglas R.

    1991-01-01

    During the six-month period from 1 April 1991 to 30 September 1991 the following research papers in statistical modeling of software reliability appeared: (1) A Nonparametric Software Reliability Growth Model; (2) On the Use and the Performance of Software Reliability Growth Models; (3) Research and Development Issues in Software Reliability Engineering; (4) Special Issues on Software; and (5) Software Reliability and Safety.

  6. Adding a Parameter Increases the Variance of an Estimated Regression Function

    ERIC Educational Resources Information Center

    Withers, Christopher S.; Nadarajah, Saralees

    2011-01-01

    The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…
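
    The simplest instance of this effect can be checked numerically. The sketch below is not the authors' derivation; it assumes ordinary least squares with independent, equal-variance errors and compares an intercept-only model with an intercept-plus-slope model, using the textbook variance formulas. Function names and data are hypothetical.

```python
def variance_factor_intercept_only(xs):
    """Var(fitted value) / sigma^2 when the model is y = b0 + error."""
    return 1.0 / len(xs)


def variance_factor_with_slope(xs, x0):
    """Var(fitted value at x0) / sigma^2 when the model is y = b0 + b1*x + error."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return 1.0 / n + (x0 - xbar) ** 2 / sxx


xs = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical design points
for x0 in (2.0, 3.0, 6.0):
    small = variance_factor_intercept_only(xs)
    large = variance_factor_with_slope(xs, x0)
    print(x0, small, large)
```

    In this special case the added predictor contributes the non-negative term (x0 - xbar)^2 / Sxx, so the variance never decreases and is unchanged only at x0 = xbar.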

  7. Modeling Cross-Situational Word–Referent Learning: Prior Questions

    PubMed Central

    Yu, Chen; Smith, Linda B.

    2013-01-01

    Both adults and young children possess powerful statistical computation capabilities—they can infer the referent of a word from highly ambiguous contexts involving many words and many referents by aggregating cross-situational statistical information across contexts. This ability has been explained by models of hypothesis testing and by models of associative learning. This article describes a series of simulation studies and analyses designed to understand the different learning mechanisms posited by the 2 classes of models and their relation to each other. Variants of a hypothesis-testing model and a simple or dumb associative mechanism were examined under different specifications of information selection, computation, and decision. Critically, these 3 components of the models interact in complex ways. The models illustrate a fundamental tradeoff between amount of data input and powerful computations: With the selection of more information, dumb associative models can mimic the powerful learning that is accomplished by hypothesis-testing models with fewer data. However, because of the interactions among the component parts of the models, the associative model can mimic various hypothesis-testing models, producing the same learning patterns but through different internal components. The simulations argue for the importance of a compositional approach to human statistical learning: the experimental decomposition of the processes that contribute to statistical learning in human learners and models with the internal components that can be evaluated independently and together. PMID:22229490

  8. Examination of Solar Cycle Statistical Model and New Prediction of Solar Cycle 23

    NASA Technical Reports Server (NTRS)

    Kim, Myung-Hee Y.; Wilson, John W.

    2000-01-01

    Sunspot numbers in the current solar cycle 23 were estimated by using a statistical model with the accumulating cycle sunspot data based on the odd-even behavior of historical sunspot cycles from 1 to 22. Since cycle 23 has progressed and the accurate solar minimum occurrence has been defined, the statistical model is validated by comparing the previous prediction with the new measured sunspot number; the improved sunspot projection in short range of future time is made accordingly. The current cycle is expected to have a moderate level of activity. Errors of this model are shown to be self-correcting as cycle observations become available.

  9. Development of the AFRL Aircrew Performance and Protection Data Bank

    DTIC Science & Technology

    2007-12-01

    Growth model and statistical model of hypobaric chamber simulations. It offers a quick and readily accessible online DCS risk assessment tool for...are used for the DCS prediction instead of the original model. ADRAC is based on more than 20 years of hypobaric chamber studies using human...prediction based on the combined Bubble Growth model and statistical model of hypobaric chamber simulations was integrated into the Data Bank. It

  10. A critique of Rasch residual fit statistics.

    PubMed

    Karabatsos, G

    2000-01-01

    In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained, of course, depends on the degree to which the data fit the Rasch model. It is therefore important to utilize fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that there is far more emphasis placed in the objective measurement of person and items than there is in the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics do not allow either a convenient or a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, the use of a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics that are based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.
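
    As a small illustration of the quantity the abstract recommends building on, the sketch below counts Guttman response errors for one person. It assumes dichotomous (0/1) responses and known item difficulties, and it is only the raw count, not any particular fit index from the paper; the function name and data are hypothetical.

```python
def guttman_errors(responses, difficulties):
    """Count item pairs where an easier item is failed and a harder one passed.

    responses: 0/1 scores for one person; difficulties: one value per item,
    where a larger value means a harder item.
    """
    order = sorted(range(len(responses)), key=lambda i: difficulties[i])
    errors = 0
    failed_easier = 0  # failed items seen so far, all easier than the current one
    for i in order:
        if responses[i] == 0:
            failed_easier += 1
        else:
            errors += failed_easier
    return errors


difficulties = [1, 2, 3, 4, 5]
print(guttman_errors([1, 1, 1, 0, 0], difficulties))  # perfect Guttman pattern
print(guttman_errors([0, 1, 1, 0, 1], difficulties))  # inconsistent pattern
```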

  11. Statistical modelling for recurrent events: an application to sports injuries

    PubMed Central

    Ullah, Shahid; Gabbett, Tim J; Finch, Caroline F

    2014-01-01

    Background: Injuries are often recurrent, with subsequent injuries influenced by previous occurrences and hence correlation between events needs to be taken into account when analysing such data. Objective: This paper compares five different survival models (Cox proportional hazards (CoxPH) model and the following generalisations to recurrent event data: Andersen-Gill (A-G), frailty, Wei-Lin-Weissfeld total time (WLW-TT) marginal, Prentice-Williams-Peterson gap time (PWP-GT) conditional models) for the analysis of recurrent injury data. Methods: Empirical evaluation and comparison of different models were performed using model selection criteria and goodness-of-fit statistics. Simulation studies assessed the size and power of each model fit. Results: The modelling approach is demonstrated through direct application to Australian National Rugby League recurrent injury data collected over the 2008 playing season. Of the 35 players analysed, 14 (40%) players had more than 1 injury and 47 contact injuries were sustained over 29 matches. The CoxPH model provided the poorest fit to the recurrent sports injury data. The fit was improved with the A-G and frailty models, compared to WLW-TT and PWP-GT models. Conclusions: Despite little difference in model fit between the A-G and frailty models, in the interest of fewer statistical assumptions it is recommended that, where relevant, future studies involving modelling of recurrent sports injury data use the frailty model in preference to the CoxPH model or its other generalisations. The paper provides a rationale for future statistical modelling approaches for recurrent sports injury. PMID:22872683

  12. Liver segmentation from CT images using a sparse priori statistical shape model (SP-SSM).

    PubMed

    Wang, Xuehu; Zheng, Yongchang; Gan, Lan; Wang, Xuan; Sang, Xinting; Kong, Xiangfeng; Zhao, Jie

    2017-01-01

    This study proposes a new liver segmentation method based on a sparse a priori statistical shape model (SP-SSM). First, mark points are selected in the liver a priori model and the original image. Then, the a priori shape and its mark points are used to obtain a dictionary for the liver boundary information. Second, the sparse coefficient is calculated based on the correspondence between mark points in the original image and those in the a priori model, and then the sparse statistical model is established by combining the sparse coefficients and the dictionary. Finally, the intensity energy and boundary energy models are built based on the intensity information and the specific boundary information of the original image. Then, the sparse matching constraint model is established based on the sparse coding theory. These models jointly drive the iterative deformation of the sparse statistical model to approximate and accurately extract the liver boundaries. This method can solve the problems of deformation model initialization and a priori method accuracy using the sparse dictionary. The SP-SSM can achieve a mean overlap error of 4.8% and a mean volume difference of 1.8%, whereas the average symmetric surface distance and the root mean square symmetric surface distance can reach 0.8 mm and 1.4 mm, respectively.

  13. Numerical and Qualitative Contrasts of Two Statistical Models ...

    EPA Pesticide Factsheets

    Two statistical approaches, weighted regression on time, discharge, and season and generalized additive models, have recently been used to evaluate water quality trends in estuaries. Both models have been used in similar contexts despite differences in statistical foundations and products. This study provided an empirical and qualitative comparison of both models using 29 years of data for two discrete time series of chlorophyll-a (chl-a) in the Patuxent River estuary. Empirical descriptions of each model were based on predictive performance against the observed data, ability to reproduce flow-normalized trends with simulated data, and comparisons of performance with validation datasets. Between-model differences were apparent but minor and both models had comparable abilities to remove flow effects from simulated time series. Both models similarly predicted observations for missing data with different characteristics. Trends from each model revealed distinct mainstem influences of the Chesapeake Bay with both models predicting a roughly 65% increase in chl-a over time in the lower estuary, whereas flow-normalized predictions for the upper estuary showed a more dynamic pattern, with a nearly 100% increase in chl-a in the last 10 years. Qualitative comparisons highlighted important differences in the statistical structure, available products, and characteristics of the data and desired analysis. This manuscript describes a quantitative comparison of two recently-

  14. Further Insight into the Reaction FeO+ + H2 Yields Fe+ + H2O: Temperature Dependent Kinetics, Isotope Effects, and Statistical Modeling (Postprint)

    DTIC Science & Technology

    2014-07-31

    a laminar flow tube via a Venturi inlet, where ∼10^4 to 10^5 collisions with a He buffer gas act to thermalize the ions and carry them downstream...ISOTOPE EFFECTS, AND STATISTICAL MODELING (POSTPRINT) Shaun G. Ard, et al. 31 July 2014 Journal Article AIR FORCE RESEARCH LABORATORY Space Vehicles...Kinetics, Isotope Effects, and Statistical Modeling (Postprint) 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 61102F 6

  15. Analysis of the dependence of extreme rainfalls

    NASA Astrophysics Data System (ADS)

    Padoan, Simone; Ancey, Christophe; Parlange, Marc

    2010-05-01

    The aim of spatial analysis is to quantitatively describe the behavior of environmental phenomena such as precipitation levels, wind speed or daily temperatures. A number of generic approaches to spatial modeling have been developed [1], but these are not necessarily ideal for handling extremal aspects given their focus on mean process levels. The areal modelling of the extremes of a natural process observed at points in space is important in environmental statistics; for example, understanding extremal spatial rainfall is crucial in flood protection. In light of recent concerns over climate change, the use of robust mathematical and statistical methods for such analyses has grown in importance. Multivariate extreme value models and the class of max-stable processes [2] have a similar asymptotic motivation to the univariate Generalized Extreme Value (GEV) distribution, but provide a general approach to modeling extreme processes incorporating temporal or spatial dependence. Statistical methods for max-stable processes and data analyses of practical problems are discussed by [3] and [4]. This work illustrates methods for the statistical modelling of spatial extremes and gives examples of their use by means of a real extremal data analysis of Switzerland precipitation levels. [1] Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York. [2] de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer, USA. [3] Padoan, S. A., Ribatet, M. and Sisson, S. A. (2009). Likelihood-Based Inference for Max-Stable Processes. Journal of the American Statistical Association, Theory & Methods. In press. [4] Davison, A. C. and Gholamrezaee, M. (2009). Geostatistics of extremes. Journal of the Royal Statistical Society, Series B. To appear.
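
    The univariate GEV distribution mentioned above can be written down compactly. The sketch below gives its distribution function and the corresponding return level in the usual location, scale, shape parameterisation; it does not touch the max-stable spatial models that are the subject of the work, and the parameter values are hypothetical rather than fitted to any rainfall data.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function with location mu, scale sigma > 0, shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:  # Gumbel limit
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:  # x lies outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(period, mu, sigma, xi):
    """Level exceeded on average once every `period` block maxima."""
    y = -math.log(1.0 - 1.0 / period)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi


# Hypothetical parameters for annual maximum daily rainfall, in mm.
level_100 = gev_return_level(100, mu=30.0, sigma=10.0, xi=0.1)
print(level_100, gev_cdf(level_100, 30.0, 10.0, 0.1))
```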

  16. Patch-Based Generative Shape Model and MDL Model Selection for Statistical Analysis of Archipelagos

    NASA Astrophysics Data System (ADS)

    Ganz, Melanie; Nielsen, Mads; Brandt, Sami

    We propose a statistical generative shape model for archipelago-like structures. These kinds of structures occur, for instance, in medical images, where our intention is to model the appearance and shapes of calcifications in x-ray radiographs. The generative model is constructed by (1) learning a patch-based dictionary for possible shapes, (2) building up a time-homogeneous Markov model to model the neighbourhood correlations between the patches, and (3) automatic selection of the model complexity by the minimum description length principle. The generative shape model is proposed as a probability distribution of a binary image where the model is intended to facilitate sequential simulation. Our results show that a relatively simple model is able to generate structures visually similar to calcifications. Furthermore, we used the shape model as a shape prior in the statistical segmentation of calcifications, where the area overlap with the ground truth shapes improved significantly compared to the case where the prior was not used.

  17. Relevance of the c-statistic when evaluating risk-adjustment models in surgery.

    PubMed

    Merkow, Ryan P; Hall, Bruce L; Cohen, Mark E; Dimick, Justin B; Wang, Edward; Chow, Warren B; Ko, Clifford Y; Bilimoria, Karl Y

    2012-05-01

    The measurement of hospital quality based on outcomes requires risk adjustment. The c-statistic is a popular tool used to judge model performance, but can be limited, particularly when evaluating specific operations in focused populations. Our objectives were to examine the interpretation and relevance of the c-statistic when used in models with increasingly similar case mix and to consider an alternative perspective on model calibration based on a graphical depiction of model fit. From the American College of Surgeons National Surgical Quality Improvement Program (2008-2009), patients were identified who underwent a general surgery procedure, and procedure groups were increasingly restricted: colorectal-all, colorectal-elective cases only, and colorectal-elective cancer cases only. Mortality and serious morbidity outcomes were evaluated using logistic regression-based risk adjustment, and model c-statistics and calibration curves were used to compare model performance. During the study period, 323,427 general, 47,605 colorectal-all, 39,860 colorectal-elective, and 21,680 colorectal cancer patients were studied. Mortality ranged from 1.0% in general surgery to 4.1% in the colorectal-all group, and serious morbidity ranged from 3.9% in general surgery to 12.4% in the colorectal-all procedural group. As case mix was restricted, c-statistics progressively declined from the general to the colorectal cancer surgery cohorts for both mortality and serious morbidity (mortality: 0.949 to 0.866; serious morbidity: 0.861 to 0.668). Calibration was evaluated graphically by examining predicted vs observed number of events over risk deciles. For both mortality and serious morbidity, there was no qualitative difference in calibration identified between the procedure groups. In the present study, we demonstrate how the c-statistic can become less informative and, in certain circumstances, can lead to incorrect model-based conclusions, as case mix is restricted and patients become more homogeneous. Although it remains an important tool, caution is advised when the c-statistic is advanced as the sole measure of a model's performance. Copyright © 2012 American College of Surgeons. All rights reserved.
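
    For reference, the c-statistic of a binary-outcome model is the proportion of (event, non-event) patient pairs in which the event patient received the higher predicted risk, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the five-patient data set are hypothetical and unrelated to the study's data.

```python
def c_statistic(predicted, outcomes):
    """Concordance (area under the ROC curve) for a binary outcome.

    predicted: predicted risks; outcomes: 1 for an event, 0 for no event.
    """
    events = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5  # ties count as half
    return concordant / (len(events) * len(non_events))


risks = [0.9, 0.8, 0.3, 0.2, 0.1]  # hypothetical predicted risks
died = [1, 0, 1, 0, 0]             # hypothetical outcomes
print(c_statistic(risks, died))
```

    Because the measure depends only on how predicted risks are ranked across pairs, a cohort of very similar patients leaves less spread to rank on, which is consistent with the decline reported as case mix was restricted.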

  18. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. 1: Theoretical development and application to yearly predictions for selected cities in the United States

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1986-01-01

    A rain attenuation prediction model is described for use in calculating satellite communication link availability for any specific location in the world that is characterized by an extended record of rainfall. Such a formalism is necessary for the accurate assessment of such availability predictions in the case of the small user-terminal concept of the Advanced Communication Technology Satellite (ACTS) Project. The model employs the theory of extreme value statistics to generate the necessary statistical rainrate parameters from rain data in the form compiled by the National Weather Service. These location dependent rain statistics are then applied to a rain attenuation model to obtain a yearly prediction of the occurrence of attenuation on any satellite link at that location. The predictions of this model are compared to those of the Crane Two-Component Rain Model and some empirical data and found to be very good. The model is then used to calculate rain attenuation statistics at 59 locations in the United States (including Alaska and Hawaii) for the 20 GHz downlinks and 30 GHz uplinks of the proposed ACTS system. The flexibility of this modeling formalism is such that it allows a complete and unified treatment of the temporal aspects of rain attenuation that leads to the design of an optimum stochastic power control algorithm, the purpose of which is to efficiently counter such rain fades on a satellite link.

  19. Statistical transmutation in doped quantum dimer models.

    PubMed

    Lamas, C A; Ralko, A; Cabra, D C; Poilblanc, D; Pujol, P

    2012-07-06

    We prove a "statistical transmutation" symmetry of doped quantum dimer models on the square, triangular, and kagome lattices: the energy spectrum is invariant under a simultaneous change of statistics (i.e., bosonic into fermionic or vice versa) of the holes and of the signs of all the dimer resonance loops. This exact transformation enables us to define the duality equivalence between doped quantum dimer Hamiltonians and provides the analytic framework to analyze dynamical statistical transmutations. We investigate numerically the doping of the triangular quantum dimer model with special focus on the topological Z(2) dimer liquid. Doping leads to four (instead of two for the square lattice) inequivalent families of Hamiltonians. Competition between phase separation, superfluidity, supersolidity, and fermionic phases is investigated in the four families.

  20. An analytic technique for statistically modeling random atomic clock errors in estimation

    NASA Technical Reports Server (NTRS)

    Fell, P. J.

    1981-01-01

    Minimum variance estimation requires that the statistics of random observation errors be modeled properly. If measurements are derived through the use of atomic frequency standards, then one source of error affecting the observable is random fluctuation in frequency. This is the case, for example, with range and integrated Doppler measurements from satellites of the Global Positioning System and baseline determination for geodynamic applications. An analytic method is presented which approximates the statistics of this random process. The procedure starts with a model of the Allan variance for a particular oscillator and develops the statistics of range and integrated Doppler measurements. A series of five first-order Markov processes is used to approximate the power spectral density obtained from the Allan variance.
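    As a rough illustration of the starting quantity named in this abstract, the sketch below computes the standard non-overlapping Allan variance from a series of fractional-frequency samples. It is not the analytic method of the report: the function name, its arguments, and the assumption of evenly spaced samples are illustrative, and the Markov-process approximation of the power spectral density is not reproduced.

```python
def allan_variance(freq, m=1):
    """Non-overlapping Allan variance of fractional-frequency samples.

    freq: fractional frequency samples taken at a fixed interval tau0
          (assumed evenly spaced; hypothetical input for illustration).
    m:    averaging factor, so the result applies to tau = m * tau0.
    """
    # Average the samples in consecutive, non-overlapping blocks of length m.
    n_blocks = len(freq) // m
    if n_blocks < 2:
        raise ValueError("need at least two blocks of length m")
    means = [sum(freq[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    # Half the mean squared difference between adjacent block averages.
    diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return sum(diffs) / (2 * len(diffs))
```

    Evaluating the function for several values of m gives the variance as a function of averaging time, which is the kind of curve an oscillator model would be fitted to.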

  1. Modelling unsupervised online-learning of artificial grammars: linking implicit and statistical learning.

    PubMed

    Rohrmeier, Martin A; Cross, Ian

    2014-07-01

    Humans rapidly learn complex structures in various domains. Findings of above-chance performance of some untrained control groups in artificial grammar learning studies raise questions about the extent to which learning can occur in an untrained, unsupervised testing situation with both correct and incorrect structures. The plausibility of unsupervised online-learning effects was modelled with n-gram, chunking and simple recurrent network models. A novel evaluation framework was applied, which alternates forced binary grammaticality judgments and subsequent learning of the same stimulus. Our results indicate a strong online learning effect for n-gram and chunking models and a weaker effect for simple recurrent network models. Such findings suggest that online learning is a plausible effect of statistical chunk learning that is possible when ungrammatical sequences contain a large proportion of grammatical chunks. Such common effects of continuous statistical learning may underlie statistical and implicit learning paradigms and raise implications for study design and testing methodologies. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. Assessment of corneal properties based on statistical modeling of OCT speckle.

    PubMed

    Jesus, Danilo A; Iskander, D Robert

    2017-01-01

    A new approach to assess the properties of the corneal micro-structure in vivo based on the statistical modeling of speckle obtained from Optical Coherence Tomography (OCT) is presented. A number of statistical models were proposed to fit the corneal speckle data obtained from the raw OCT image. Short-term changes in corneal properties were studied by inducing corneal swelling, whereas age-related changes were observed by analyzing data from sixty-five subjects aged between twenty-four and seventy-three years. The Generalized Gamma distribution was shown to be the best model, in terms of Akaike's Information Criterion, to fit the OCT corneal speckle. Its parameters showed statistically significant differences (Kruskal-Wallis, p < 0.001) for short-term and age-related corneal changes. In addition, it was observed that age-related changes influence the corneal biomechanical behaviour when corneal swelling is induced. This study shows that the Generalized Gamma distribution can be used to model corneal speckle in OCT in vivo, providing complementary quantified information where the micro-structure of corneal tissue is of the essence.
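    The model-selection step described above, ranking candidate distributions by Akaike's Information Criterion, can be sketched as follows. This is only an illustration under stated assumptions: the exponential and Rayleigh distributions are stand-ins chosen because their maximum-likelihood estimates have closed forms, whereas the study fitted other candidates including the Generalized Gamma. All function names are hypothetical.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def exponential_loglik(data):
    # MLE of the exponential rate is 1 / sample mean.
    rate = len(data) / sum(data)
    return sum(math.log(rate) - rate * x for x in data)

def rayleigh_loglik(data):
    # MLE of the Rayleigh scale: sigma^2 = sum(x^2) / (2n).
    sigma2 = sum(x * x for x in data) / (2 * len(data))
    return sum(math.log(x / sigma2) - x * x / (2 * sigma2) for x in data)

def rank_by_aic(data):
    """Return (name, AIC) pairs sorted from best to worst fit."""
    scores = {
        "exponential": aic(exponential_loglik(data), 1),
        "rayleigh": aic(rayleigh_loglik(data), 1),
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

    The input is assumed to be strictly positive amplitude values. Adding a further candidate only requires another log-likelihood function and its parameter count.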

  3. Metrological traceability in education: A practical online system for measuring and managing middle school mathematics instruction

    NASA Astrophysics Data System (ADS)

    Torres Irribarra, D.; Freund, R.; Fisher, W.; Wilson, M.

    2015-02-01

    Computer-based, online assessments modelled, designed, and evaluated for adaptively administered invariant measurement are uniquely suited to defining and maintaining traceability to standardized units in education. An assessment of this kind is embedded in the Assessing Data Modeling and Statistical Reasoning (ADM) middle school mathematics curriculum. Diagnostic information about middle school students' learning of statistics and modeling is provided via computer-based formative assessments for seven constructs that comprise a learning progression for statistics and modeling from late elementary through the middle school grades. The seven constructs are: Data Display, Meta-Representational Competence, Conceptions of Statistics, Chance, Modeling Variability, Theory of Measurement, and Informal Inference. The end product is a web-delivered system built with Ruby on Rails for use by curriculum development teams working with classroom teachers in designing, developing, and delivering formative assessments. The online accessible system allows teachers to accurately diagnose students' unique comprehension and learning needs in a common language of real-time assessment, logging, analysis, feedback, and reporting.

  4. New insights into the endophenotypic status of cognition in bipolar disorder: genetic modelling study of twins and siblings.

    PubMed

    Georgiades, Anna; Rijsdijk, Fruhling; Kane, Fergus; Rebollo-Mesa, Irene; Kalidindi, Sridevi; Schulze, Katja K; Stahl, Daniel; Walshe, Muriel; Sahakian, Barbara J; McDonald, Colm; Hall, Mei-Hua; Murray, Robin M; Kravariti, Eugenia

    2016-06-01

    Twin studies have lacked statistical power to apply advanced genetic modelling techniques to the search for cognitive endophenotypes for bipolar disorder. To quantify the shared genetic variability between bipolar disorder and cognitive measures. Structural equation modelling was performed on cognitive data collected from 331 twins/siblings of varying genetic relatedness, disease status and concordance for bipolar disorder. Using a parsimonious AE model, verbal episodic and spatial working memory showed statistically significant genetic correlations with bipolar disorder (rg = |0.23|-|0.27|), which lost statistical significance after covarying for affective symptoms. Using an ACE model, IQ and visual-spatial learning showed statistically significant genetic correlations with bipolar disorder (rg = |0.51|-|1.00|), which remained significant after covarying for affective symptoms. Verbal episodic and spatial working memory capture a modest fraction of the bipolar diathesis. IQ and visual-spatial learning may tap into genetic substrates of non-affective symptomatology in bipolar disorder. © The Royal College of Psychiatrists 2016.

  5. Philosophy and the practice of Bayesian statistics

    PubMed Central

    Gelman, Andrew; Shalizi, Cosma Rohilla

    2015-01-01

    A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. PMID:22364575

  6. Philosophy and the practice of Bayesian statistics.

    PubMed

    Gelman, Andrew; Shalizi, Cosma Rohilla

    2013-02-01

    A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. © 2012 The British Psychological Society.

  7. Modeling Ka-band low elevation angle propagation statistics

    NASA Technical Reports Server (NTRS)

    Russell, Thomas A.; Weinfield, John; Pearson, Chris; Ippolito, Louis J.

    1995-01-01

    The statistical variability of the secondary atmospheric propagation effects on satellite communications cannot be ignored at frequencies of 20 GHz or higher, particularly if the propagation margin allocation is such that link availability falls below 99 percent. The secondary effects considered in this paper are gaseous absorption, cloud absorption, and tropospheric scintillation; rain attenuation is the primary effect. Techniques and example results are presented for estimation of the overall combined impact of the atmosphere on satellite communications reliability. Statistical methods are employed throughout and the most widely accepted models for the individual effects are used wherever possible. The degree of correlation between the effects is addressed and some bounds on the expected variability in the combined effects statistics are derived from the expected variability in correlation. Example estimates are presented of combined effects statistics in the Washington D.C. area at 20 GHz and 5 deg elevation angle. The statistics of water vapor are shown to be sufficient for estimation of the statistics of gaseous absorption at 20 GHz. A computer model based on monthly surface weather is described and tested. Significant improvement in prediction of absorption extremes is demonstrated with the use of path weather data instead of surface data.

  8. GIA Model Statistics for GRACE Hydrology, Cryosphere, and Ocean Science

    NASA Astrophysics Data System (ADS)

    Caron, L.; Ivins, E. R.; Larour, E.; Adhikari, S.; Nilsson, J.; Blewitt, G.

    2018-03-01

    We provide a new analysis of glacial isostatic adjustment (GIA) with the goal of assembling the model uncertainty statistics required for rigorously extracting trends in surface mass from the Gravity Recovery and Climate Experiment (GRACE) mission. Such statistics are essential for deciphering sea level, ocean mass, and hydrological changes because the latter signals can be relatively small (≤2 mm/yr water height equivalent) over very large regions, such as major ocean basins and watersheds. With abundant new >7 year continuous measurements of vertical land motion (VLM) reported by Global Positioning System stations on bedrock and new relative sea level records, our new statistical evaluation of GIA uncertainties incorporates Bayesian methodologies. A unique aspect of the method is that both the ice history and 1-D Earth structure vary through a total of 128,000 forward models. We find that best fit models poorly capture the statistical inferences needed to correctly invert for lower mantle viscosity and that GIA uncertainty exceeds the uncertainty ascribed to trends from 14 years of GRACE data in polar regions.

  9. Statistics of Dark Matter Halos from Gravitational Lensing.

    PubMed

    Jain; Van Waerbeke L

    2000-02-10

    We present a new approach to measure the mass function of dark matter halos and to discriminate models with differing values of Omega through weak gravitational lensing. We measure the distribution of peaks from simulated lensing surveys and show that the lensing signal due to dark matter halos can be detected for a wide range of peak heights. Even when the signal-to-noise ratio is well below the limit for detection of individual halos, projected halo statistics can be constrained for halo masses spanning galactic to cluster halos. The use of peak statistics relies on an analytical model of the noise due to the intrinsic ellipticities of source galaxies. The noise model has been shown to accurately describe simulated data for a variety of input ellipticity distributions. We show that the measured peak distribution has distinct signatures of gravitational lensing, and its non-Gaussian shape can be used to distinguish models with different values of Omega. The use of peak statistics is complementary to the measurement of field statistics, such as the ellipticity correlation function, and is possibly not susceptible to the same systematic errors.

  10. Representing Micro-Macro Linkages by Actor-Based Dynamic Network Models

    PubMed Central

    Snijders, Tom A.B.; Steglich, Christian E.G.

    2014-01-01

    Stochastic actor-based models for network dynamics have the primary aim of statistical inference about processes of network change, but may be regarded as a kind of agent-based models. Similar to many other agent-based models, they are based on local rules for actor behavior. Different from many other agent-based models, by including elements of generalized linear statistical models they aim to be realistic detailed representations of network dynamics in empirical data sets. Statistical parallels to micro-macro considerations can be found in the estimation of parameters determining local actor behavior from empirical data, and the assessment of goodness of fit from the correspondence with network-level descriptives. This article studies several network-level consequences of dynamic actor-based models applied to represent cross-sectional network data. Two examples illustrate how network-level characteristics can be obtained as emergent features implied by micro-specifications of actor-based models. PMID:25960578

  11. Probabilistic Modeling and Visualization of the Flexibility in Morphable Models

    NASA Astrophysics Data System (ADS)

    Lüthi, M.; Albrecht, T.; Vetter, T.

    Statistical shape models, and in particular morphable models, have gained widespread use in computer vision, computer graphics and medical imaging. Researchers have started to build models of almost any anatomical structure in the human body. While these models provide a useful prior for many image analysis tasks, relatively little information about the shape represented by the morphable model is exploited. We propose a method for computing and visualizing the remaining flexibility, when a part of the shape is fixed. Our method, which is based on Probabilistic PCA, not only leads to an approach for reconstructing the full shape from partial information, but also allows us to investigate and visualize the uncertainty of a reconstruction. To show the feasibility of our approach we performed experiments on a statistical model of the human face and the femur bone. The visualization of the remaining flexibility allows for greater insight into the statistical properties of the shape.

  12. Machine Learning Predictions of a Multiresolution Climate Model Ensemble

    NASA Astrophysics Data System (ADS)

    Anderson, Gemma J.; Lucas, Donald D.

    2018-05-01

    Statistical models of high-resolution climate models are useful for many purposes, including sensitivity and uncertainty analyses, but building them can be computationally prohibitive. We generated a unique multiresolution perturbed parameter ensemble of a global climate model. We use a novel application of a machine learning technique known as random forests to train a statistical model on the ensemble to make high-resolution model predictions of two important quantities: global mean top-of-atmosphere energy flux and precipitation. The random forests leverage cheaper low-resolution simulations, greatly reducing the number of high-resolution simulations required to train the statistical model. We demonstrate that high-resolution predictions of these quantities can be obtained by training on an ensemble that includes only a small number of high-resolution simulations. We also find that global annually averaged precipitation is more sensitive to resolution changes than to any of the model parameters considered.

  13. TOWARDS REFINED USE OF TOXICITY DATA IN STATISTICALLY BASED SAR MODELS FOR DEVELOPMENTAL TOXICITY.

    EPA Science Inventory

    In 2003, an International Life Sciences Institute (ILSI) Working Group examined the potential of statistically based structure-activity relationship (SAR) models for use in screening environmental contaminants for possible developmental toxicants.

  14. Seasonal Atmospheric and Oceanic Predictions

    NASA Technical Reports Server (NTRS)

    Roads, John; Rienecker, Michele (Technical Monitor)

    2003-01-01

    Several projects associated with dynamical, statistical, single column, and ocean models are presented. The projects include: 1) Regional Climate Modeling; 2) Statistical Downscaling; 3) Evaluation of SCM and NSIPP AGCM Results at the ARM Program Sites; and 4) Ocean Forecasts.

  15. Standard and reduced radiation dose liver CT images: adaptive statistical iterative reconstruction versus model-based iterative reconstruction-comparison of findings and image quality.

    PubMed

    Shuman, William P; Chan, Keith T; Busey, Janet M; Mitsumori, Lee M; Choi, Eunice; Koprowicz, Kent M; Kanal, Kalpana M

    2014-12-01

    To investigate whether reduced radiation dose liver computed tomography (CT) images reconstructed with model-based iterative reconstruction (MBIR) might compromise depiction of clinically relevant findings or might have decreased image quality when compared with clinical standard radiation dose CT images reconstructed with adaptive statistical iterative reconstruction (ASIR). With institutional review board approval, informed consent, and HIPAA compliance, 50 patients (39 men, 11 women) who underwent liver CT were prospectively included. After a portal venous pass with ASIR images, a 60% reduced radiation dose pass was added with MBIR images. One reviewer scored ASIR image quality and marked findings. Two additional independent reviewers noted whether marked findings were present on MBIR images and assigned scores for relative conspicuity, spatial resolution, image noise, and image quality. Liver and aorta Hounsfield units and image noise were measured. Volume CT dose index and size-specific dose estimate (SSDE) were recorded. Qualitative reviewer scores were summarized. Formal statistical inference for signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), volume CT dose index, and SSDE was made (paired t tests), with Bonferroni adjustment. Two independent reviewers identified all 136 ASIR image findings (n = 272) on MBIR images, scoring them as equal or better for conspicuity, spatial resolution, and image noise in 94.1% (256 of 272), 96.7% (263 of 272), and 99.3% (270 of 272), respectively. In 50 image sets, two reviewers (n = 100) scored overall image quality as sufficient or good with MBIR in 99% (99 of 100). Liver SNR was significantly greater for MBIR (10.8 ± 2.5 [standard deviation] vs 7.7 ± 1.4, P < .001); there was no difference for CNR (2.5 ± 1.4 vs 2.4 ± 1.4, P = .45). For ASIR and MBIR, respectively, volume CT dose index was 15.2 mGy ± 7.6 versus 6.2 mGy ± 3.6; SSDE was 16.4 mGy ± 6.6 versus 6.7 mGy ± 3.1 (P < .001). Liver CT images reconstructed with MBIR may allow up to 59% radiation dose reduction compared with the dose with ASIR, without compromising depiction of findings or image quality. © RSNA, 2014.
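    The headline percentages in this abstract can be checked from the figures it quotes. The short sketch below recomputes the dose reduction from the reported mean dose values and the reviewer agreement rates from the reported counts. It works only with the group means given in the abstract, so it is a consistency check and not a reanalysis of per-patient data; the helper name is hypothetical.

```python
def percent_reduction(reference, reduced):
    """Percentage by which `reduced` is lower than `reference`."""
    return 100.0 * (reference - reduced) / reference

# Mean values quoted in the abstract (mGy), ASIR versus MBIR.
ctdi_reduction = percent_reduction(15.2, 6.2)   # volume CT dose index
ssde_reduction = percent_reduction(16.4, 6.7)   # size-specific dose estimate

# Reviewer agreement counts quoted in the abstract, out of 272 ratings.
agreement = {name: 100.0 * count / 272
             for name, count in [("conspicuity", 256),
                                 ("spatial resolution", 263),
                                 ("image noise", 270)]}
```

    Both dose measures come out at roughly 59%, in line with the stated conclusion, and the three agreement rates round to the percentages reported.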

  16. An astronomer's guide to period searching

    NASA Astrophysics Data System (ADS)

    Schwarzenberg-Czerny, A.

    2003-03-01

    We concentrate on analysis of unevenly sampled time series, interrupted by periodic gaps, as often encountered in astronomy. While some of our conclusions may appear surprising, all are based on classical statistical principles of Fisher & successors. Except for discussion of the resolution issues, it is best for the reader to forget temporarily about Fourier transforms and to concentrate on problems of fitting of a time series with a model curve. According to their statistical content we divide the issues into several sections, consisting of: (ii) statistical numerical aspects of model fitting, (iii) evaluation of fitted models as hypotheses testing, (iv) the role of the orthogonal models in signal detection, (v) conditions for equivalence of periodograms, and (vi) rating sensitivity by test power. An experienced observer working with individual objects would benefit little from a formalized statistical approach. However, we demonstrate the usefulness of this approach in evaluation of performance of periodograms and in quantitative design of large variability surveys.

  17. Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.

    PubMed

    Harrington, Peter de Boves

    2018-01-02

    Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes an efficient use of the data because each object is used once for validation. It was reviewed a decade earlier, but primarily for the optimization of chemometric models; this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported and powerful, matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
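    One way to read the bootstrapped Latin partition scheme named above is sketched below: objects are shuffled within each class and dealt into a fixed number of partitions so that each object is used exactly once for validation, and the whole procedure is repeated with fresh shuffles. This is a minimal sketch based on the abstract's description, not the author's implementation; the function names, the seed argument, and the class-stratified dealing are assumptions.

```python
import random
from collections import defaultdict

def latin_partitions(labels, n_parts, rng):
    """Split object indices into n_parts validation sets.

    Indices are shuffled within each class and dealt out in turn, so every
    partition keeps roughly the class proportions of the full data set and
    every object lands in exactly one validation set.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    parts = [[] for _ in range(n_parts)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for position, index in enumerate(members):
            parts[(offset + position) % n_parts].append(index)
        offset += len(members)
    return parts

def bootstrapped_latin_partitions(labels, n_parts, n_bootstraps, seed=0):
    """Repeat the partitioning with fresh shuffles, one list per bootstrap."""
    rng = random.Random(seed)
    return [latin_partitions(labels, n_parts, rng) for _ in range(n_bootstraps)]
```

    Within each bootstrap, every partition would serve once as the validation set while the remaining partitions train the model; pooling the figure of merit within a bootstrap and comparing across bootstraps is what makes an average with a confidence interval possible.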

  18. Local dependence in random graph models: characterization, properties and statistical inference

    PubMed Central

    Schweinberger, Michael; Handcock, Mark S.

    2015-01-01

    Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with ‘ground truth’. PMID:26560142

  19. Model-Free CUSUM Methods for Person Fit

    ERIC Educational Resources Information Center

    Armstrong, Ronald D.; Shi, Min

    2009-01-01

    This article demonstrates the use of a new class of model-free cumulative sum (CUSUM) statistics to detect person fit given the responses to a linear test. The fundamental statistic being accumulated is the likelihood ratio of two probabilities. The detection performance of this CUSUM scheme is compared to other model-free person-fit statistics…
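    The accumulation idea in this abstract, a running sum of likelihood ratios over the items of a test, can be illustrated with a generic one-sided CUSUM. The sketch below is not the article's statistic: the truncated abstract does not say how the two probabilities are obtained, so they are supplied here as hypothetical per-item inputs, and the reset-at-zero rule is the textbook CUSUM form.

```python
import math

def cusum_log_likelihood_ratio(responses, p_null, p_alt):
    """Upper one-sided CUSUM of log likelihood ratios for 0/1 responses.

    responses: item scores, 1 for correct and 0 for incorrect.
    p_null:    per-item success probabilities under the reference behaviour
               (hypothetical inputs).
    p_alt:     per-item success probabilities under the aberrant behaviour
               (hypothetical inputs).
    Returns the running CUSUM values, one per item.
    """
    path, current = [], 0.0
    for score, p0, p1 in zip(responses, p_null, p_alt):
        if score == 1:
            increment = math.log(p1 / p0)
        else:
            increment = math.log((1 - p1) / (1 - p0))
        # Reset to zero whenever the accumulated evidence turns negative.
        current = max(0.0, current + increment)
        path.append(current)
    return path
```

    A response pattern would be flagged when the running value crosses a chosen threshold; the threshold itself would have to be calibrated, for example by simulation.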

  20. Performance of the S-χ² Statistic for Full-Information Bifactor Models

    ERIC Educational Resources Information Center

    Li, Ying; Rupp, Andre A.

    2011-01-01

    This study investigated the Type I error rate and power of the multivariate extension of the S-χ² statistic using unidimensional and multidimensional item response theory (UIRT and MIRT, respectively) models as well as full-information bifactor (FI-bifactor) models through simulation. Manipulated factors included test length, sample…

  1. Asset Attribution Stability and Portfolio Construction: An Educational Example

    ERIC Educational Resources Information Center

    Chong, James T.; Jennings, William P.; Phillips, G. Michael

    2014-01-01

    This paper illustrates how a third statistic from asset pricing models, the R-squared statistic, may have information that can help in portfolio construction. Using a traditional CAPM model in comparison to an 18-factor Arbitrage Pricing Style Model, a portfolio separation test is conducted. Portfolio returns and risk metrics are compared using…

  2. The Co-Emergence of Aggregate and Modelling Reasoning

    ERIC Educational Resources Information Center

    Aridor, Keren; Ben-Zvi, Dani

    2017-01-01

    This article examines how two processes--reasoning with statistical modelling of a real phenomenon and aggregate reasoning--can co-emerge. We focus in this case study on the emergent reasoning of two fifth graders (aged 10) involved in statistical data analysis, informal inference, and modelling activities using TinkerPlots™. We describe nine…

  3. Seed Dispersal Near and Far: Patterns Across Temperate and Tropical Forests

    Treesearch

    James S. Clark; Miles Silman; Ruth Kern; Eric Macklin; Janneke HilleRisLambers

    1999-01-01

    Dispersal affects community dynamics and vegetation response to global change. Understanding these effects requires descriptions of dispersal at local and regional scales and statistical models that permit estimation. Classical models of dispersal describe local or long-distance dispersal, but not both. The lack of statistical methods means that models have rarely been...

  4. Assessment of the scale effect on statistical downscaling quality at a station scale using a weather generator-based model

    USDA-ARS's Scientific Manuscript database

    The resolution of General Circulation Models (GCMs) is too coarse to assess the fine scale or site-specific impacts of climate change. Downscaling approaches including dynamical and statistical downscaling have been developed to meet this requirement. As the resolution of climate model increases, it...

  5. A Modeling Approach to the Development of Students' Informal Inferential Reasoning

    ERIC Educational Resources Information Center

    Doerr, Helen M.; Delmas, Robert; Makar, Katie

    2017-01-01

    Teaching from an informal statistical inference perspective can address the challenge of teaching statistics in a coherent way. We argue that activities that promote model-based reasoning address two additional challenges: providing a coherent sequence of topics and promoting the application of knowledge to novel situations. We take a models and…

  6. Statistical appearance models based on probabilistic correspondences.

    PubMed

    Krüger, Julia; Ehrhardt, Jan; Handels, Heinz

    2017-04-01

    Model-based image analysis is indispensable in medical image processing. One key aspect of building statistical shape and appearance models is the determination of one-to-one correspondences in the training data set. At the same time, the identification of these correspondences is the most challenging part of such methods. In our earlier work, we developed an alternative method using correspondence probabilities instead of exact one-to-one correspondences for a statistical shape model (Hufnagel et al., 2008). In this work, a new approach for statistical appearance models without one-to-one correspondences is proposed. A sparse image representation is used to build a model that combines point position and appearance information at the same time. Probabilistic correspondences between the derived multi-dimensional feature vectors are used to omit the need for extensive preprocessing of finding landmarks and correspondences as well as to reduce the dependence of the generated model on the landmark positions. Model generation and model fitting can now be expressed by optimizing a single global criterion derived from a maximum a-posteriori (MAP) approach with respect to model parameters that directly affect both shape and appearance of the considered objects inside the images. The proposed approach describes statistical appearance modeling in a concise and flexible mathematical framework. Besides eliminating the demand for costly correspondence determination, the method allows for additional constraints such as topological regularity in the modeling process. In the evaluation the model was applied for segmentation and landmark identification in hand X-ray images. The results demonstrate the feasibility of the model to detect hand contours as well as the positions of the joints between finger bones for unseen test images. Further, we evaluated the model on brain data of stroke patients to show the ability of the proposed model to handle partially corrupted data and to demonstrate a possible employment of the correspondence probabilities to indicate these corrupted/pathological areas. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Statistical Compression of Wind Speed Data

    NASA Astrophysics Data System (ADS)

    Tagle, F.; Castruccio, S.; Crippa, P.; Genton, M.

    2017-12-01

    In this work we introduce a lossy compression approach that utilizes a stochastic wind generator based on a non-Gaussian distribution to reproduce the internal climate variability of daily wind speed as represented by the CESM Large Ensemble over Saudi Arabia. Stochastic wind generators, and stochastic weather generators more generally, are statistical models that aim to match certain statistical properties of the data on which they are trained. They have been used extensively in applications ranging from agricultural models to climate impact studies. In this novel context, the parameters of the fitted model can be interpreted as encoding the information contained in the original uncompressed data. The statistical model is fit to only 3 of the 30 ensemble members and it adequately captures the variability of the ensemble in terms of seasonal and interannual variability of daily wind speed. To deal with such a large spatial domain, it is partitioned into 9 regions, and the model is fit independently to each of these. We further discuss a recent refinement of the model, which relaxes this assumption of regional independence, by introducing a large-scale component that interacts with the fine-scale regional effects.

  8. Performance of Reclassification Statistics in Comparing Risk Prediction Models

    PubMed Central

    Paynter, Nina P.

    2012-01-01

    Concerns have been raised about the use of traditional measures of model fit in evaluating risk prediction models for clinical use, and reclassification tables have been suggested as an alternative means of assessing the clinical utility of a model. Several measures based on the table have been proposed, including the reclassification calibration (RC) statistic, the net reclassification improvement (NRI), and the integrated discrimination improvement (IDI), but the performance of these in practical settings has not been fully examined. We used simulations to estimate the type I error and power for these statistics in a number of scenarios, as well as the impact of the number and type of categories, when adding a new marker to an established or reference model. The type I error was found to be reasonable in most settings, and power was highest for the IDI, which was similar to the test of association. The relative power of the RC statistic, a test of calibration, and the NRI, a test of discrimination, varied depending on the model assumptions. These tools provide unique but complementary information. PMID:21294152
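
    As a rough illustration of two of the measures named in this abstract, the sketch below computes a categorical NRI and the IDI using their commonly cited definitions. The function names, the risk cutoffs (5% and 20%) and the six toy predictions are assumptions made up for this example; they are not taken from the study.

```python
from bisect import bisect_right


def risk_category(p, cutoffs):
    """Return the index of the risk category that probability p falls into."""
    return bisect_right(cutoffs, p)


def nri(old, new, events, cutoffs):
    """Categorical net reclassification improvement.

    Sums the net proportion of events moving to a higher category and the
    net proportion of non-events moving to a lower category.
    """
    up_e = down_e = up_n = down_n = 0
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    for p_old, p_new, y in zip(old, new, events):
        move = risk_category(p_new, cutoffs) - risk_category(p_old, cutoffs)
        if y:
            up_e += move > 0
            down_e += move < 0
        else:
            up_n += move > 0
            down_n += move < 0
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


def idi(old, new, events):
    """Integrated discrimination improvement.

    Mean change in predicted risk among events minus the mean change
    among non-events.
    """
    change_e = [n - o for o, n, y in zip(old, new, events) if y]
    change_n = [n - o for o, n, y in zip(old, new, events) if not y]
    return sum(change_e) / len(change_e) - sum(change_n) / len(change_n)


# Hypothetical predicted risks from a reference model and an extended model.
old_risk = [0.03, 0.10, 0.25, 0.04, 0.15, 0.30]
new_risk = [0.08, 0.22, 0.30, 0.02, 0.04, 0.25]
had_event = [1, 1, 1, 0, 0, 0]
cutoffs = [0.05, 0.20]  # assumed category boundaries

nri_value = nri(old_risk, new_risk, had_event, cutoffs)
idi_value = idi(old_risk, new_risk, had_event)
print(nri_value, idi_value)
```

    The NRI depends on the number and placement of the categories, which is one of the factors the simulations in this record examine; the IDI does not use categories at all.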

  9. The construction and assessment of a statistical model for the prediction of protein assay data.

    PubMed

    Pittman, J; Sacks, J; Young, S Stanley

    2002-01-01

    The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion, a synopsis of the results of these experiments and their implications for the analysis of bioinformatic databases in general is presented.

  10. An Update on Statistical Boosting in Biomedicine.

    PubMed

    Mayr, Andreas; Hofner, Benjamin; Waldmann, Elisabeth; Hepp, Tobias; Meyer, Sebastian; Gefeller, Olaf

    2017-01-01

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.

  11. Statistical and Hydrological evaluation of precipitation forecasts from IMD MME and ECMWF numerical weather forecasts for Indian River basins

    NASA Astrophysics Data System (ADS)

    Mohite, A. R.; Beria, H.; Behera, A. K.; Chatterjee, C.; Singh, R.

    2016-12-01

    Flood forecasting using hydrological models is an important and cost-effective non-structural flood management measure. For forecasting at short lead times, empirical models using real-time precipitation estimates have proven to be reliable. However, their skill depreciates with increasing lead time. Coupling a hydrologic model with real-time rainfall forecasts issued from numerical weather prediction (NWP) systems could increase the lead time substantially. In this study, we compared 1-5 day precipitation forecasts from the India Meteorological Department (IMD) Multi-Model Ensemble (MME) with European Centre for Medium-Range Weather Forecasts (ECMWF) NWP forecasts for over 86 major river basins in India. We then evaluated the hydrologic utility of these forecasts over the Basantpur catchment (approx. 59,000 km²) of the Mahanadi River basin. Coupled MIKE 11 RR (NAM) and MIKE 11 hydrodynamic (HD) models were used for the development of a flood forecast system (FFS). The RR model was calibrated using IMD station rainfall data. Cross-sections extracted from SRTM 30 were used as input to the MIKE 11 HD model. IMD started issuing operational MME forecasts from the year 2008, and hence, both the statistical and hydrologic evaluation were carried out from 2008-2014. The performance of the FFS was evaluated using both the NWP datasets separately for the year 2011, which was a large flood year in the Mahanadi River basin. We will present figures and metrics for statistical (threshold based statistics, skill in terms of correlation and bias) and hydrologic (Nash-Sutcliffe efficiency, mean and peak error statistics) evaluation. The statistical evaluation will be at pan-India scale for all the major river basins and the hydrologic evaluation will be for the Basantpur catchment of the Mahanadi River basin.
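
    The hydrologic metrics listed at the end of this abstract can be sketched in a few lines. The example below uses the standard Nash-Sutcliffe efficiency formula and one simple definition of relative peak error; the function names and the five-value discharge series are hypothetical and only illustrate the arithmetic, not the study's results.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean.

    A value of 1 means a perfect match; 0 means the simulation is no better
    than always predicting the observed mean.
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


def relative_peak_error(observed, simulated):
    """Difference between simulated and observed peak, relative to the observed peak."""
    return (max(simulated) - max(observed)) / max(observed)


# Made-up daily discharge values (arbitrary units), for illustration only.
obs_flow = [10, 20, 40, 30, 15]
sim_flow = [12, 18, 35, 33, 14]

nse_value = nash_sutcliffe(obs_flow, sim_flow)
peak_error = relative_peak_error(obs_flow, sim_flow)
print(nse_value, peak_error)
```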

  12. The potential of composite cognitive scores for tracking progression in Huntington's disease.

    PubMed

    Jones, Rebecca; Stout, Julie C; Labuschagne, Izelle; Say, Miranda; Justo, Damian; Coleman, Allison; Dumas, Eve M; Hart, Ellen; Owen, Gail; Durr, Alexandra; Leavitt, Blair R; Roos, Raymund; O'Regan, Alison; Langbehn, Doug; Tabrizi, Sarah J; Frost, Chris

    2014-01-01

    Composite scores derived from joint statistical modelling of individual risk factors are widely used to identify individuals who are at increased risk of developing disease or of faster disease progression. We investigated the ability of composite measures developed using statistical models to differentiate progressive cognitive deterioration in Huntington's disease (HD) from natural decline in healthy controls. Using longitudinal data from TRACK-HD, the optimal combinations of quantitative cognitive measures to differentiate premanifest and early stage HD individuals respectively from controls was determined using logistic regression. Composite scores were calculated from the parameters of each statistical model. Linear regression models were used to calculate effect sizes (ES) quantifying the difference in longitudinal change over 24 months between premanifest and early stage HD groups respectively and controls. ES for the composites were compared with ES for individual cognitive outcomes and other measures used in HD research. The 0.632 bootstrap was used to eliminate biases which result from developing and testing models in the same sample. In early HD, the composite score from the HD change prediction model produced an ES for difference in rate of 24-month change relative to controls of 1.14 (95% CI: 0.90 to 1.39), larger than the ES for any individual cognitive outcome and UHDRS Total Motor Score and Total Functional Capacity. In addition, this composite gave a statistically significant difference in rate of change in premanifest HD compared to controls over 24-months (ES: 0.24; 95% CI: 0.04 to 0.44), even though none of the individual cognitive outcomes produced statistically significant ES over this period. Composite scores developed using appropriate statistical modelling techniques have the potential to materially reduce required sample sizes for randomised controlled trials.

  13. 2013 Annual Disability Statistics Compendium

    ERIC Educational Resources Information Center

    Houtenville, Andrew J.

    2013-01-01

    The "Annual Disability Statistics Compendium" is a publication of statistics about people with disabilities and the government programs which serve them. It is modeled after the U.S. Department of Commerce's annual "Statistical Abstracts of the United States." The "Compendium" is designed to serve as a reference guide…

  14. 2015 Annual Disability Statistics Compendium

    ERIC Educational Resources Information Center

    Houtenville, Andrew J.; Brucker, Debra L.; Lauer, Eric A.

    2016-01-01

    The "Annual Disability Statistics Compendium" is a publication of statistics about people with disabilities and about the government programs which serve them. It is modeled after the "Statistical Abstracts of the United States," published yearly by the U.S. Department of Commerce. The "Compendium" is designed to…

  15. 2014 Annual Disability Statistics Compendium

    ERIC Educational Resources Information Center

    Houtenville, Andrew J.; Brucker, Debra L.; Lauer, Eric A.

    2014-01-01

    The "Annual Disability Statistics Compendium" is a publication of statistics about people with disabilities and about the government programs which serve them. It is modeled after the "Statistical Abstracts of the United States," published yearly by the U.S. Department of Commerce. The "Compendium" is designed to…

  16. Probability of identification: a statistical model for the validation of qualitative botanical identification methods.

    PubMed

    LaBudde, Robert A; Harnly, James M

    2012-01-01

    A qualitative botanical identification method (BIM) is an analytical procedure that returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) material, or whether it contains excessive nontarget (undesirable) material. The report describes the development and validation of studies for a BIM based on the proportion of replicates identified, or probability of identification (POI), as the basic observed statistic. The statistical procedures proposed for data analysis follow closely those of the probability of detection, and harmonize the statistical concepts and parameters between quantitative and qualitative method validation. Use of POI statistics also harmonizes statistical concepts for botanical, microbiological, toxin, and other analyte identification methods that produce binary results. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements. Single collaborator and multicollaborative study examples are given.
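
    The basic POI statistic described here is simply the proportion of replicates identified. The sketch below computes it together with a Wilson score interval, which is one common choice for a binomial proportion and is assumed here for illustration; the abstract does not say which interval the authors use. Function names and replicate counts are hypothetical.

```python
import math


def poi(identified, replicates):
    """Probability of identification: fraction of replicates returning 1."""
    return identified / replicates


def wilson_interval(identified, replicates, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = identified / replicates
    denom = 1 + z * z / replicates
    centre = (p + z * z / (2 * replicates)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / replicates + z * z / (4 * replicates ** 2)
    ) / denom
    return centre - half_width, centre + half_width


# Hypothetical study: 11 of 12 replicates of the target material identified.
poi_value = poi(11, 12)
lower, upper = wilson_interval(11, 12)
print(poi_value, lower, upper)
```

    Repeating the calculation at several concentrations of target material would give the points of a response curve of the kind the abstract mentions.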

  17. An Exploration of Student Attitudes and Satisfaction in a GAISE-Influenced Introductory Statistics Course

    ERIC Educational Resources Information Center

    Paul, Warren; Cunnington, R. Clare

    2017-01-01

    We used the Survey of Attitudes Toward Statistics to (1) evaluate using presemester data the Students' Attitudes Toward Statistics Model (SATS-M), and (2) test the effect on attitudes of an introductory statistics course redesigned according to the Guidelines for Assessment and Instruction in Statistics Education (GAISE) by examining the change in…

  18. Measuring Microaggression and Organizational Climate Factors in Military Units

    DTIC Science & Technology

    2011-04-01

    i.e., items) to accurately assess what we intend for them to measure. To assess construct and convergent validity, the author assessed the statistical ...sample indicated both convergent and construct validity of the microaggression scale. Table 5 presents these statistics. Measuring Microaggressions...models. As shown in Table 7, the measurement models had acceptable fit indices. That is, the Chi-square statistics were at their minimum; although the

  19. Comments on statistical issues in numerical modeling for underground nuclear test monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nicholson, W.L.; Anderson, K.K.

    1993-11-01

    The Symposium concluded with prepared summaries by four experts in the involved disciplines. These experts made no mention of statistics and/or the statistical content of issues. The first author contributed an extemporaneous statement at the Symposium because there are important issues associated with conducting and evaluating numerical modeling that are familiar to statisticians and often treated successfully by them. This note expands upon these extemporaneous remarks.

  20. Communication Dynamics of Blog Networks

    NASA Astrophysics Data System (ADS)

    Goldberg, Mark; Kelley, Stephen; Magdon-Ismail, Malik; Mertsalov, Konstantin; Wallace, William (Al)

    We study the communication dynamics of Blog networks, focusing on the Russian section of LiveJournal as a case study. Communication (blogger-to-blogger links) in such online communication networks is very dynamic: over 60% of the links in the network are new from one week to the next, though the set of bloggers remains approximately constant. Two fundamental questions are: (i) what models adequately describe such dynamic communication behavior; and (ii) how does one detect the phase transitions, i.e. the changes that go beyond the standard high-level dynamics? We approach these questions through the notion of stable statistics. We give strong experimental evidence to the fact that, despite the extreme amount of communication dynamics, several aggregate statistics are remarkably stable. We use stable statistics to test our models of communication dynamics postulating that any good model should produce values for these statistics which are both stable and close to the observed ones. Stable statistics can also be used to identify phase transitions, since any change in a normally stable statistic indicates a substantial change in the nature of the communication dynamics. We describe models of the communication dynamics in large social networks based on the principle of locality of communication: a node's communication energy is spent mostly within its own "social area," the locality of the node.

  1. Statistical emulators of maize, rice, soybean and wheat yields from global gridded crop models

    DOE PAGES

    Blanc, Élodie

    2017-01-26

    This study provides statistical emulators of crop yields based on global gridded crop model simulations from the Inter-Sectoral Impact Model Intercomparison Project Fast Track project. The ensemble of simulations is used to build a panel of annual crop yields from five crop models and corresponding monthly summer weather variables for over a century at the grid cell level globally. This dataset is then used to estimate, for each crop and gridded crop model, the statistical relationship between yields, temperature, precipitation and carbon dioxide. This study considers a new functional form to better capture the non-linear response of yields to weather, especially for extreme temperature and precipitation events, and now accounts for the effect of soil type. In- and out-of-sample validations show that the statistical emulators are able to replicate spatial patterns of crop yield levels and changes over time projected by crop models reasonably well, although the accuracy of the emulators varies by model and by region. This study therefore provides a reliable and accessible alternative to global gridded crop yield models. By emulating crop yields for several models using parsimonious equations, the tools provide a computationally efficient method to account for uncertainty in climate change impact assessments.

  2. Statistical emulators of maize, rice, soybean and wheat yields from global gridded crop models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blanc, Élodie

    This study provides statistical emulators of crop yields based on global gridded crop model simulations from the Inter-Sectoral Impact Model Intercomparison Project Fast Track project. The ensemble of simulations is used to build a panel of annual crop yields from five crop models and corresponding monthly summer weather variables for over a century at the grid cell level globally. This dataset is then used to estimate, for each crop and gridded crop model, the statistical relationship between yields, temperature, precipitation and carbon dioxide. This study considers a new functional form to better capture the non-linear response of yields to weather, especially for extreme temperature and precipitation events, and now accounts for the effect of soil type. In- and out-of-sample validations show that the statistical emulators are able to replicate spatial patterns of crop yield levels and changes over time projected by crop models reasonably well, although the accuracy of the emulators varies by model and by region. This study therefore provides a reliable and accessible alternative to global gridded crop yield models. By emulating crop yields for several models using parsimonious equations, the tools provide a computationally efficient method to account for uncertainty in climate change impact assessments.

  3. Multiplicative point process as a model of trading activity

    NASA Astrophysics Data System (ADS)

    Gontis, V.; Kaulakys, B.

    2004-11-01

    Signals consisting of a sequence of pulses show that the inherent origin of 1/f noise is a Brownian fluctuation of the average interevent time between subsequent pulses of the pulse sequence. In this paper, we generalize the model of interevent time to reproduce a variety of self-affine time series exhibiting power spectral density S(f) scaling as a power of the frequency f. Furthermore, we analyze the relation between the power-law correlations and the origin of the power-law probability distribution of the signal intensity. We introduce a stochastic multiplicative model for the time intervals between point events and analyze the statistical properties of the signal analytically and numerically. Such a model system exhibits power-law spectral density S(f) ∼ 1/f^β for various values of β, including β = 1/2, 1 and 3/2. Explicit expressions for the power spectra in the low-frequency limit and for the distribution density of the interevent time are obtained. The counting statistics of the events are analyzed analytically and numerically as well. The specific interest of our analysis is related to the financial markets, where long-range correlations of price fluctuations largely depend on the number of transactions. We analyze the spectral density and counting statistics of the number of transactions. The model reproduces spectral properties of the real markets and explains the mechanism of power-law distribution of trading activity. The study provides evidence that the statistical properties of the financial markets are enclosed in the statistics of the time interval between trades. A multiplicative point process serves as a consistent model generating these statistics.

  4. Assessment of the long-lead probabilistic prediction for the Asian summer monsoon precipitation (1983-2011) based on the APCC multimodel system and a statistical model

    NASA Astrophysics Data System (ADS)

    Sohn, Soo-Jin; Min, Young-Mi; Lee, June-Yi; Tam, Chi-Yung; Kang, In-Sik; Wang, Bin; Ahn, Joong-Bae; Yamagata, Toshio

    2012-02-01

    The performance of the probabilistic multimodel prediction (PMMP) system of the APEC Climate Center (APCC) in predicting the Asian summer monsoon (ASM) precipitation at a four-month lead (with February initial condition) was compared with that of a statistical model using hindcast data for 1983-2005 and real-time forecasts for 2006-2011. Particular attention was paid to probabilistic precipitation forecasts for the boreal summer after the mature phase of El Niño and Southern Oscillation (ENSO). Taking into account the fact that coupled models' skill for boreal spring and summer precipitation mainly comes from their ability to capture ENSO teleconnection, we developed the statistical model using linear regression with the preceding winter ENSO condition as the predictor. Our results reveal several advantages and disadvantages in both forecast systems. First, the PMMP appears to have higher skills for both above- and below-normal categories in the six-year real-time forecast period, whereas the cross-validated statistical model has higher skills during the 23-year hindcast period. This implies that the cross-validated statistical skill may be overestimated. Second, the PMMP is the better tool for capturing atypical ENSO (or non-canonical ENSO related) teleconnection, which has affected the ASM precipitation during the early 1990s and in the recent decade. Third, the statistical model is more sensitive to the ENSO phase and has an advantage in predicting the ASM precipitation after the mature phase of La Niña.

  5. A Monte Carlo Simulation Comparing the Statistical Precision of Two High-Stakes Teacher Evaluation Methods: A Value-Added Model and a Composite Measure

    ERIC Educational Resources Information Center

    Spencer, Bryden

    2016-01-01

    Value-added models are a class of growth models used in education to assign responsibility for student growth to teachers or schools. For value-added models to be used fairly, sufficient statistical precision is necessary for accurate teacher classification. Previous research indicated precision below practical limits. An alternative approach has…

  6. Climate Change Implications for Tropical Islands: Interpolating and Interpreting Statistically Downscaled GCM Projections for Management and Planning

    Treesearch

    Azad Henareh Khalyani; William A. Gould; Eric Harmsen; Adam Terando; Maya Quinones; Jaime A. Collazo

    2016-01-01

  7. The Answer Is in the Question: A Guide for Describing and Investigating the Conceptual Foundations and Statistical Properties of Cognitive Psychometric Models

    ERIC Educational Resources Information Center

    Rupp, Andre A.

    2007-01-01

    One of the most revolutionary advances in psychometric research during the last decades has been the systematic development of statistical models that allow for cognitive psychometric research (CPR) to be conducted. Many of the models currently available for such purposes are extensions of basic latent variable models in item response theory…

  8. Current state of the art for statistical modeling of species distributions [Chapter 16]

    Treesearch

    Troy M. Hegel; Samuel A. Cushman; Jeffrey Evans; Falk Huettmann

    2010-01-01

    Over the past decade the number of statistical modelling tools available to ecologists to model species' distributions has increased at a rapid pace (e.g. Elith et al. 2006; Austin 2007), as have the number of species distribution models (SDM) published in the literature (e.g. Scott et al. 2002). Ten years ago, basic logistic regression (Hosmer and Lemeshow 2000)...

  9. Preparing systems engineering and computing science students in disciplined methods, quantitative, and advanced statistical techniques to improve process performance

    NASA Astrophysics Data System (ADS)

    McCray, Wilmon Wil L., Jr.

    The research was prompted by a need to conduct a study that assesses process improvement, quality management and analytical techniques taught to students in U.S. colleges and universities undergraduate and graduate systems engineering and the computing science discipline (e.g., software engineering, computer science, and information technology) degree programs during their academic training that can be applied to quantitatively manage processes for performance. Everyone involved in executing repeatable processes in the software and systems development lifecycle processes needs to become familiar with the concepts of quantitative management, statistical thinking, process improvement methods and how they relate to process-performance. Organizations are starting to embrace the de facto Software Engineering Institute (SEI) Capability Maturity Model Integration (CMMI®) Models as process improvement frameworks to improve business processes performance. High maturity process areas in the CMMI model imply the use of analytical, statistical, quantitative management techniques, and process performance modeling to identify and eliminate sources of variation, continually improve process-performance; reduce cost and predict future outcomes. The research study identifies and provides a detailed discussion of the gap analysis findings of process improvement and quantitative analysis techniques taught in U.S. universities systems engineering and computing science degree programs, gaps that exist in the literature, and a comparison analysis which identifies the gaps that exist between the SEI's "healthy ingredients" of a process performance model and courses taught in U.S. universities degree program. The research also heightens awareness that academicians have conducted little research on applicable statistics and quantitative techniques that can be used to demonstrate high maturity as implied in the CMMI models.
The research also includes a Monte Carlo simulation optimization model and dashboard that demonstrates the use of statistical methods, statistical process control, sensitivity analysis, quantitative and optimization techniques to establish a baseline and predict future customer satisfaction index scores (outcomes). The American Customer Satisfaction Index (ACSI) model and industry benchmarks were used as a framework for the simulation model.

  10. Hunting Solomonoff's Swans: Exploring the Boundary Between Physics and Statistics in Hydrological Modeling

    NASA Astrophysics Data System (ADS)

    Nearing, G. S.

    2014-12-01

    Statistical models consistently out-perform conceptual models in the short term; however, to account for a nonstationary future (or an unobserved past) scientists prefer to base predictions on unchanging and commutable properties of the universe - i.e., physics. The problem with physically-based hydrology models is, of course, that they aren't really based on physics - they are based on statistical approximations of physical interactions, and we almost uniformly lack an understanding of the entropy associated with these approximations. Thermodynamics is successful precisely because entropy statistics are computable for homogeneous (well-mixed) systems, and ergodic arguments explain the success of Newton's laws to describe systems that are fundamentally quantum in nature. Unfortunately, similar arguments do not hold for systems like watersheds that are heterogeneous at a wide range of scales. Ray Solomonoff formalized the situation in 1968 by showing that given infinite evidence, simultaneously minimizing model complexity and entropy in predictions always leads to the best possible model. The open question in hydrology is about what happens when we don't have infinite evidence - for example, when the future will not look like the past, or when one watershed does not behave like another. How do we isolate stationary and commutable components of watershed behavior? I propose that one possible answer to this dilemma lies in a formal combination of physics and statistics. In this talk I outline my recent analogue (Solomonoff's theorem was digital) of Solomonoff's idea that allows us to quantify the complexity/entropy tradeoff in a way that is intuitive to physical scientists. I show how to formally combine "physical" and statistical methods for model development in a way that allows us to derive the theoretically best possible model given any physics approximation(s) and available observations.
Finally, I apply an analogue of Solomonoff's theorem to evaluate the tradeoff between model complexity and prediction power.

  11. Discharge destination following lower limb fracture: development of a prediction model to assist with decision making.

    PubMed

    Kimmel, Lara A; Holland, Anne E; Edwards, Elton R; Cameron, Peter A; De Steiger, Richard; Page, Richard S; Gabbe, Belinda

    2012-06-01

    Accurate prediction of the likelihood of discharge to inpatient rehabilitation following lower limb fracture made on admission to hospital may assist patient discharge planning and decrease the burden on the hospital system caused by delays in decision making. To develop a prognostic model for discharge to inpatient rehabilitation. Isolated lower extremity fracture cases (excluding fractured neck of femur), captured by the Victorian Orthopaedic Trauma Outcomes Registry (VOTOR), were extracted for analysis. A training data set was created for model development and a validation data set for evaluation. A multivariable logistic regression model was developed based on patient and injury characteristics. Models were assessed using measures of discrimination (C-statistic) and calibration (Hosmer-Lemeshow (H-L) statistic). A total of 1429 patients met the inclusion criteria and were randomly split into training and test data sets. Increasing age, more proximal fracture type, compensation or private fund source for the admission, metropolitan location of residence, not working prior to injury and having a self-reported pre-injury disability were included in the final prediction model. The C-statistic for the model was 0.92 (95% confidence interval (CI) 0.88, 0.95) with an H-L statistic of χ² = 11.62, p = 0.17. For the test data set, the C-statistic was 0.86 (95% CI 0.83, 0.90) with an H-L statistic of χ² = 37.98, p < 0.001. A model to predict discharge to inpatient rehabilitation following lower limb fracture was developed with excellent discrimination although the calibration was reduced in the test data set. This model requires prospective testing but could form an integral part of decision making with regard to discharge disposition to facilitate timely and accurate referral to rehabilitation and optimise resource allocation. Copyright © 2011 Elsevier Ltd. All rights reserved.
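
    For readers unfamiliar with the discrimination measure reported above, the following sketch computes a C-statistic as the share of (event, non-event) pairs in which the event received the higher predicted probability, with ties counted as half. The function name and the six toy predictions are assumptions for illustration and are unrelated to the registry data; calibration (the H-L statistic) is not covered.

```python
def c_statistic(probabilities, outcomes):
    """Concordance statistic, equal to the area under the ROC curve.

    Compares every event with every non-event and counts how often the
    event has the higher predicted probability (ties count as 0.5).
    """
    events = [p for p, y in zip(probabilities, outcomes) if y]
    nonevents = [p for p, y in zip(probabilities, outcomes) if not y]
    concordant = 0.0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1.0
            elif p_event == p_nonevent:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


# Hypothetical predicted probabilities of discharge to rehabilitation.
predicted = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
discharged_to_rehab = [1, 1, 1, 0, 0, 0]

c_value = c_statistic(predicted, discharged_to_rehab)
print(c_value)  # 8 of the 9 pairs are concordant
```

    The pairwise loop is quadratic in the sample size, which is fine for a sketch; rank-based formulas give the same value more efficiently on large data sets.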

  12. Tracing the source of numerical climate model uncertainties in precipitation simulations using a feature-oriented statistical model

    NASA Astrophysics Data System (ADS)

    Xu, Y.; Jones, A. D.; Rhoades, A.

    2017-12-01

    Precipitation is a key component in hydrologic cycles, and changing precipitation regimes contribute to more intense and frequent drought and flood events around the world. Numerical climate modeling is a powerful tool to study climatology and to predict future changes. Despite the continuous improvement in numerical models, long-term precipitation prediction remains a challenge especially at regional scales. To improve numerical simulations of precipitation, it is important to find out where the uncertainty in precipitation simulations comes from. There are two types of uncertainty in numerical model predictions. One is related to uncertainty in the input data, such as the model's boundary and initial conditions. These uncertainties would propagate to the final model outcomes even if the numerical model exactly replicated the true world. But a numerical model cannot exactly replicate the true world. Therefore, the other type of model uncertainty is related to the errors in the model physics, such as the parameterization of sub-grid scale processes, i.e., given precise input conditions, how much error could be generated by the imprecise model. Here, we build two statistical models based on a neural network algorithm to predict long-term variation of precipitation over California: one uses "true world" information derived from observations, and the other uses "modeled world" information using model inputs and outputs from the North America Coordinated Regional Downscaling Project (NA CORDEX). We derive multiple climate feature metrics as the predictors for the statistical model to represent the impact of global climate on local hydrology, and include topography as a predictor to represent the local control. We first compare the predictors between the true world and the modeled world to determine the errors contained in the input data.
By perturbing the predictors in the statistical model, we estimate how much uncertainty in the model's final outcomes is accounted for by each predictor. By comparing the statistical model derived from true world information and modeled world information, we assess the errors lying in the physics of the numerical models. This work provides a unique insight to assess the performance of numerical climate models, and can be used to guide improvement of precipitation prediction.
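
    The predictor-perturbation idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: a fixed linear function (`toy_model`, hypothetical) stands in for their neural network, the three predictor names are invented, and shuffling a column is one of several possible ways to perturb it.

```python
import random

def toy_model(features):
    """Hypothetical stand-in for a trained statistical model (not the authors' network)."""
    moisture, temperature, elevation = features
    return 2.0 * moisture + 0.5 * temperature + 0.0 * elevation

def perturbation_sensitivity(model, rows, column, rng):
    """Mean squared change in predictions when one predictor column is shuffled."""
    baseline = [model(r) for r in rows]
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    perturbed_rows = [
        tuple(shuffled[i] if j == column else v for j, v in enumerate(r))
        for i, r in enumerate(rows)
    ]
    perturbed = [model(r) for r in perturbed_rows]
    return sum((p - b) ** 2 for p, b in zip(perturbed, baseline)) / len(rows)

rng = random.Random(0)
# Synthetic predictors, uniformly distributed on [0, 1)
rows = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
scores = [perturbation_sensitivity(toy_model, rows, c, rng) for c in range(3)]
```

    A predictor the model ignores (here the third one) receives a score of zero, while predictors with larger influence on the output receive larger scores.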

  13. Development of a Predictive Corrosion Model Using Locality-Specific Corrosion Indices

    DTIC Science & Technology

    2017-09-12

    6 3.2.1 Statistical data analysis methods ...6 3.2.2 Algorithm development method ...components, and method) were compiled into an executable program that uses mathematical models of materials degradation, and statistical calculations

  14. Analyzing Dyadic Sequence Data—Research Questions and Implied Statistical Models

    PubMed Central

    Fuchs, Peter; Nussbeck, Fridtjof W.; Meuwly, Nathalie; Bodenmann, Guy

    2017-01-01

    The analysis of observational data is often seen as a key approach to understanding dynamics in romantic relationships but also in dyadic systems in general. Statistical models for the analysis of dyadic observational data are not commonly known or applied. In this contribution, selected approaches to dyadic sequence data will be presented with a focus on models that can be applied when sample sizes are of medium size (N = 100 couples or less). Each of the statistical models is motivated by an underlying potential research question, and the most important model results are presented and linked to the research question. The following research questions and models are compared with respect to their applicability using a hands-on approach: (I) Is there an association between a particular behavior by one and the reaction by the other partner? (Pearson Correlation); (II) Does the behavior of one member trigger an immediate reaction by the other? (aggregated logit models; multi-level approach; basic Markov model); (III) Is there an underlying dyadic process, which might account for the observed behavior? (hidden Markov model); and (IV) Are there latent groups of dyads, which might account for observing different reaction patterns? (mixture Markov; optimal matching). Finally, recommendations for researchers choosing among the different models, issues of data handling, and advice on applying the statistical models properly in empirical research are given (e.g., in a new R package “DySeq”). PMID:28443037
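
    For research question (II), the basic Markov model amounts to estimating how often one coded state follows another. The sketch below is an illustration only: it is written in Python rather than with the DySeq R package, and the state codes "S", "C" and "N" are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate first-order Markov transition probabilities from one state sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }

# Hypothetical coding: "S" = one partner signals stress,
# "C" = the other partner responds with coping, "N" = neither behavior.
observed = ["N", "S", "C", "N", "S", "C", "S", "N", "N", "S", "C"]
probs = transition_probabilities(observed)
```

    In this toy sequence, three of the four "S" intervals are followed by "C", so the estimated probability of an immediate coping reaction is 0.75.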

  15. Quantifying uncertainty in climate change science through empirical information theory.

    PubMed

    Majda, Andrew J; Gershgorin, Boris

    2010-08-24

    Quantifying the uncertainty for the present climate and the predictions of climate change in the suite of imperfect Atmosphere Ocean Science (AOS) computer models is a central issue in climate change science. Here, a systematic approach to these issues with firm mathematical underpinning is developed through empirical information theory. An information metric to quantify AOS model errors in the climate is proposed here which incorporates both coarse-grained mean model errors as well as covariance ratios in a transformation invariant fashion. The subtle behavior of model errors with this information metric is quantified in an instructive statistically exactly solvable test model with direct relevance to climate change science including the prototype behavior of tracer gases such as CO2. Formulas for identifying the most sensitive climate change directions using statistics of the present climate or an AOS model approximation are developed here; these formulas just involve finding the eigenvector associated with the largest eigenvalue of a quadratic form computed through suitable unperturbed climate statistics. These climate change concepts are illustrated on a statistically exactly solvable one-dimensional stochastic model with relevance for low frequency variability of the atmosphere. Viable algorithms for implementation of these concepts are discussed throughout the paper.
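
    The eigenvector step mentioned above can be illustrated with plain power iteration. This is a generic sketch, not the paper's algorithm: the 2×2 matrix is a made-up example of a symmetric positive definite quadratic form, and the method assumes the starting vector is not orthogonal to the dominant eigenvector.

```python
import math

def dominant_eigenpair(matrix, iterations=500):
    """Power iteration for the largest-eigenvalue eigenvector of a symmetric PSD matrix."""
    n = len(matrix)
    vector = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # The Rayleigh quotient of the unit vector estimates the eigenvalue
    image = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * w for v, w in zip(vector, image))
    return eigenvalue, vector

# Hypothetical quadratic form; its eigenvalues are (7 ± sqrt(5)) / 2
quadratic_form = [[4.0, 1.0], [1.0, 3.0]]
eigenvalue, direction = dominant_eigenpair(quadratic_form)
```

    The returned `direction` is the unit vector along which the quadratic form is largest, which is the role the "most sensitive direction" plays in the abstract.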

  16. Towards Direct Simulation of Future Tropical Cyclone Statistics in a High-Resolution Global Atmospheric Model

    DOE PAGES

    Wehner, Michael F.; Bala, G.; Duffy, Phillip; ...

    2010-01-01

    We present a set of high-resolution global atmospheric general circulation model (AGCM) simulations focusing on the model's ability to represent tropical storms and their statistics. We find that the model produces storms of hurricane strength with realistic dynamical features. We also find that tropical storm statistics are reasonable, both globally and in the north Atlantic, when compared to recent observations. The sensitivity of simulated tropical storm statistics to increases in sea surface temperature (SST) is also investigated, revealing that a credible late 21st century SST increase produced increases in simulated tropical storm numbers and intensities in all ocean basins. While this paper supports previous high-resolution model and theoretical findings that the frequency of very intense storms will increase in a warmer climate, it differs notably from previous medium and high-resolution model studies that show a global reduction in total tropical storm frequency. However, we are quick to point out that this particular model finding remains speculative due to a lack of radiative forcing changes in our time-slice experiments as well as a focus on the Northern hemisphere tropical storm seasons.

  17. Testing the Predictive Power of Coulomb Stress on Aftershock Sequences

    NASA Astrophysics Data System (ADS)

    Woessner, J.; Lombardi, A.; Werner, M. J.; Marzocchi, W.

    2009-12-01

    Empirical and statistical models of clustered seismicity are usually strongly stochastic and perceived to be uninformative in their forecasts, since only marginal distributions are used, such as the Omori-Utsu and Gutenberg-Richter laws. In contrast, so-called physics-based aftershock models, based on seismic rate changes calculated from Coulomb stress changes and rate-and-state friction, make more specific predictions: anisotropic stress shadows and multiplicative rate changes. We test the predictive power of models based on Coulomb stress changes against statistical models, including the popular Short Term Earthquake Probabilities and Epidemic-Type Aftershock Sequences models: We score and compare retrospective forecasts on the aftershock sequences of the 1992 Landers, USA, the 1997 Colfiorito, Italy, and the 2008 Selfoss, Iceland, earthquakes. To quantify predictability, we use likelihood-based metrics that test the consistency of the forecasts with the data, including modified and existing tests used in prospective forecast experiments within the Collaboratory for the Study of Earthquake Predictability (CSEP). Our results indicate that a statistical model performs best. Moreover, two Coulomb model classes seem unable to compete: Models based on deterministic Coulomb stress changes calculated from a given fault-slip model, and those based on fixed receiver faults. One model of Coulomb stress changes does perform well and sometimes outperforms the statistical models, but its predictive information is diluted, because of uncertainties included in the fault-slip model. Our results suggest that models based on Coulomb stress changes need to incorporate stochastic features that represent model and data uncertainty.

  18. Statistically Modeling I-V Characteristics of CNT-FET with LASSO

    NASA Astrophysics Data System (ADS)

    Ma, Dongsheng; Ye, Zuochang; Wang, Yan

    2017-08-01

    With the advent of the internet of things (IoT), the need for studying new materials and devices for various applications is increasing. Traditionally we build compact models for transistors on the basis of physics. But physical models are expensive and need a very long time to adjust for non-ideal effects. As the vision for the application of many novel devices is not certain or the manufacturing process is not mature, deriving generalized accurate physical models for such devices is very strenuous, whereas statistical modeling is becoming a potential method because of its data-oriented nature and fast implementation. In this paper, one classical statistical regression method, LASSO, is used to model the I-V characteristics of CNT-FET, and a pseudo-PMOS inverter simulation based on the trained model is implemented in Cadence. The normalized relative mean square prediction error of the trained model versus experiment sample data and the simulation results show that the model is acceptable for digital circuit static simulation. Such a modeling methodology can be extended to general devices.
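
    As a reminder of what LASSO does, the sketch below fits a LASSO regression by cyclic coordinate descent with soft-thresholding. It is a generic illustration on made-up toy data, not the authors' CNT-FET model; the objective scaling (1/2n) and the penalty value are assumptions.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero; anything inside [-penalty, penalty] becomes 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(X, y, penalty, sweeps=100):
    """Minimize (1/2n)*||y - X b||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes its own term
            rho = 0.0
            for i in range(n):
                others = sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            coef[j] = soft_threshold(rho, penalty) / scale
    return coef

# Hypothetical toy data: the response depends only on the first feature.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.0, -3.0, 3.0, -3.0]
coef = lasso_coordinate_descent(X, y, penalty=0.5)
```

    The L1 penalty shrinks the useful coefficient (from 3 to 2.5 here) and sets the irrelevant one exactly to zero, which is the variable-selection behavior that makes LASSO attractive for data-driven device models.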

  19. [Applications of the hospital statistics management system].

    PubMed

    Zhai, Hong; Ren, Yong; Liu, Jing; Li, You-Zhang; Ma, Xiao-Long; Jiao, Tao-Tao

    2008-01-01

    The Hospital Statistics Management System is built on an Office Automation Platform of the Shandong provincial hospital system. Its workflow, role and permission technologies are used to standardize and optimize the management program of statistics in the total quality control of hospital statistics. The system's applications have combined the office automation platform with the statistics management in a hospital, and this provides a practical example of a modern hospital statistics management model.

  20. Lung Cancer Risk Prediction Model Incorporating Lung Function: Development and Validation in the UK Biobank Prospective Cohort Study.

    PubMed

    Muller, David C; Johansson, Mattias; Brennan, Paul

    2017-03-10

    Purpose Several lung cancer risk prediction models have been developed, but none to date have assessed the predictive ability of lung function in a population-based cohort. We sought to develop and internally validate a model incorporating lung function using data from the UK Biobank prospective cohort study. Methods This analysis included 502,321 participants without a previous diagnosis of lung cancer, predominantly between 40 and 70 years of age. We used flexible parametric survival models to estimate the 2-year probability of lung cancer, accounting for the competing risk of death. Models included predictors previously shown to be associated with lung cancer risk, including sex, variables related to smoking history and nicotine addiction, medical history, family history of lung cancer, and lung function (forced expiratory volume in 1 second [FEV1]). Results During accumulated follow-up of 1,469,518 person-years, there were 738 lung cancer diagnoses. A model incorporating all predictors had excellent discrimination (concordance (c)-statistic [95% CI] = 0.85 [0.82 to 0.87]). Internal validation suggested that the model will discriminate well when applied to new data (optimism-corrected c-statistic = 0.84). The full model, including FEV1, also had modestly better discriminatory power than one that was designed solely on the basis of questionnaire variables (c-statistic = 0.84 [0.82 to 0.86]; optimism-corrected c-statistic = 0.83; p for FEV1 = 3.4 × 10^-13). The full model had better discrimination than standard lung cancer screening eligibility criteria (c-statistic = 0.66 [0.64 to 0.69]). Conclusion A risk prediction model that includes lung function has strong predictive ability, which could improve eligibility criteria for lung cancer screening programs.
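
    The c-statistic reported above measures how often the model ranks a person who develops the disease above one who does not. The sketch below computes it for a simple binary outcome. This is a simplification: the study uses survival models with competing risks, for which the concordance calculation is more involved, and the risks and outcomes here are invented.

```python
def c_statistic(risks, outcomes):
    """Share of case/non-case pairs where the case has the higher risk (ties count 0.5)."""
    cases = [r for r, o in zip(risks, outcomes) if o == 1]
    controls = [r for r, o in zip(risks, outcomes) if o == 0]
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Hypothetical predicted risks and observed outcomes (1 = diagnosed, 0 = not)
risks = [0.9, 0.4, 0.35, 0.8, 0.1, 0.4]
outcomes = [1, 0, 1, 1, 0, 0]
c = c_statistic(risks, outcomes)
```

    A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which puts the reported 0.85 versus 0.66 comparison in context.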

  1. Does transport time help explain the high trauma mortality rates in rural areas? New and traditional predictors assessed by new and traditional statistical methods

    PubMed Central

    Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas

    2015-01-01

    Background Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600

  2. Development of a mathematical model for the dissolution of uranium dioxide. II. Statistical model for the dissolution of uranium dioxide tablets in nitric acid

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhukovskii, Yu.M.; Luksha, O.P.; Nenarokomov, E.A.

    1988-03-01

    We have derived a statistical model for the dissolution of uranium dioxide tablets for the 6 to 12 M concentration range and temperatures from 80°C to the boiling point. The model differs qualitatively from the dissolution model for ground uranium dioxide. In the indicated range of experimental conditions, the mean-square deviation of the curves for the model from the experimental curves is not greater than 6%.

  3. Fully Bayesian tests of neutrality using genealogical summary statistics.

    PubMed

    Drummond, Alexei J; Suchard, Marc A

    2008-10-31

    Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However, questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequently, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures from neutrality in human influenza A virus, even after controlling for variation in population size. Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously ignored for limited availability of theory and methods.

  4. A meta-analysis and statistical modelling of nitrates in groundwater at the African scale

    NASA Astrophysics Data System (ADS)

    Ouedraogo, Issoufou; Vanclooster, Marnik

    2016-06-01

    Contamination of groundwater with nitrate poses a major health risk to millions of people around Africa. Assessing the space-time distribution of this contamination, as well as understanding the factors that explain this contamination, is important for managing sustainable drinking water at the regional scale. This study aims to assess the variables that contribute to nitrate pollution in groundwater at the African scale by statistical modelling. We compiled a literature database of nitrate concentration in groundwater (around 250 studies) and combined it with digital maps of physical attributes such as soil, geology, climate, hydrogeology, and anthropogenic data for statistical model development. The maximum, medium, and minimum observed nitrate concentrations were analysed. In total, 13 explanatory variables were screened to explain observed nitrate pollution in groundwater. For the mean nitrate concentration, four variables are retained in the statistical explanatory model: (1) depth to groundwater (shallow groundwater, typically < 50 m); (2) recharge rate; (3) aquifer type; and (4) population density. The first three variables represent intrinsic vulnerability of groundwater systems to pollution, while the latter variable is a proxy for anthropogenic pollution pressure. The model explains 65 % of the variation of mean nitrate contamination in groundwater at the African scale. Using the same proxy information, we could develop a statistical model for the maximum nitrate concentrations that explains 42 % of the nitrate variation. For the maximum concentrations, other environmental attributes such as soil type, slope, rainfall, climate class, and region type improve the prediction of maximum nitrate concentrations at the African scale. As to minimal nitrate concentrations, in the absence of normal distribution assumptions of the data set, we do not develop a statistical model for these data. 
The data-based statistical model presented here represents an important step towards developing tools that will allow us to accurately predict nitrate distribution at the African scale and thus may support groundwater monitoring and water management that aims to protect groundwater systems. Yet they should be further refined and validated when more detailed and harmonized data become available and/or combined with more conceptual descriptions of the fate of nutrients in the hydrosystem.

  5. Statistical models for predicting pair dispersion and particle clustering in isotropic turbulence and their applications

    NASA Astrophysics Data System (ADS)

    Zaichik, Leonid I.; Alipchenkov, Vladimir M.

    2009-10-01

    The purpose of this paper is twofold: (i) to advance and extend the statistical two-point models of pair dispersion and particle clustering in isotropic turbulence that were previously proposed by Zaichik and Alipchenkov (2003 Phys. Fluids 15, 1776-87; 2007 Phys. Fluids 19, 113308) and (ii) to present some applications of these models. The models developed are based on a kinetic equation for the two-point probability density function of the relative velocity distribution of two particles. These models predict the pair relative velocity statistics and the preferential accumulation of heavy particles in stationary and decaying homogeneous isotropic turbulent flows. Moreover, the models are applied to predict the effect of particle clustering on turbulent collisions, sedimentation and intensity of microwave radiation as well as to calculate the mean filtered subgrid stress of the particulate phase. Model predictions are compared with direct numerical simulations and experimental measurements.

  6. ModelTest Server: a web-based tool for the statistical selection of models of nucleotide substitution online

    PubMed Central

    Posada, David

    2006-01-01

    ModelTest server is a web-based application for the selection of models of nucleotide substitution using the program ModelTest. The server takes as input a text file with likelihood scores for the set of candidate models. Models can be selected with hierarchical likelihood ratio tests, or with the Akaike or Bayesian information criteria. The output includes several statistics for the assessment of model selection uncertainty, for model averaging or to estimate the relative importance of model parameters. The server can be accessed at . PMID:16845102
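
    Selection by information criteria, and the Akaike weights used to express model selection uncertainty, can be sketched as below. This is not ModelTest's code: the log-likelihoods, the free-parameter counts and the alignment length used as the BIC sample size are all hypothetical values chosen for illustration.

```python
import math

def information_criteria(log_likelihood, n_params, sample_size):
    """Return (AIC, BIC) for one fitted model."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(sample_size) - 2 * log_likelihood
    return aic, bic

def akaike_weights(aic_by_model):
    """Convert AIC scores into relative weights that sum to 1."""
    best = min(aic_by_model.values())
    relative = {name: math.exp(-0.5 * (aic - best)) for name, aic in aic_by_model.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}

# Hypothetical (log-likelihood, free parameters) per candidate model
candidates = {"JC69": (-2050.0, 0), "HKY85": (-2010.0, 4), "GTR": (-2008.5, 8)}
alignment_length = 1000  # assumed sample size for BIC

aic_scores = {
    name: information_criteria(lnl, k, alignment_length)[0]
    for name, (lnl, k) in candidates.items()
}
weights = akaike_weights(aic_scores)
best_model = min(aic_scores, key=aic_scores.get)
```

    With these invented numbers the richest model has the highest likelihood but not the lowest AIC, because its extra parameters are penalized; the weights show how strongly the data favor the selected model over the alternatives.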

  7. Is There a Critical Distance for Fickian Transport? - a Statistical Approach to Sub-Fickian Transport Modelling in Porous Media

    NASA Astrophysics Data System (ADS)

    Most, S.; Nowak, W.; Bijeljic, B.

    2014-12-01

    Transport processes in porous media are frequently simulated as particle movement. This process can be formulated as a stochastic process of particle position increments. At the pore scale, the geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Recent experimental data suggest that we have not yet reached the end of the need to generalize, because particle increments show statistical dependency beyond linear correlation and over many time steps. The goal of this work is to better understand the validity regions of commonly made assumptions. We are investigating after what transport distances we can observe: (1) a statistical dependence between increments that can be modelled as an order-k Markov process and boils down to order 1 (this would be the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks would start); (2) a bivariate statistical dependence that simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW); and (3) complete absence of statistical dependence (validity of classical PTRW/CTRW). The approach is to derive a statistical model for pore-scale transport from a powerful experimental data set via copula analysis. The model is formulated as a non-Gaussian, mutually dependent Markov process of higher order, which allows us to investigate the validity ranges of simpler models.

  8. Statistical description of non-Gaussian samples in the F2 layer of the ionosphere during heliogeophysical disturbances

    NASA Astrophysics Data System (ADS)

    Sergeenko, N. P.

    2017-11-01

    An adequate statistical method should be developed in order to predict probabilistically the range of ionospheric parameters. This problem is solved in this paper. The time series of the critical frequency of the F2 layer, foF2(t), were subjected to statistical processing. For the obtained samples {δ foF2}, statistical distributions and invariants up to the fourth order are calculated. The analysis shows that the distributions differ from the Gaussian law during the disturbances. At levels of sufficiently small probability distributions, there are arbitrarily large deviations from the model of the normal process. Therefore, it is attempted to describe statistical samples {δ foF2} based on the Poisson model. For the studied samples, the exponential characteristic function is selected under the assumption that time series are a superposition of some deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic excessive-asymmetric probability-density function. The statistical distributions of the samples {δ foF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to Kolmogorov's criterion, the probabilities of the coincidence of a posteriori distributions with the theoretical ones are P 0.7-0.9. The conducted analysis makes it possible to draw a conclusion about the applicability of a model based on the Poisson random process for the statistical description and probabilistic variation estimates during heliogeophysical disturbances of the variations {δ foF2}.

  9. Tree injury and mortality in fires: developing process-based models

    Treesearch

    Bret W. Butler; Matthew B. Dickinson

    2010-01-01

    Wildland fire managers are often required to predict tree injury and mortality when planning a prescribed burn or when considering wildfire management options; and, currently, statistical models based on post-fire observations are the only tools available for this purpose. Implicit in the derivation of statistical models is the assumption that they are strictly...

  10. Multivariate mixed linear model analysis of longitudinal data: an information-rich statistical technique for analyzing disease resistance data

    USDA-ARS's Scientific Manuscript database

    The mixed linear model (MLM) is currently among the most advanced and flexible statistical modeling techniques and its use in tackling problems in plant pathology has begun surfacing in the literature. The longitudinal MLM is a multivariate extension that handles repeatedly measured data, such as r...

  11. Person-Fit Statistics for Joint Models for Accuracy and Speed

    ERIC Educational Resources Information Center

    Fox, Jean-Paul; Marianti, Sukaesi

    2017-01-01

    Response accuracy and response time data can be analyzed with a joint model to measure ability and speed of working, while accounting for relationships between item and person characteristics. In this study, person-fit statistics are proposed for joint models to detect aberrant response accuracy and/or response time patterns. The person-fit tests…

  12. Value Added Productivity Indicators: A Statistical Comparison of the Pre-Test/Post-Test Model and Gain Model.

    ERIC Educational Resources Information Center

    Weerasinghe, Dash; Orsak, Timothy; Mendro, Robert

    In an age of student accountability, public school systems must find procedures for identifying effective schools, classrooms, and teachers that help students continue to learn academically. As a result, researchers have been modeling schools and classrooms to calculate productivity indicators that will withstand not only statistical review but…

  13. A smoothed residual based goodness-of-fit statistic for nest-survival models

    Treesearch

    Rodney X. Sturdivant; Jay J. Rotella; Robin E. Russell

    2008-01-01

    Estimating nest success and identifying important factors related to nest-survival rates is an essential goal for many wildlife researchers interested in understanding avian population dynamics. Advances in statistical methods have led to a number of estimation methods and approaches to modeling this problem. Recently developed models allow researchers to include a...

  14. Comparison of statistical and theoretical habitat models for conservation planning: the benefit of ensemble prediction

    Treesearch

    D. Todd Jones-Farrand; Todd M. Fearer; Wayne E. Thogmartin; Frank R. Thompson; Mark D. Nelson; John M. Tirpak

    2011-01-01

    Selection of a modeling approach is an important step in the conservation planning process, but little guidance is available. We compared two statistical and three theoretical habitat modeling approaches representing those currently being used for avian conservation planning at landscape and regional scales: hierarchical spatial count (HSC), classification and...

  15. Residuals and the Residual-Based Statistic for Testing Goodness of Fit of Structural Equation Models

    ERIC Educational Resources Information Center

    Foldnes, Njal; Foss, Tron; Olsson, Ulf Henning

    2012-01-01

    The residuals obtained from fitting a structural equation model are crucial ingredients in obtaining chi-square goodness-of-fit statistics for the model. The authors present a didactic discussion of the residuals, obtaining a geometrical interpretation by recognizing the residuals as the result of oblique projections. This sheds light on the…

  16. Summary goodness-of-fit statistics for binary generalized linear models with noncanonical link functions.

    PubMed

    Canary, Jana D; Blizzard, Leigh; Barry, Ronald P; Hosmer, David W; Quinn, Stephen J

    2016-05-01

    Generalized linear models (GLM) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for logistic GLM. Their properties make the development of GOF statistics relatively straightforward, but it can be more difficult under noncanonical links. Although GOF tests for logistic GLM with continuous covariates (GLMCC) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of link function chosen. We generalize the Tsiatis GOF statistic (TG), originally developed for logistic GLMCCs, so that it can be applied under any link function. Further, we show that the algebraically related Hosmer-Lemeshow (HL) and Pigeon-Heyse (J²) statistics can be applied directly. In a simulation study, TG, HL, and J² were used to evaluate the fit of probit, log-log, complementary log-log, and log models, all calculated with a common grouping method. The TG statistic consistently maintained Type I error rates, while those of HL and J² were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC. In this case, TG had more power than HL or J². © 2015 John Wiley & Sons Ltd/London School of Economics.
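
    Because the Hosmer-Lemeshow statistic needs only fitted probabilities and observed outcomes, it can be computed the same way whatever link produced the probabilities. The sketch below shows the standard grouped form of the statistic; the equal-size grouping rule and the toy data are assumptions and may differ from the grouping method used in the study.

```python
def hosmer_lemeshow(probabilities, outcomes, groups=10):
    """Grouped chi-square-type statistic comparing observed and expected event counts."""
    pairs = sorted(zip(probabilities, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        # Split the sorted observations into roughly equal-size risk groups
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        mean_p = sum(p for p, _ in chunk) / size
        observed = sum(o for _, o in chunk)
        expected = size * mean_p
        statistic += (observed - expected) ** 2 / (expected * (1.0 - mean_p))
    return statistic

# Hypothetical fitted probabilities (from any link function) and binary outcomes
fitted = [0.2, 0.2, 0.8, 0.8]
events = [0, 1, 1, 1]
hl = hosmer_lemeshow(fitted, events, groups=2)
```

    Larger values indicate a bigger gap between observed and expected event counts within the risk groups; the sketch stops at the statistic and does not compute a p-value.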

  17. 10 CFR 431.173 - Requirements applicable to all manufacturers.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... COMMERCIAL AND INDUSTRIAL EQUIPMENT Provisions for Commercial Heating, Ventilating, Air-Conditioning and... is based on engineering or statistical analysis, computer simulation or modeling, or other analytic... method or methods used; (B) The mathematical model, the engineering or statistical analysis, computer...

  18. Twenty-five years of maximum-entropy principle

    NASA Astrophysics Data System (ADS)

    Kapur, J. N.

    1983-04-01

    The strengths and weaknesses of the maximum entropy principle (MEP) are examined and some challenging problems that remain outstanding at the end of the first quarter century of the principle are discussed. The original formalism of the MEP is presented and its relationship to statistical mechanics is set forth. The use of MEP for characterizing statistical distributions, in statistical inference, nonlinear spectral analysis, transportation models, population density models, models for brand-switching in marketing and vote-switching in elections is discussed. Its application to finance, insurance, image reconstruction, pattern recognition, operations research and engineering, biology and medicine, and nonparametric density estimation is considered.
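
    A classic small instance of the maximum entropy principle is choosing the probabilities of a die's faces when only the mean is known. The maximum-entropy solution has the exponential form p_i ∝ exp(λ·x_i), with λ set so that the mean constraint holds. The sketch below solves for λ by bisection; the bracket [-50, 50] and the target mean of 4.5 are assumptions made for illustration.

```python
import math

def maxent_distribution(values, target_mean, tolerance=1e-12):
    """Maximum-entropy probabilities on `values` with a fixed mean: p_i ∝ exp(lam * x_i)."""
    def distribution(lam):
        weights = [math.exp(lam * v) for v in values]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(lam):
        return sum(p * v for p, v in zip(distribution(lam), values))

    low, high = -50.0, 50.0  # assumed bracket for the Lagrange multiplier
    while high - low > tolerance:
        mid = 0.5 * (low + high)
        # The mean increases monotonically with lam, so bisection applies
        if mean(mid) < target_mean:
            low = mid
        else:
            high = mid
    return distribution(0.5 * (low + high))

faces = [1, 2, 3, 4, 5, 6]
loaded = maxent_distribution(faces, target_mean=4.5)
fair = maxent_distribution(faces, target_mean=3.5)
```

    With a target mean of 3.5 the result is the uniform distribution, as expected when the constraint carries no extra information; with 4.5 the probabilities rise steadily toward the higher faces.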

  19. Modeling of Dissipation Element Statistics in Turbulent Non-Premixed Jet Flames

    NASA Astrophysics Data System (ADS)

    Denker, Dominik; Attili, Antonio; Boschung, Jonas; Hennig, Fabian; Pitsch, Heinz

    2017-11-01

    The dissipation element (DE) analysis is a method for analyzing and compartmentalizing turbulent scalar fields. DEs can be described by two parameters, namely the Euclidean distance l between their extremal points and the scalar difference in the respective points Δϕ. The joint probability density function (jPDF) of these two parameters P(Δϕ, l) is expected to suffice for a statistical reconstruction of the scalar field. In addition, reacting scalars show a strong correlation with these DE parameters in both premixed and non-premixed flames. Normalized DE statistics show a remarkable invariance towards changes in Reynolds numbers. This feature of DE statistics was exploited in a Boltzmann-type evolution equation based model for the probability density function (PDF) of the distance between the extremal points P(l) in isotropic turbulence. Later, this model was extended for the jPDF P(Δϕ, l) and then adapted for the use in free shear flows. The effect of heat release on the scalar scales and DE statistics is investigated and an extended model for non-premixed jet flames is introduced, which accounts for the presence of chemical reactions. This new model is validated against a series of DNS of temporally evolving jet flames. European Research Council Project "Milestone".

  20. Modelling 1-minute directional observations of the global irradiance.

    NASA Astrophysics Data System (ADS)

    Thejll, Peter; Pagh Nielsen, Kristian; Andersen, Elsa; Furbo, Simon

    2016-04-01

    Direct and diffuse irradiances from the sky have been collected at 1-minute intervals for about a year from the experimental station at the Technical University of Denmark for the IEA project "Solar Resource Assessment and Forecasting". These data were gathered by pyrheliometers tracking the Sun, as well as with apertured pyranometers gathering 1/8th and 1/16th of the light from the sky in 45 degree azimuthal ranges pointed around the compass. The data are gathered in order to develop detailed models of the potentially available solar energy and its variations at high temporal resolution in order to gain a more detailed understanding of the solar resource. This is important for a better understanding of the sub-grid scale cloud variation that cannot be resolved with climate and weather models. It is also important for optimizing the operation of active solar energy systems such as photovoltaic plants and thermal solar collector arrays, and for passive solar energy and lighting to buildings. We present regression-based modelling of the observed data, and focus, here, on the statistical properties of the model fits. Using models based on the one hand on what is found in the literature and on physical expectations, and on the other hand on purely statistical models, we find solutions that can explain up to 90% of the variance in global radiation. The models leaning on physical insights include terms for the direct solar radiation, a term for the circum-solar radiation, a diffuse term and a term for the horizon brightening/darkening. The purely statistical model is found using data- and formula-validation approaches picking model expressions from a general catalogue of possible formulae. The method allows nesting of expressions, and the results found are dependent on and heavily constrained by the cross-validation carried out on statistically independent testing and training data-sets. 
Slightly better fits -- in terms of variance explained -- are found using the purely statistical fitting/searching approach. We describe the methods applied, results found, and discuss the different potentials of the physics- and statistics-only based model-searches.

  1. Automated finite element modeling of the lumbar spine: Using a statistical shape model to generate a virtual population of models.

    PubMed

    Campbell, J Q; Petrella, A J

    2016-09-06

    Population-based modeling of the lumbar spine has the potential to be a powerful clinical tool. However, developing a fully parameterized model of the lumbar spine with accurate geometry has remained a challenge. The current study used automated methods for landmark identification to create a statistical shape model of the lumbar spine. The shape model was evaluated using compactness, generalization ability, and specificity. The primary shape modes were analyzed visually, quantitatively, and biomechanically. The biomechanical analysis was performed by using the statistical shape model with an automated method for finite element model generation to create a fully parameterized finite element model of the lumbar spine. Functional finite element models of the mean shape and the extreme shapes (±3 standard deviations) of all 17 shape modes were created demonstrating the robust nature of the methods. This study represents an advancement in finite element modeling of the lumbar spine and will allow population-based modeling in the future. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Model for neural signaling leap statistics

    NASA Astrophysics Data System (ADS)

    Chevrollier, Martine; Oriá, Marcos

    2011-03-01

    We present a simple model for neural signaling leaps in the brain considering only the thermodynamic (Nernst) potential in neuron cells and brain temperature. We numerically simulated connections between arbitrarily localized neurons and analyzed the frequency distribution of the distances reached. We observed a qualitative change between Normal statistics (T = 37.5°C, awake regime) and Lévy statistics (T = 35.5°C, sleeping period), the latter characterized by rare events of long-range connections.

  3. RooStatsCms: A tool for analysis modelling, combination and statistical studies

    NASA Astrophysics Data System (ADS)

    Piparo, D.; Schott, G.; Quast, G.

    2010-04-01

    RooStatsCms is an object oriented statistical framework based on the RooFit technology. Its scope is to allow the modelling, statistical analysis and combination of multiple search channels for new phenomena in High Energy Physics. It provides a variety of methods described in literature implemented as classes, whose design is oriented to the execution of multiple CPU intensive jobs on batch systems or on the Grid.

  4. Animal movement: Statistical models for telemetry data

    USGS Publications Warehouse

    Hooten, Mevin B.; Johnson, Devin S.; McClintock, Brett T.; Morales, Juan M.

    2017-01-01

    The study of animal movement has always been a key element in ecological science, because it is inherently linked to critical processes that scale from individuals to populations and communities to ecosystems. Rapid improvements in biotelemetry data collection and processing technology have given rise to a variety of statistical methods for characterizing animal movement. The book serves as a comprehensive reference for the types of statistical models used to study individual-based animal movement. 

  5. Development of Composite Materials with High Passive Damping Properties

    DTIC Science & Technology

    2006-05-15

    frequency response function analysis. Sound transmission through sandwich panels was studied using the statistical energy analysis (SEA). Modal density...2.2.3 Finite element models 14 2.2.4 Statistical energy analysis method 15 CHAPTER 3 ANALYSIS OF DAMPING IN SANDWICH MATERIALS. 24 3.1 Equation of...sheets and the core. 2.2.4 Statistical energy analysis method Finite element models are generally only efficient for problems at low and middle frequencies

  6. Quantifying, displaying and accounting for heterogeneity in the meta-analysis of RCTs using standard and generalised Q statistics

    PubMed Central

    2011-01-01

    Background: Clinical researchers have often preferred to use a fixed effects model for the primary interpretation of a meta-analysis. Heterogeneity is usually assessed via the well known Q and I2 statistics, along with the random effects estimate they imply. In recent years, alternative methods for quantifying heterogeneity have been proposed, that are based on a 'generalised' Q statistic. Methods: We review 18 IPD meta-analyses of RCTs into treatments for cancer, in order to quantify the amount of heterogeneity present and also to discuss practical methods for explaining heterogeneity. Results: Differing results were obtained when the standard Q and I2 statistics were used to test for the presence of heterogeneity. The two meta-analyses with the largest amount of heterogeneity were investigated further, and on inspection the straightforward application of a random effects model was not deemed appropriate. Compared to the standard Q statistic, the generalised Q statistic provided a more accurate platform for estimating the amount of heterogeneity in the 18 meta-analyses. Conclusions: Explaining heterogeneity via the pre-specification of trial subgroups, graphical diagnostic tools and sensitivity analyses produced a more desirable outcome than an automatic application of the random effects model. Generalised Q statistic methods for quantifying and adjusting for heterogeneity should be incorporated as standard into statistical software. Software is provided to help achieve this aim. PMID:21473747
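
    For readers unfamiliar with the standard heterogeneity statistics named in this record, the following is a minimal sketch of the usual textbook definitions: Cochran's Q as the inverse-variance weighted sum of squared deviations from the fixed effects estimate, and I2 = max(0, (Q - df) / Q). It is not the software the authors provide and does not implement the generalised Q statistic; the function name and the three-study input are hypothetical.

```python
def cochran_q_i2(effects, variances):
    """Return (fixed effects estimate, Cochran's Q, I2 in percent)."""
    # Inverse-variance weights, as in a fixed effects meta-analysis.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations of study effects from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2

# Hypothetical effect estimates and within-study variances for three trials.
effects = [0.10, 0.30, 0.50]
variances = [0.01, 0.01, 0.01]
pooled, q, i2 = cochran_q_i2(effects, variances)
```

    Under the null hypothesis of homogeneity, Q is conventionally compared with a chi-squared distribution on df = k - 1 degrees of freedom, where k is the number of studies.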

  7. Web 2.0 Articles: Content Analysis and a Statistical Model to Predict Recognition of the Need for New Instructional Design Strategies

    ERIC Educational Resources Information Center

    Liu, Leping; Maddux, Cleborne D.

    2008-01-01

    This article presents a study of Web 2.0 articles intended to (a) analyze the content of what is written and (b) develop a statistical model to predict whether authors write about the need for new instructional design strategies and models. Eighty-eight technology articles were subjected to lexical analysis and a logistic regression model was…

  8. Infinitely divisible cascades to model the statistics of natural images.

    PubMed

    Chainais, Pierre

    2007-12-01

    We propose to model the statistics of natural images thanks to the large class of stochastic processes called Infinitely Divisible Cascades (IDC). IDC were first introduced in one dimension to provide multifractal time series to model the so-called intermittency phenomenon in hydrodynamical turbulence. We have extended the definition of scalar infinitely divisible cascades from 1 to N dimensions and commented on the relevance of such a model in fully developed turbulence in [1]. In this article, we focus on the particular 2 dimensional case. IDC appear as good candidates to model the statistics of natural images. They share most of their usual properties and appear to be consistent with several independent theoretical and experimental approaches of the literature. We point out the interest of IDC for applications to procedural texture synthesis.

  9. A simulations approach for meta-analysis of genetic association studies based on additive genetic model.

    PubMed

    John, Majnu; Lencz, Todd; Malhotra, Anil K; Correll, Christoph U; Zhang, Jian-Ping

    2018-06-01

    Meta-analysis of genetic association studies is being increasingly used to assess phenotypic differences between genotype groups. When the underlying genetic model is assumed to be dominant or recessive, assessing the phenotype differences based on summary statistics, reported for individual studies in a meta-analysis, is a valid strategy. However, when the genetic model is additive, a similar strategy based on summary statistics will lead to biased results. This fact about the additive model is one of the things that we establish in this paper, using simulations. The main goal of this paper is to present an alternate strategy for the additive model based on simulating data for the individual studies. We show that the alternate strategy is far superior to the strategy based on summary statistics.

  10. Ionospheric scintillation studies

    NASA Technical Reports Server (NTRS)

    Rino, C. L.; Freemouw, E. J.

    1973-01-01

    The diffracted field of a monochromatic plane wave was characterized by two complex correlation functions. For a Gaussian complex field, these quantities suffice to completely define the statistics of the field. Thus, one can in principle calculate the statistics of any measurable quantity in terms of the model parameters. The best data fits were achieved for intensity statistics derived under the Gaussian statistics hypothesis. The signal structure that achieved the best fit was nearly invariant with scintillation level and irregularity source (ionosphere or solar wind). It was characterized by the fact that more than 80% of the scattered signal power is in phase quadrature with the undeviated or coherent signal component. Thus, the Gaussian-statistics hypothesis is both convenient and accurate for channel modeling work.

  11. Interactions and triggering in a 3D rate and state asperity model

    NASA Astrophysics Data System (ADS)

    Dublanchet, P.; Bernard, P.

    2012-12-01

    Precise relocation of micro-seismicity and careful analysis of seismic source parameters have progressively imposed the concept of seismic asperities embedded in a creeping fault segment as being one of the most important aspects that should appear in a realistic representation of micro-seismic sources. Another important issue concerning micro-seismic activity is the existence of robust empirical laws describing the temporal and magnitude distribution of earthquakes, such as the Omori law, the distribution of inter-event time and the Gutenberg-Richter law. In this framework, this study aims at understanding statistical properties of earthquakes, by generating synthetic catalogs with a 3D, quasi-dynamic continuous rate and state asperity model, that takes into account a realistic geometry of asperities. Our approach contrasts with ETAS models (Kagan and Knopoff, 1981) usually implemented to produce earthquake catalogs, in the sense that the nonlinearity observed in rock friction experiments (Dieterich, 1979) is fully taken into account by the use of rate and state friction law. Furthermore, our model differs from discrete models of faults (Ziv and Cochard, 2006) because the continuity allows us to define realistic geometries and distributions of asperities by the assembling of sub-critical computational cells that always fail in a single event. Moreover, this model allows us to address the question of the influence of barriers and distribution of asperities on the event statistics. After recalling the main observations of asperities in the specific case of the Parkfield segment of the San Andreas Fault, we analyse earthquake statistical properties computed for this area. Then, we present synthetic statistics obtained by our model that allow us to discuss the role of barriers on clustering and triggering phenomena among a population of sources. 
It appears that an effective size of barrier, that depends on its frictional strength, controls the presence or the absence, in the synthetic catalog, of statistical laws that are similar to what is observed for real earthquakes. As an application, we attempt to draw a comparison between synthetic statistics and the observed statistics of Parkfield in order to characterize what could be a realistic frictional model of Parkfield area. More generally, we obtained synthetic statistical properties that are in agreement with power-law decays characterized by exponents that match the observations at a global scale, showing that our mechanical model is able to provide new insights into the understanding of earthquake interaction processes in general.

  12. Improved Statistics for Genome-Wide Interaction Analysis

    PubMed Central

    Ueki, Masao; Cordell, Heather J.

    2012-01-01

    Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. 
We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result. PMID:22496670

  13. Functional Status Outperforms Comorbidities as a Predictor of 30-Day Acute Care Readmissions in the Inpatient Rehabilitation Population.

    PubMed

    Shih, Shirley L; Zafonte, Ross; Bates, David W; Gerrard, Paul; Goldstein, Richard; Mix, Jacqueline; Niewczyk, Paulette; Greysen, S Ryan; Kazis, Lewis; Ryan, Colleen M; Schneider, Jeffrey C

    2016-10-01

    Functional status is associated with patient outcomes, but is rarely included in hospital readmission risk models. The objective of this study was to determine whether functional status is a better predictor of 30-day acute care readmission than traditionally investigated variables including demographics and comorbidities. Retrospective database analysis between 2002 and 2011. 1158 US inpatient rehabilitation facilities. 4,199,002 inpatient rehabilitation facility admissions comprising patients from 16 impairment groups within the Uniform Data System for Medical Rehabilitation database. Logistic regression models predicting 30-day readmission were developed based on age, gender, comorbidities (Elixhauser comorbidity index, Deyo-Charlson comorbidity index, and Medicare comorbidity tier system), and functional status [Functional Independence Measure (FIM)]. We hypothesized that (1) function-based models would outperform demographic- and comorbidity-based models and (2) the addition of demographic and comorbidity data would not significantly enhance function-based models. For each impairment group, Function Only Models were compared against Demographic-Comorbidity Models and Function Plus Models (Function-Demographic-Comorbidity Models). The primary outcome was 30-day readmission, and the primary measure of model performance was the c-statistic. All-cause 30-day readmission rate from inpatient rehabilitation facilities to acute care hospitals was 9.87%. C-statistics for the Function Only Models were 0.64 to 0.70. For all 16 impairment groups, the Function Only Model demonstrated better c-statistics than the Demographic-Comorbidity Models (c-statistic difference: 0.03-0.12). The best-performing Function Plus Models exhibited negligible improvements in model performance compared to Function Only Models, with c-statistic improvements of only 0.01 to 0.05. 
Readmissions are currently used as a marker of hospital performance, with recent financial penalties to hospitals for excessive readmissions. Function-based readmission models outperform models based only on demographics and comorbidities. Readmission risk models would benefit from the inclusion of functional status as a primary predictor. Copyright © 2016 AMDA – The Society for Post-Acute and Long-Term Care Medicine. Published by Elsevier Inc. All rights reserved.
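
    The c-statistic used above as the measure of model performance can be illustrated with a short sketch. It computes the concordance probability directly from its definition: the fraction of (readmitted, not readmitted) patient pairs in which the readmitted patient received the higher predicted risk, with ties counted as one half. The function name and the five-patient data are hypothetical and are not taken from the study.

```python
def c_statistic(labels, scores):
    """Concordance (c-statistic) for binary outcomes; ties count as 0.5."""
    events = [s for y, s in zip(labels, scores) if y == 1]
    non_events = [s for y, s in zip(labels, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    # Compare every event with every non-event (quadratic, fine for a demo).
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Hypothetical outcomes (1 = readmitted within 30 days) and predicted risks.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.4]
c = c_statistic(labels, scores)
```

    A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, which puts the reported range of 0.64 to 0.70 in context.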

  14. The Problem of Auto-Correlation in Parasitology

    PubMed Central

    Pollitt, Laura C.; Reece, Sarah E.; Mideo, Nicole; Nussey, Daniel H.; Colegrave, Nick

    2012-01-01

    Explaining the contribution of host and pathogen factors in driving infection dynamics is a major ambition in parasitology. There is increasing recognition that analyses based on single summary measures of an infection (e.g., peak parasitaemia) do not adequately capture infection dynamics and so, the appropriate use of statistical techniques to analyse dynamics is necessary to understand infections and, ultimately, control parasites. However, the complexities of within-host environments mean that tracking and analysing pathogen dynamics within infections and among hosts poses considerable statistical challenges. Simple statistical models make assumptions that will rarely be satisfied in data collected on host and parasite parameters. In particular, model residuals (unexplained variance in the data) should not be correlated in time or space. Here we demonstrate how failure to account for such correlations can result in incorrect biological inference from statistical analysis. We then show how mixed effects models can be used as a powerful tool to analyse such repeated measures data in the hope that this will encourage better statistical practices in parasitology. PMID:22511865
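
    The assumption that model residuals are uncorrelated in time can be checked with a simple diagnostic before turning to the mixed effects models the authors recommend. The sketch below computes the Durbin-Watson statistic, a standard first-order autocorrelation check that is swapped in here for illustration and is not the method of the paper. The function name and the residual series are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals ordered in time."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared successive differences over the sum of squared residuals.
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Hypothetical residuals from repeated measures on one host: they drift
# slowly from positive to negative, i.e. neighbouring values are similar.
residuals = [2.0, 1.0, -1.0, -2.0]
dw = durbin_watson(residuals)
```

    Values near 2 are consistent with no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.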

  15. SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit

    PubMed Central

    Chu, Annie; Cui, Jenny; Dinov, Ivo D.

    2011-01-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. 
The reader is strongly encouraged to check the SOCR site for the most up-to-date information and newly added models. PMID:21546994

  16. Localized Smart-Interpretation

    NASA Astrophysics Data System (ADS)

    Lundh Gulbrandsen, Mats; Mejer Hansen, Thomas; Bach, Torben; Pallesen, Tom

    2014-05-01

    The complex task of setting up a geological model consists not only of combining available geological information into a conceptually plausible model, but also requires consistency with available data, e.g. geophysical data. However, in many cases the direct geological information, e.g. borehole samples, is very sparse, so in order to create a geological model, the geologist needs to rely on the geophysical data. The problem, however, is that the amount of geophysical data is in many cases so vast that it is practically impossible to integrate all of it in the manual interpretation process. This means that a lot of the information available from the geophysical surveys is unexploited, which is a problem because the resulting geological model does not fulfill its full potential and hence is less trustworthy. We suggest an approach to geological modeling that 1. allows all geophysical data to be considered when building the geological model, 2. is fast, and 3. allows quantification of geological modeling. The method is constructed to build a statistical model, f(d,m), describing the relation between what the geologists interpret, d, and what the geologist knows, m. The parameter m reflects any available information that can be quantified, such as geophysical data, the result of a geophysical inversion, elevation maps, etc. The parameter d reflects an actual interpretation, such as for example the depth to the base of a ground water reservoir. First we infer a statistical model f(d,m), by examining sets of actual interpretations made by a geological expert, [d1, d2, ...], and the information used to perform the interpretation; [m1, m2, ...]. This makes it possible to quantify how the geological expert performs interpolation through f(d,m). As the geological expert proceeds interpreting, the number of interpreted datapoints from which the statistical model is inferred increases, and therefore the accuracy of the statistical model increases. 
When a model f(d,m) has been successfully inferred, we are able to simulate how the geological expert would perform an interpretation given some external information m, through f(d|m). We will demonstrate this method applied to geological interpretation and densely sampled airborne electromagnetic data. In short, our goal is to build a statistical model describing how a geological expert performs geological interpretation given some geophysical data. We then wish to use this statistical model to perform semi-automatic interpretation, everywhere where such geophysical data exist, in a manner consistent with the choices made by a geological expert. Benefits of such a statistical model are that 1. it provides a quantification of how a geological expert performs interpretation based on available diverse data, 2. all available geophysical information can be used, and 3. it allows much faster interpretation of large data sets.

  17. Illustrating the practice of statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hamada, Christina A; Hamada, Michael S

    2009-01-01

    The practice of statistics involves analyzing data and planning data collection schemes to answer scientific questions. Issues often arise with the data that must be dealt with and can lead to new procedures. In analyzing data, these issues can sometimes be addressed through the statistical models that are developed. Simulation can also be helpful in evaluating a new procedure. Moreover, simulation coupled with optimization can be used to plan a data collection scheme. The practice of statistics as just described is much more than just using a statistical package. In analyzing the data, it involves understanding the scientific problem and incorporating the scientist's knowledge. In modeling the data, it involves understanding how the data were collected and accounting for limitations of the data where possible. Moreover, the modeling is likely to be iterative by considering a series of models and evaluating the fit of these models. Designing a data collection scheme involves understanding the scientist's goal and staying within his/her budget in terms of time and the available resources. Consequently, a practicing statistician is faced with such tasks and requires skills and tools to do them quickly. We have written this article for students to provide a glimpse of the practice of statistics. To illustrate the practice of statistics, we consider a problem motivated by some precipitation data that our relative, Masaru Hamada, collected some years ago. We describe his rain gauge observational study in Section 2. We describe modeling and an initial analysis of the precipitation data in Section 3. In Section 4, we consider alternative analyses that address potential issues with the precipitation data. In Section 5, we consider the impact of incorporating additional information. We design a data collection scheme to illustrate the use of simulation and optimization in Section 6. We conclude this article in Section 7 with a discussion.

  18. Subject-enabled analytics model on measurement statistics in health risk expert system for public health informatics.

    PubMed

    Chung, Chi-Jung; Kuo, Yu-Chen; Hsieh, Yun-Yu; Li, Tsai-Chung; Lin, Cheng-Chieh; Liang, Wen-Miin; Liao, Li-Na; Li, Chia-Ing; Lin, Hsueh-Chun

    2017-11-01

    This study applied open source technology to establish a subject-enabled analytics model that can enhance measurement statistics of case studies with the public health data in cloud computing. The infrastructure of the proposed model comprises three domains: 1) the health measurement data warehouse (HMDW) for the case study repository, 2) the self-developed modules of online health risk information statistics (HRIStat) for cloud computing, and 3) the prototype of a Web-based process automation system in statistics (PASIS) for the health risk assessment of case studies with subject-enabled evaluation. The system design employed freeware including Java applications, MySQL, and R packages to drive a health risk expert system (HRES). In the design, the HRIStat modules enforce the typical analytics methods for biomedical statistics, and the PASIS interfaces enable process automation of the HRES for cloud computing. The Web-based model supports two modes, step-by-step analysis and an auto-computing process, for preliminary evaluation and real-time computation respectively. The proposed model was evaluated by recomputing prior research on the epidemiological measurement of diseases that were caused by either heavy metal exposures in the environment or clinical complications in hospital. The validity of the simulation was confirmed against commercial statistics software. The model was installed in a stand-alone computer and in a cloud-server workstation to verify computing performance for a data amount of more than 230K sets. Both setups reached efficiency of about 10^5 sets per second. The Web-based PASIS interface can be used for cloud computing, and the HRIStat module can be flexibly expanded with advanced subjects for measurement statistics. The analytics procedure of the HRES prototype is capable of providing assessment criteria prior to estimating the potential risk to public health. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. A new statistical approach to climate change detection and attribution

    NASA Astrophysics Data System (ADS)

    Ribes, Aurélien; Zwiers, Francis W.; Azaïs, Jean-Marc; Naveau, Philippe

    2017-01-01

    We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the "models are statistically indistinguishable from the truth" paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951-2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 ± 0.12 K, 90% confidence range), with a very limited contribution from natural forcings (-0.01 ± 0.02 K).

  20. Joint inversion of marine seismic AVA and CSEM data using statistical rock-physics models and Markov random fields: Stochastic inversion of AVA and CSEM data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, J.; Hoversten, G.M.

    2011-09-15

    Joint inversion of seismic AVA and CSEM data requires rock-physics relationships to link seismic attributes to electrical properties. Ideally, we can connect them through reservoir parameters (e.g., porosity and water saturation) by developing physical-based models, such as Gassmann’s equations and Archie’s law, using nearby borehole logs. This could be difficult in the exploration stage because information available is typically insufficient for choosing suitable rock-physics models and for subsequently obtaining reliable estimates of the associated parameters. The use of improper rock-physics models and the inaccuracy of the estimates of model parameters may cause misleading inversion results. Conversely, it is easy to derive statistical relationships among seismic and electrical attributes and reservoir parameters from distant borehole logs. In this study, we develop a Bayesian model to jointly invert seismic AVA and CSEM data for reservoir parameter estimation using statistical rock-physics models; the spatial dependence of geophysical and reservoir parameters is carried out by lithotypes through Markov random fields. We apply the developed model to a synthetic case, which simulates a CO2 monitoring application. We derive statistical rock-physics relations from borehole logs at one location and estimate seismic P- and S-wave velocity ratio, acoustic impedance, density, electrical resistivity, lithotypes, porosity, and water saturation at three different locations by conditioning to seismic AVA and CSEM data. Comparison of the inversion results with their corresponding true values shows that the correlation-based statistical rock-physics models provide significant information for improving the joint inversion results.

  1. National Centers for Environmental Prediction

    Science.gov Websites

  2. Heterogeneous variances in multi-environment yield trials for corn hybrids

    USDA-ARS's Scientific Manuscript database

    Recent developments in statistics and computing have enabled much greater levels of complexity in statistical models of multi-environment yield trial data. One particular feature of interest to breeders is simultaneously modeling heterogeneity of variances among environments and cultivars. Our obj...

  3. Use of Selected Goodness-of-Fit Statistics to Assess the Accuracy of a Model of Henry Hagg Lake, Oregon

    NASA Astrophysics Data System (ADS)

    Rounds, S. A.; Sullivan, A. B.

    2004-12-01

    Assessing a model's ability to reproduce field data is a critical step in the modeling process. For any model, some method of determining goodness-of-fit to measured data is needed to aid in calibration and to evaluate model performance. Visualizations and graphical comparisons of model output are an excellent way to begin that assessment. At some point, however, model performance must be quantified. Goodness-of-fit statistics, including the mean error (ME), mean absolute error (MAE), root mean square error, and coefficient of determination, typically are used to measure model accuracy. Statistical tools such as the sign test or Wilcoxon test can be used to test for model bias. The runs test can detect phase errors in simulated time series. Each statistic is useful, but each has its limitations. None provides a complete quantification of model accuracy. In this study, a suite of goodness-of-fit statistics was applied to a model of Henry Hagg Lake in northwest Oregon. Hagg Lake is a man-made reservoir on Scoggins Creek, a tributary to the Tualatin River. Located on the west side of the Portland metropolitan area, the Tualatin Basin is home to more than 450,000 people. Stored water in Hagg Lake helps to meet the agricultural and municipal water needs of that population. Future water demands have caused water managers to plan for a potential expansion of Hagg Lake, doubling its storage to roughly 115,000 acre-feet. A model of the lake was constructed to evaluate the lake's water quality and estimate how that quality might change after raising the dam. The laterally averaged, two-dimensional, U.S. Army Corps of Engineers model CE-QUAL-W2 was used to construct the Hagg Lake model. Calibrated for the years 2000 and 2001 and confirmed with data from 2002 and 2003, modeled parameters included water temperature, ammonia, nitrate, phosphorus, algae, zooplankton, and dissolved oxygen. Several goodness-of-fit statistics were used to quantify model accuracy and bias. 
Model performance was judged to be excellent for water temperature (annual ME: -0.22 to 0.05 °C; annual MAE: 0.62 to 0.68 °C) and dissolved oxygen (annual ME: -0.28 to 0.18 mg/L; annual MAE: 0.43 to 0.92 mg/L), showing that the model is sufficiently accurate for future water resources planning and management.
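    The goodness-of-fit statistics named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `goodness_of_fit` is hypothetical, errors are taken as simulated minus observed (an assumed sign convention), and the coefficient of determination is computed as the squared Pearson correlation, which is only one of several definitions in use.

```python
import math


def goodness_of_fit(observed, simulated):
    """Return ME, MAE, RMSE and R^2 for paired observed/simulated values.

    Assumptions (not from the source): error = simulated - observed,
    and R^2 is the squared Pearson correlation coefficient.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [s - o for o, s in zip(observed, simulated)]

    me = sum(errors) / n                              # mean error (bias)
    mae = sum(abs(e) for e in errors) / n             # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # root mean square error

    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    # R^2 is undefined when either series is constant.
    r2 = cov * cov / (var_o * var_s) if var_o > 0 and var_s > 0 else float("nan")

    return {"ME": me, "MAE": mae, "RMSE": rmse, "R2": r2}
```

    A simulation that is offset from the observations by a constant gives a perfect R^2 but a nonzero ME, which illustrates the abstract's point that no single statistic fully quantifies model accuracy.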

  4. Two statistics for evaluating parameter identifiability and error reduction

    USGS Publications Warehouse

    Doherty, John; Hunt, Randall J.

    2009-01-01

    Two statistics are presented that can be used to rank input parameters utilized by a model in terms of their relative identifiability based on a given or possible future calibration dataset. Identifiability is defined here as the capability of model calibration to constrain parameters used by a model. Both statistics require that the sensitivity of each model parameter be calculated for each model output for which there are actual or presumed field measurements. Singular value decomposition (SVD) of the weighted sensitivity matrix is then undertaken to quantify the relation between the parameters and observations that, in turn, allows selection of calibration solution and null spaces spanned by unit orthogonal vectors. The first statistic presented, "parameter identifiability", is quantitatively defined as the direction cosine between a parameter and its projection onto the calibration solution space. This varies between zero and one, with zero indicating complete non-identifiability and one indicating complete identifiability. The second statistic, "relative error reduction", indicates the extent to which the calibration process reduces error in estimation of a parameter from its pre-calibration level where its value must be assigned purely on the basis of prior expert knowledge. This is more sophisticated than identifiability, in that it takes greater account of the noise associated with the calibration dataset. Like identifiability, it has a maximum value of one (which can only be achieved if there is no measurement noise). Conceptually it can fall to zero; and even below zero if a calibration problem is poorly posed. An example, based on a coupled groundwater/surface-water model, is included that demonstrates the utility of the statistics. © 2009 Elsevier B.V.
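    One reading of the identifiability definition above can be written out as follows. The notation is assumed for illustration and is not taken from the paper: $J$ is the sensitivity matrix, $W$ the observation weight matrix, $V_1$ the first $k$ right singular vectors (spanning the calibration solution space), and $e_i$ the unit vector of parameter $i$.

```latex
% SVD of the weighted sensitivity matrix; V_1 spans the solution space,
% V_2 the null space (notation assumed for illustration).
W^{1/2} J = U S V^{\mathsf{T}}, \qquad V = \begin{bmatrix} V_1 & V_2 \end{bmatrix}

% Projection of the unit parameter vector e_i onto the solution space.
\hat{e}_i = V_1 V_1^{\mathsf{T}} e_i

% Direction cosine between e_i and its projection.
\cos\theta_i
  = \frac{e_i^{\mathsf{T}} \hat{e}_i}{\lVert e_i \rVert \, \lVert \hat{e}_i \rVert}
  = \frac{\lVert V_1^{\mathsf{T}} e_i \rVert^{2}}{\lVert V_1^{\mathsf{T}} e_i \rVert}
  = \lVert V_1^{\mathsf{T}} e_i \rVert
  = \sqrt{\sum_{j=1}^{k} V_{ij}^{2}}
```

    Because the columns of $V$ are orthonormal, this quantity lies between zero (the parameter lies entirely in the null space) and one (entirely in the solution space), matching the range stated in the abstract.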

  5. Statistical tools for transgene copy number estimation based on real-time PCR.

    PubMed

    Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

    2007-11-01

    As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, the real-time PCR based transgene copy number estimation tends to be ambiguous and subjective stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite the recent progress in statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct number from a control transgenic event and putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two-group t-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of transgene was compared with that of internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with reference gene without a standard curve, but rather, is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. 
These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied for other real-time PCR-based quantification assays including transfection efficiency analysis and pathogen quantification.
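    The external calibration curve of the first design can be sketched with an ordinary least-squares fit of Ct against log10 template quantity. This is a generic illustration under stated assumptions, not the authors' models: the function names are hypothetical, and the efficiency formula E = 10^(-1/slope) - 1 is the commonly used relation rather than something given in the abstract.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept.

    Returns (slope, intercept, efficiency), where efficiency uses the
    common relation E = 10**(-1/slope) - 1 (an assumption, not from the source).
    """
    if len(quantities) != len(ct_values) or len(quantities) < 2:
        raise ValueError("need at least two paired dilution points")
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, efficiency


def estimate_quantity(ct, slope, intercept):
    """Invert the fitted curve to estimate template quantity from a Ct value."""
    return 10 ** ((ct - intercept) / slope)
```

    Comparing the quantities estimated for a control event and a putative event would then give a copy number ratio; the confidence intervals the abstract calls for require the regression error terms, which this sketch omits.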

  6. Damages detection in cylindrical metallic specimens by means of statistical baseline models and updated daily temperature profiles

    NASA Astrophysics Data System (ADS)

    Villamizar-Mejia, Rodolfo; Mujica-Delgado, Luis-Eduardo; Ruiz-Ordóñez, Magda-Liliana; Camacho-Navarro, Jhonatan; Moreno-Beltrán, Gustavo

    2017-05-01

    In previous works, damage detection of metallic specimens exposed to temperature changes has been achieved by using a statistical baseline model based on Principal Component Analysis (PCA) and the piezodiagnostics principle, taking the temperature effect into account by augmenting the baseline model or by using several baseline models according to the current temperature. In this paper a new approach is presented, where damage detection is based on a new index that combines the Q and T2 statistical indices with current temperature measurements. Experimental tests were performed on a carbon-steel pipe of 1 m length and 1.5 inches diameter, instrumented with piezodevices acting as actuators or sensors. A PCA baseline model was obtained at a temperature of 21° and then T2 and Q statistical indices were obtained for a 24 h temperature profile. Also, mass added at different points of the pipe between sensor and actuator was used as damage. By using the combined index the temperature contribution can be separated and a better differentiation of damaged cases with respect to undamaged cases can be graphically obtained.
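    For reference, the T2 and Q indices mentioned above are usually defined as follows for a PCA baseline model. The symbols are assumed for illustration: $x$ is a scaled measurement vector, $P$ holds the $r$ retained principal directions, and $\Lambda$ is the diagonal matrix of their eigenvalues. The combined temperature-aware index is not specified in the abstract and is not reproduced here.

```latex
% Scores of a new measurement on the retained principal components.
t = P^{\mathsf{T}} x

% Hotelling's T^2: variation inside the PCA model, scaled by the eigenvalues.
T^{2} = t^{\mathsf{T}} \Lambda^{-1} t = \sum_{j=1}^{r} \frac{t_j^{2}}{\lambda_j}

% Q (squared prediction error): residual variation outside the PCA model.
Q = \lVert x - P t \rVert^{2} = x^{\mathsf{T}} \left( I - P P^{\mathsf{T}} \right) x
```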

  7. Assessment of corneal properties based on statistical modeling of OCT speckle

    PubMed Central

    Jesus, Danilo A.; Iskander, D. Robert

    2016-01-01

    A new approach to assess the properties of the corneal micro-structure in vivo based on the statistical modeling of speckle obtained from Optical Coherence Tomography (OCT) is presented. A number of statistical models were proposed to fit the corneal speckle data obtained from OCT raw image. Short-term changes in corneal properties were studied by inducing corneal swelling whereas age-related changes were observed analyzing data of sixty-five subjects aged between twenty-four and seventy-three years. The Generalized Gamma distribution was shown to be the best model, in terms of the Akaike’s Information Criterion, to fit the OCT corneal speckle. Its parameters have shown statistically significant differences (Kruskal-Wallis, p < 0.001) for short and age-related corneal changes. In addition, it was observed that age-related changes influence the corneal biomechanical behaviour when corneal swelling is induced. This study shows that the Generalized Gamma distribution can be used to model corneal speckle in OCT in vivo, providing complementary quantified information where micro-structure of corneal tissue is of essence. PMID:28101409
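    Ranking candidate distributions by the Akaike Information Criterion (AIC = 2k - 2 ln L) can be sketched as below. To stay dependency-free, the sketch uses two stand-in candidates with closed-form maximum likelihood estimates (exponential and Rayleigh); these are assumptions chosen for illustration and are not the candidate set of the study, which found the Generalized Gamma distribution to fit best. All function names are hypothetical.

```python
import math


def exponential_loglik(data):
    """Maximized log-likelihood of an exponential model (rate = 1 / mean)."""
    n = len(data)
    total = sum(data)
    rate = n / total
    return n * math.log(rate) - rate * total


def rayleigh_loglik(data):
    """Maximized log-likelihood of a Rayleigh model (sigma^2 = sum(x^2) / 2n)."""
    n = len(data)
    sum_sq = sum(x * x for x in data)
    sigma2 = sum_sq / (2 * n)
    return (sum(math.log(x) for x in data)
            - n * math.log(sigma2)
            - sum_sq / (2 * sigma2))


def aic(loglik, n_params):
    """Akaike Information Criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik


def rank_by_aic(data):
    """Return (model name, AIC) pairs sorted from best to worst.

    Data must be strictly positive. Both stand-in models have one parameter.
    """
    candidates = {
        "exponential": (exponential_loglik(data), 1),
        "rayleigh": (rayleigh_loglik(data), 1),
    }
    scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

    The same pattern extends to any family for which a maximized log-likelihood and a parameter count are available.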

  8. Forecasting runout of rock and debris avalanches

    USGS Publications Warehouse

    Iverson, Richard M.; Evans, S.G.; Mugnozza, G.S.; Strom, A.; Hermanns, R.L.

    2006-01-01

    Physically based mathematical models and statistically based empirical equations each may provide useful means of forecasting runout of rock and debris avalanches. This paper compares the foundations, strengths, and limitations of a physically based model and a statistically based forecasting method, both of which were developed to predict runout across three-dimensional topography. The chief advantage of the physically based model results from its ties to physical conservation laws and well-tested axioms of soil and rock mechanics, such as the Coulomb friction rule and effective-stress principle. The output of this model provides detailed information about the dynamics of avalanche runout, at the expense of high demands for accurate input data, numerical computation, and experimental testing. In comparison, the statistical method requires relatively modest computation and no input data except identification of prospective avalanche source areas and a range of postulated avalanche volumes. Like the physically based model, the statistical method yields maps of predicted runout, but it provides no information on runout dynamics. Although the two methods differ significantly in their structure and objectives, insights gained from one method can aid refinement of the other.

  9. Statistical Model of Dynamic Markers of the Alzheimer's Pathological Cascade.

    PubMed

    Balsis, Steve; Geraci, Lisa; Benge, Jared; Lowe, Deborah A; Choudhury, Tabina K; Tirso, Robert; Doody, Rachelle S

    2018-05-05

    Alzheimer's disease (AD) is a progressive disease reflected in markers across assessment modalities, including neuroimaging, cognitive testing, and evaluation of adaptive function. Identifying a single continuum of decline across assessment modalities in a single sample is statistically challenging because of the multivariate nature of the data. To address this challenge, we implemented advanced statistical analyses designed specifically to model complex data across a single continuum. We analyzed data from the Alzheimer's Disease Neuroimaging Initiative (ADNI; N = 1,056), focusing on indicators from the assessments of magnetic resonance imaging (MRI) volume, fluorodeoxyglucose positron emission tomography (FDG-PET) metabolic activity, cognitive performance, and adaptive function. Item response theory was used to identify the continuum of decline. Then, through a process of statistical scaling, indicators across all modalities were linked to that continuum and analyzed. Findings revealed that measures of MRI volume, FDG-PET metabolic activity, and adaptive function added measurement precision beyond that provided by cognitive measures, particularly in the relatively mild range of disease severity. More specifically, MRI volume, and FDG-PET metabolic activity become compromised in the very mild range of severity, followed by cognitive performance and finally adaptive function. Our statistically derived models of the AD pathological cascade are consistent with existing theoretical models.

  10. Ecological statistics of Gestalt laws for the perceptual organization of contours.

    PubMed

    Elder, James H; Goldberg, Richard M

    2002-01-01

    Although numerous studies have measured the strength of visual grouping cues for controlled psychophysical stimuli, little is known about the statistical utility of these various cues for natural images. In this study, we conducted experiments in which human participants trace perceived contours in natural images. These contours are automatically mapped to sequences of discrete tangent elements detected in the image. By examining relational properties between pairs of successive tangents on these traced curves, and between randomly selected pairs of tangents, we are able to estimate the likelihood distributions required to construct an optimal Bayesian model for contour grouping. We employed this novel methodology to investigate the inferential power of three classical Gestalt cues for contour grouping: proximity, good continuation, and luminance similarity. The study yielded a number of important results: (1) these cues, when appropriately defined, are approximately uncorrelated, suggesting a simple factorial model for statistical inference; (2) moderate image-to-image variation of the statistics indicates the utility of general probabilistic models for perceptual organization; (3) these cues differ greatly in their inferential power, proximity being by far the most powerful; and (4) statistical modeling of the proximity cue indicates a scale-invariant power law in close agreement with prior psychophysics.

  11. Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison.

    PubMed

    Dai, Qi; Yang, Yanchun; Wang, Tianming

    2008-10-15

    Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information on the k-word distributions, Markov model or both. Motivated by adding k-word distributions to Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers the systematic and quantitative experimental assessment of our measures. Moreover, we compared our achievements with those based on alignment or alignment-free methods. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measure is used for phylogenetic analysis. The experimental assessment demonstrates that our similarity measures intending to incorporate k-word distributions into Markov model are more efficient.
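    The two ingredients the abstract combines, k-word counts and a Markov model of the sequence, can be sketched as follows. This does not implement the wre.k.r or S2.k.r measures, whose definitions are not given in the abstract; it only shows overlapping k-word counting and the probability of a word under a first-order Markov model estimated from the same sequence. Function names are hypothetical.

```python
from collections import Counter


def kword_counts(sequence, k):
    """Count overlapping words of length k in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def markov_word_probability(sequence, word):
    """Probability of `word` under a first-order Markov model fitted to `sequence`.

    The first symbol uses its relative frequency; each following symbol uses
    the transition probability estimated from adjacent pairs. Transitions
    never observed in `sequence` contribute probability 0.
    """
    base = Counter(sequence)
    pairs = kword_counts(sequence, 2)
    # Number of adjacent pairs that start with each symbol.
    starts = Counter()
    for pair, count in pairs.items():
        starts[pair[0]] += count

    prob = base[word[0]] / len(sequence)
    for a, b in zip(word, word[1:]):
        if starts[a] == 0:
            return 0.0
        prob *= pairs[a + b] / starts[a]
    return prob
```

    Comparing the observed frequency of a k-word with its Markov-expected probability is one common way to relate the two sources of information in alignment-free sequence comparison.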

  12. TRACX2: a connectionist autoencoder using graded chunks to model infant visual statistical learning.

    PubMed

    Mareschal, Denis; French, Robert M

    2017-01-05

    Even newborn infants are able to extract structure from a stream of sensory inputs; yet how this is achieved remains largely a mystery. We present a connectionist autoencoder model, TRACX2, that learns to extract sequence structure by gradually constructing chunks, storing these chunks in a distributed manner across its synaptic weights and recognizing these chunks when they re-occur in the input stream. Chunks are graded rather than all-or-nothing in nature. As chunks are learnt their component parts become more and more tightly bound together. TRACX2 successfully models the data from five experiments from the infant visual statistical learning literature, including tasks involving forward and backward transitional probabilities, low-salience embedded chunk items, part-sequences and illusory items. The model also captures performance differences across ages through the tuning of a single-learning rate parameter. These results suggest that infant statistical learning is underpinned by the same domain-general learning mechanism that operates in auditory statistical learning and, potentially, in adult artificial grammar learning. This article is part of the themed issue 'New frontiers for statistical learning in the cognitive sciences'. © 2016 The Author(s).
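    The forward and backward transitional probabilities referred to above describe the stimulus stream rather than the TRACX2 model itself. A minimal sketch of how they are commonly computed from a sequence of items follows; the function name and the pair-based normalization are assumptions for illustration.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Return (forward, backward) transitional probabilities for adjacent items.

    forward[(x, y)]  = count(x followed by y) / count(pairs starting with x)
    backward[(x, y)] = count(x followed by y) / count(pairs ending with y)
    """
    pairs = Counter(zip(stream, stream[1:]))
    starts = Counter()
    ends = Counter()
    for (x, y), count in pairs.items():
        starts[x] += count
        ends[y] += count

    forward = {pair: count / starts[pair[0]] for pair, count in pairs.items()}
    backward = {pair: count / ends[pair[1]] for pair, count in pairs.items()}
    return forward, backward
```

    For example, in the stream A B C A B D A B C, the item B always follows A (forward probability 1), while C follows B in only two of three cases.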

  13. TRACX2: a connectionist autoencoder using graded chunks to model infant visual statistical learning

    PubMed Central

    French, Robert M.

    2017-01-01

    Even newborn infants are able to extract structure from a stream of sensory inputs; yet how this is achieved remains largely a mystery. We present a connectionist autoencoder model, TRACX2, that learns to extract sequence structure by gradually constructing chunks, storing these chunks in a distributed manner across its synaptic weights and recognizing these chunks when they re-occur in the input stream. Chunks are graded rather than all-or-nothing in nature. As chunks are learnt their component parts become more and more tightly bound together. TRACX2 successfully models the data from five experiments from the infant visual statistical learning literature, including tasks involving forward and backward transitional probabilities, low-salience embedded chunk items, part-sequences and illusory items. The model also captures performance differences across ages through the tuning of a single-learning rate parameter. These results suggest that infant statistical learning is underpinned by the same domain-general learning mechanism that operates in auditory statistical learning and, potentially, in adult artificial grammar learning. This article is part of the themed issue ‘New frontiers for statistical learning in the cognitive sciences’. PMID:27872375

  14. Statistical Comparison of Spike Responses to Natural Stimuli in Monkey Area V1 With Simulated Responses of a Detailed Laminar Network Model for a Patch of V1

    PubMed Central

    Schuch, Klaus; Logothetis, Nikos K.; Maass, Wolfgang

    2011-01-01

    A major goal of computational neuroscience is the creation of computer models for cortical areas whose response to sensory stimuli resembles that of cortical areas in vivo in important aspects. It is seldom considered whether the simulated spiking activity is realistic (in a statistical sense) in response to natural stimuli. Because certain statistical properties of spike responses were suggested to facilitate computations in the cortex, acquiring a realistic firing regimen in cortical network models might be a prerequisite for analyzing their computational functions. We present a characterization and comparison of the statistical response properties of the primary visual cortex (V1) in vivo and in silico in response to natural stimuli. We recorded from multiple electrodes in area V1 of 4 macaque monkeys and developed a large state-of-the-art network model for a 5 × 5-mm patch of V1 composed of 35,000 neurons and 3.9 million synapses that integrates previously published anatomical and physiological details. By quantitative comparison of the model response to the “statistical fingerprint” of responses in vivo, we find that our model for a patch of V1 responds to the same movie in a way which matches the statistical structure of the recorded data surprisingly well. The deviations between the firing regimen of the model and the in vivo data are on the same level as deviations among monkeys and sessions. This suggests that, despite strong simplifications and abstractions of cortical network models, they are nevertheless capable of generating realistic spiking activity. 
To reach a realistic firing state, it was not only necessary to include both N-methyl-d-aspartate and GABAB synaptic conductances in our model, but also to markedly increase the strength of excitatory synapses onto inhibitory neurons (>2-fold) in comparison to literature values, hinting at the importance to carefully adjust the effect of inhibition for achieving realistic dynamics in current network models. PMID:21106898

  15. Pre-Service Mathematics Teachers' Use of Probability Models in Making Informal Inferences about a Chance Game

    ERIC Educational Resources Information Center

    Kazak, Sibel; Pratt, Dave

    2017-01-01

    This study considers probability models as tools for both making informal statistical inferences and building stronger conceptual connections between data and chance topics in teaching statistics. In this paper, we aim to explore pre-service mathematics teachers' use of probability models for a chance game, where the sum of two dice matters in…

  16. A new test statistic for climate models that includes field and spatial dependencies using Gaussian Markov random fields

    DOE PAGES

    Nosedal-Sanchez, Alvaro; Jackson, Charles S.; Huerta, Gabriel

    2016-07-20

    A new test statistic for climate model evaluation has been developed that potentially mitigates some of the limitations that exist for observing and representing field and space dependencies of climate phenomena. Traditionally such dependencies have been ignored when climate models have been evaluated against observational data, which makes it difficult to assess whether any given model is simulating observed climate for the right reasons. The new statistic uses Gaussian Markov random fields for estimating field and space dependencies within a first-order grid point neighborhood structure. We illustrate the ability of Gaussian Markov random fields to represent empirical estimates of field and space covariances using "witch hat" graphs. We further use the new statistic to evaluate the tropical response of a climate model (CAM3.1) to changes in two parameters important to its representation of cloud and precipitation physics. Overall, the inclusion of dependency information did not alter significantly the recognition of those regions of parameter space that best approximated observations. However, there were some qualitative differences in the shape of the response surface that suggest how such a measure could affect estimates of model uncertainty.

  17. Statistical aspects of modeling the labor curve.

    PubMed

    Zhang, Jun; Troendle, James; Grantz, Katherine L; Reddy, Uma M

    2015-06-01

    In a recent review by Cohen and Friedman, several statistical questions on modeling labor curves were raised. This article illustrates that asking data to fit a preconceived model or letting a sufficiently flexible model fit observed data is the main difference in principles of statistical modeling between the original Friedman curve and our average labor curve. An evidence-based approach to construct a labor curve and establish normal values should allow the statistical model to fit observed data. In addition, the presence of the deceleration phase in the active phase of an average labor curve was questioned. Forcing a deceleration phase to be part of the labor curve may have artificially raised the speed of progression in the active phase with a particularly large impact on earlier labor between 4 and 6 cm. Finally, any labor curve is illustrative and may not be instructive in managing labor because of variations in individual labor pattern and large errors in measuring cervical dilation. With the tools commonly available, it may be more productive to establish a new partogram that takes the physiology of labor and contemporary obstetric population into account. Copyright © 2015 Elsevier Inc. All rights reserved.

  18. Referenceless perceptual fog density prediction model

    NASA Astrophysics Data System (ADS)

    Choi, Lark Kwon; You, Jaehee; Bovik, Alan C.

    2014-02-01

    We propose a perceptual fog density prediction model based on natural scene statistics (NSS) and "fog aware" statistical features, which can predict the visibility in a foggy scene from a single image without reference to a corresponding fogless image, without side geographical camera information, without training on human-rated judgments, and without dependency on salient objects such as lane markings or traffic signs. The proposed fog density predictor only makes use of measurable deviations from statistical regularities observed in natural foggy and fog-free images. A fog aware collection of statistical features is derived from a corpus of foggy and fog-free images by using a space domain NSS model and observed characteristics of foggy images such as low contrast, faint color, and shifted intensity. The proposed model not only predicts perceptual fog density for the entire image but also provides a local fog density index for each patch. The predicted fog density of the model correlates well with the measured visibility in a foggy scene as measured by judgments taken in a human subjective study on a large foggy image database. As one application, the proposed model accurately evaluates the performance of defog algorithms designed to enhance the visibility of foggy images.

  19. A new test statistic for climate models that includes field and spatial dependencies using Gaussian Markov random fields

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nosedal-Sanchez, Alvaro; Jackson, Charles S.; Huerta, Gabriel

    A new test statistic for climate model evaluation has been developed that potentially mitigates some of the limitations that exist for observing and representing field and space dependencies of climate phenomena. Traditionally such dependencies have been ignored when climate models have been evaluated against observational data, which makes it difficult to assess whether any given model is simulating observed climate for the right reasons. The new statistic uses Gaussian Markov random fields for estimating field and space dependencies within a first-order grid point neighborhood structure. We illustrate the ability of Gaussian Markov random fields to represent empirical estimates of field and space covariances using "witch hat" graphs. We further use the new statistic to evaluate the tropical response of a climate model (CAM3.1) to changes in two parameters important to its representation of cloud and precipitation physics. Overall, the inclusion of dependency information did not alter significantly the recognition of those regions of parameter space that best approximated observations. However, there were some qualitative differences in the shape of the response surface that suggest how such a measure could affect estimates of model uncertainty.

  20. Selecting statistical model and optimum maintenance policy: a case study of hydraulic pump.

    PubMed

    Ruhi, S; Karim, M R

    2016-01-01

    Proper maintenance policy can play a vital role for effective investigation of product reliability. Every engineered object such as product, plant or infrastructure needs preventive and corrective maintenance. In this paper we look at a real case study. It deals with the maintenance of hydraulic pumps used in excavators by a mining company. We obtain the data that the owner had collected, carry out an analysis, and build models for pump failures. The data consist of both failure and censored lifetimes of the hydraulic pump. Different competitive mixture models are applied to analyze a set of maintenance data of a hydraulic pump. Various characteristics of the mixture models, such as the cumulative distribution function, reliability function, mean time to failure, etc. are estimated to assess the reliability of the pump. Akaike Information Criterion, adjusted Anderson-Darling test statistic, Kolmogorov-Smirnov test statistic and root mean square error are considered to select the suitable models among a set of competitive models. The maximum likelihood estimation method via the EM algorithm is applied mainly for estimating the parameters of the models and reliability related quantities. In this study, it is found that a threefold mixture model (Weibull-Normal-Exponential) fits well for the hydraulic pump failures data set. This paper also illustrates how a suitable statistical model can be applied to estimate the optimum maintenance period at a minimum cost of a hydraulic pump.

  1. Nondestructive Evaluation (NDE) Technology Initiatives (NTIP). Delivery Order 0039: Statistical Comparison of Competing Material Models

    DTIC Science & Technology

    2003-01-01

    adapted from Kass and Rafferty (1995) and Congdon (2001). Page 10 of 57 density adjusted for resin content, z, since resin contributes to the density...c.f.: Congdon , 2001). How to Download the WinBUGS Software Package BUGS was originally a statistical research project at the Medical Research...Likelihood Estimation,” July 2002, working paper to be published. 18) Congdon , Peter, Bayesian Statistical Modeling, Wiley, 2001 19) Cox, D. R. and

  2. Statistical characterization of the fatigue behavior of composite lamina

    NASA Technical Reports Server (NTRS)

    Yang, J. N.; Jones, D. L.

    1979-01-01

    A theoretical model was developed to predict statistically the effects of constant and variable amplitude fatigue loadings on the residual strength and fatigue life of composite lamina. The parameters in the model were established from the results of a series of static tensile tests and a fatigue scan and a number of verification tests were performed. Abstracts for two other papers on the effect of load sequence on the statistical fatigue of composites are also presented.

  3. A Frequency Domain Approach to Pretest Analysis Model Correlation and Model Updating for the Mid-Frequency Range

    DTIC Science & Technology

    2009-02-01

    range of modal analysis and the high frequency region of statistical energy analysis , is referred to as the mid-frequency range. The corresponding...frequency range of modal analysis and the high frequency region of statistical energy analysis , is referred to as the mid-frequency range. The...predictions. The averaging process is consistent with the averaging done in statistical energy analysis for stochastic systems. The FEM will always

  4. Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis

    PubMed Central

    McDermott, Josh H.; Simoncelli, Eero P.

    2014-01-01

    Rainstorms, insect swarms, and galloping horses produce “sound textures” – the collective result of many similar acoustic events. Sound textures are distinguished by temporal homogeneity, suggesting they could be recognized with time-averaged statistics. To test this hypothesis, we processed real-world textures with an auditory model containing filters tuned for sound frequencies and their modulations, and measured statistics of the resulting decomposition. We then assessed the realism and recognizability of novel sounds synthesized to have matching statistics. Statistics of individual frequency channels, capturing spectral power and sparsity, generally failed to produce compelling synthetic textures. However, combining them with correlations between channels produced identifiable and natural-sounding textures. Synthesis quality declined if statistics were computed from biologically implausible auditory models. The results suggest that sound texture perception is mediated by relatively simple statistics of early auditory representations, presumably computed by downstream neural populations. The synthesis methodology offers a powerful tool for their further investigation. PMID:21903084

  5. Statistical Learning is Related to Early Literacy-Related Skills

    PubMed Central

    Spencer, Mercedes; Kaschak, Michael P.; Jones, John L.; Lonigan, Christopher J.

    2015-01-01

    It has been demonstrated that statistical learning, or the ability to use statistical information to learn the structure of one’s environment, plays a role in young children’s acquisition of linguistic knowledge. Although most research on statistical learning has focused on language acquisition processes, such as the segmentation of words from fluent speech and the learning of syntactic structure, some recent studies have explored the extent to which individual differences in statistical learning are related to literacy-relevant knowledge and skills. The present study extends on this literature by investigating the relations between two measures of statistical learning and multiple measures of skills that are critical to the development of literacy—oral language, vocabulary knowledge, and phonological processing—within a single model. Our sample included a total of 553 typically developing children from prekindergarten through second grade. Structural equation modeling revealed that statistical learning accounted for a unique portion of the variance in these literacy-related skills. Practical implications for instruction and assessment are discussed. PMID:26478658

  6. Empirical Correction to the Likelihood Ratio Statistic for Structural Equation Modeling with Many Variables.

    PubMed

    Yuan, Ke-Hai; Tian, Yubin; Yanagihara, Hirokazu

    2015-06-01

    Survey data typically contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. The most widely used statistic for evaluating the adequacy of an SEM model is T_ML, a slight modification to the likelihood ratio statistic. Under the normality assumption, T_ML approximately follows a chi-square distribution when the number of observations (N) is large and the number of items or variables (p) is small. However, in practice, p can be rather large while N is always limited due to not having enough participants. Even with a relatively large N, empirical results show that T_ML rejects the correct model too often when p is not too small. Various corrections to T_ML have been proposed, but they are mostly heuristic. Following the principle of the Bartlett correction, this paper proposes an empirical approach to correct T_ML so that the mean of the resulting statistic approximately equals the degrees of freedom of the nominal chi-square distribution. Results show that empirically corrected statistics follow the nominal chi-square distribution much more closely than previously proposed corrections to T_ML, and they control type I errors reasonably well whenever N ≥ max(50, 2p). The formulations of the empirically corrected statistics are further used to predict type I errors of T_ML as reported in the literature, and they perform well.
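
    The general idea of a Bartlett-type correction, rescaling a statistic so that its mean matches the nominal degrees of freedom, can be sketched as follows. This is only the rescaling principle, not the empirical correction formula of the paper; the inflation factor 1.3 and the helper `mean_rescale` are assumptions made for the example.

```python
import random

def mean_rescale(statistics, df):
    """Rescale test statistics so that their sample mean equals df."""
    mean_t = sum(statistics) / len(statistics)
    factor = df / mean_t
    return factor, [factor * t for t in statistics]

random.seed(0)
df = 10
# Stand-in for an inflated statistic: a chi-square(df) draw multiplied by 1.3 (hypothetical).
inflated = [1.3 * sum(random.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(5000)]
factor, corrected = mean_rescale(inflated, df)
print(f"scale factor = {factor:.3f}")
print(f"mean before = {sum(inflated) / len(inflated):.2f}, "
      f"mean after = {sum(corrected) / len(corrected):.2f}")
```

    In practice the scale factor would have to be expressed in terms of known quantities such as N and p rather than estimated from replications of the statistic itself, which is the part the paper addresses empirically.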

  7. Boosting Bayesian parameter inference of stochastic differential equation models with methods from statistical physics

    NASA Astrophysics Data System (ADS)

    Albert, Carlo; Ulzega, Simone; Stoop, Ruedi

    2016-04-01

    Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and almost never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail, for models of this kind, because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient with growing number of data points. Hamiltonian Monte Carlo algorithms allow us to translate this inference problem to the problem of simulating the dynamics of a statistical mechanics system and give us access to most sophisticated methods that have been developed in the statistical physics community over the last few decades. We demonstrate that such methods, along with automated differentiation algorithms, allow us to perform a full-fledged Bayesian inference, for a large class of SDE models, in a highly efficient and largely automatized manner. Furthermore, our algorithm is highly parallelizable. For our toy model, discretized with a few hundred points, a full Bayesian inference can be performed in a matter of seconds on a standard PC.
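
    The toy model described above, a linear reservoir driven by noise whose standard deviation scales with the stored volume, can be simulated forward with a simple Euler-Maruyama scheme. The sketch below assumes the form dV = (r - V/k) dt + sigma V dW; this parametrization, the parameter names and the values are assumptions, and the code covers only the forward simulation, not the Hamiltonian Monte Carlo inference.

```python
import math
import random

def simulate_reservoir(v0, inflow, k, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path of dV = (inflow - V / k) dt + sigma * V dW.

    v0:     initial water volume
    inflow: constant input rate r
    k:      retention time of the linear reservoir (outflow is V / k)
    sigma:  noise intensity; the noise standard deviation scales with V
    """
    rng = random.Random(seed)
    v = v0
    path = [v]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        v = v + (inflow - v / k) * dt + sigma * v * dw
        v = max(v, 0.0)  # crude guard so the discretized volume stays non-negative
        path.append(v)
    return path

# Hypothetical parameter values, chosen only to produce a visible fluctuation.
path = simulate_reservoir(v0=1.0, inflow=2.0, k=5.0, sigma=0.3, dt=0.01, n_steps=2000)
outflow = [v / 5.0 for v in path]
print(f"final volume = {path[-1]:.3f}, mean outflow = {sum(outflow) / len(outflow):.3f}")
```

    With sigma set to zero the path relaxes to the deterministic steady state V = r k, which is a convenient sanity check for the discretization.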

  8. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  9. A comparison of linear and nonlinear statistical techniques in performance attribution.

    PubMed

    Chan, N H; Genovese, C R

    2001-01-01

    Performance attribution is usually conducted under the linear framework of multifactor models. Although commonly used by practitioners in finance, linear multifactor models are known to be less than satisfactory in many situations. After a brief survey of nonlinear methods, nonlinear statistical techniques are applied to performance attribution of a portfolio constructed from a fixed universe of stocks using factors derived from some commonly used cross sectional linear multifactor models. By rebalancing this portfolio monthly, the cumulative returns for procedures based on standard linear multifactor model and three nonlinear techniques (model selection, additive models, and neural networks) are calculated and compared. It is found that the first two nonlinear techniques, especially in combination, outperform the standard linear model. The results in the neural-network case are inconclusive because of the great variety of possible models. Although these methods are more complicated and may require some tuning, toolboxes are developed and suggestions on calibration are proposed. This paper demonstrates the usefulness of modern nonlinear statistical techniques in performance attribution.

  10. Joint Typhoon Warning Center (JTWC92) Model.

    DTIC Science & Technology

    1992-05-01

    Report Date. 13. Report Type and Dates Covered. I May 1992 IFinal - Contractor Report 4. Title and Subtitle. 5. FL iding Numbers. Final Report Joint...National Hurricane Center, Coral Gables, FL, 44 pp. I Neumann, C.J. and C.J. McAdie, 1991: A Revised National Hurricane Center NHC83 Model NHC90. NOAA...STATISTICAL-DYNAMICAL MODELS: HISTORICAL PERSPECTIVE The earliest known attempt at statistical-dynamical modeling is credited to Veigas, (1966) for

  11. Statistical Modeling of Natural Backgrounds in Hyperspectral LWIR Data

    DTIC Science & Technology

    2016-09-06

    extremely important for studying performance trades. First, we study the validity of this model using real hyperspectral data, and compare the relative...difficult to validate any statistical model created for a target of interest. However, since background measurements are plentiful, it is reasonable to...Golden, S., Less, D., Jin, X., and Rynes, P., “ Modeling and analysis of LWIR signature variability associated with 3d and BRDF effects,” 98400P (May 2016

  12. A Management Information System Model for Program Management. Ph.D. Thesis - Oklahoma State Univ.; [Computerized Systems Analysis

    NASA Technical Reports Server (NTRS)

    Shipman, D. L.

    1972-01-01

    The development of a model to simulate the information system of a program management type of organization is reported. The model statistically determines the following parameters: type of messages, destinations, delivery durations, type processing, processing durations, communication channels, outgoing messages, and priorities. The total management information system of the program management organization is considered, including formal and informal information flows and both facilities and equipment. The model is written in General Purpose System Simulation 2 computer programming language for use on the Univac 1108, Executive 8 computer. The model is simulated on a daily basis and collects queue and resource utilization statistics for each decision point. The statistics are then used by management to evaluate proposed resource allocations, to evaluate proposed changes to the system, and to identify potential problem areas. The model employs both empirical and theoretical distributions which are adjusted to simulate the information flow being studied.

  13. Noninformative prior in the quantum statistical model of pure states

    NASA Astrophysics Data System (ADS)

    Tanaka, Fuyuhiko

    2012-06-01

    In the present paper, we consider a suitable definition of a noninformative prior on the quantum statistical model of pure states. While the full pure-states model is invariant under unitary rotation and admits the Haar measure, restricted models, which we often see in quantum channel estimation and quantum process tomography, have less symmetry and no compelling rationale for any choice. We adopt a game-theoretic approach that is applicable to classical Bayesian statistics and yields a noninformative prior for a general class of probability distributions. We define the quantum detection game and show that there exist noninformative priors for a general class of a pure-states model. Theoretically, it gives one of the ways that we represent ignorance on the given quantum system with partial information. Practically, our method proposes a default distribution on the model in order to use the Bayesian technique in the quantum-state tomography with a small sample.

  14. The log-periodic-AR(1)-GARCH(1,1) model for financial crashes

    NASA Astrophysics Data System (ADS)

    Gazola, L.; Fernandes, C.; Pizzinga, A.; Riera, R.

    2008-02-01

    This paper intends to meet recent claims for the attainment of more rigorous statistical methodology within the econophysics literature. To this end, we consider an econometric approach to investigate the outcomes of the log-periodic model of price movements, which has been largely used to forecast financial crashes. In order to accomplish reliable statistical inference for unknown parameters, we incorporate an autoregressive dynamic and a conditional heteroskedasticity structure in the error term of the original model, yielding the log-periodic-AR(1)-GARCH(1,1) model. Both the original and the extended models are fitted to financial indices of the U.S. market, namely S&P500 and NASDAQ. Our analysis reveals two main points: (i) the log-periodic-AR(1)-GARCH(1,1) model has residuals with better statistical properties and (ii) the estimation of the parameter concerning the time of the financial crash has been improved.

  15. Camera-Model Identification Using Markovian Transition Probability Matrix

    NASA Astrophysics Data System (ADS)

    Xu, Guanshuo; Gao, Shang; Shi, Yun Qing; Hu, Ruimin; Su, Wei

    Detecting the (brands and) models of digital cameras from given digital images has become a popular research topic in the field of digital forensics. As most images are JPEG compressed before they are output from cameras, we propose to use an effective image statistical model to characterize the difference JPEG 2-D arrays of Y and Cb components from the JPEG images taken by various camera models. Specifically, the transition probability matrices derived from four different directional Markov processes applied to the image difference JPEG 2-D arrays are used to identify statistical differences caused by image formation pipelines inside different camera models. All elements of the transition probability matrices, after a thresholding technique, are directly used as features for classification purposes. Multi-class support vector machines (SVM) are used as the classification tool. The effectiveness of our proposed statistical model is demonstrated by large-scale experimental results.
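
    The feature construction described above can be illustrated in miniature: take a difference array, clip it to a threshold, and count one-step transitions between neighbouring values. The sketch below handles only the horizontal direction on a toy integer array; the function name `transition_matrix`, the threshold of 3 and the input values are assumptions, and the real method works on JPEG 2-D arrays of the Y and Cb components in four directions.

```python
def transition_matrix(block, threshold=3):
    """Horizontal transition probabilities of a thresholded difference array.

    block: 2-D list of integers (a stand-in for a JPEG coefficient array)
    Returns a (2 * threshold + 1) square matrix as a list of rows.
    """
    size = 2 * threshold + 1
    counts = [[0] * size for _ in range(size)]
    for row in block:
        # Horizontal difference array, clipped to [-threshold, threshold].
        diffs = [max(-threshold, min(threshold, a - b)) for a, b in zip(row, row[1:])]
        # Count transitions between horizontally adjacent difference values.
        for cur, nxt in zip(diffs, diffs[1:]):
            counts[cur + threshold][nxt + threshold] += 1
    matrix = []
    for row_counts in counts:
        total = sum(row_counts)
        matrix.append([c / total if total else 0.0 for c in row_counts])
    return matrix

# Toy input, not real image data.
block = [
    [10, 12, 11, 15, 15, 9],
    [8, 8, 9, 13, 12, 12],
    [7, 10, 10, 10, 14, 13],
]
tpm = transition_matrix(block, threshold=3)
features = [p for row in tpm for p in row]  # flattened matrix used as a feature vector
print(len(features))
```

    The flattened vector is what would be passed to a classifier such as a multi-class SVM; clipping keeps the matrix, and therefore the feature dimension, small.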

  16. When mechanism matters: Bayesian forecasting using models of ecological diffusion

    USGS Publications Warehouse

    Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.

    2017-01-01

    Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.

  17. Pattern statistics on Markov chains and sensitivity to parameter estimation

    PubMed Central

    Nuel, Grégory

    2006-01-01

    Background: In order to compute pattern statistics in computational biology a Markov model is commonly used to take into account the sequence composition. Usually its parameter must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what are the consequences of this variability on pattern studies (finding the most over-represented words in a genome, the most significant common words to a set of sequences,...). Results: In the particular case where pattern statistics (overlap counting only) are computed through binomial approximations we use the delta-method to give an explicit expression of σ, the standard deviation of a pattern statistic. This result is validated using simulations and a simple pattern study is also considered. Conclusion: We establish that the use of high-order Markov models could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation. PMID:17044916
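
    To make the dependence on estimated parameters concrete, the sketch below estimates a first-order Markov model from a sequence and uses it to compute the expected number of overlapping occurrences of a word. It does not implement the delta-method result of the paper; the helper names and the toy sequence are hypothetical, and the empirical letter frequency is used as an approximation to the stationary distribution.

```python
from collections import Counter

def fit_markov1(seq):
    """Estimate letter frequencies and first-order transition probabilities."""
    pair_counts = Counter(zip(seq, seq[1:]))
    from_counts = Counter(seq[:-1])
    trans = {(a, b): c / from_counts[a] for (a, b), c in pair_counts.items()}
    freq = {a: c / len(seq) for a, c in Counter(seq).items()}
    return freq, trans

def expected_count(word, n, freq, trans):
    """Expected overlapping occurrences of word in a sequence of length n."""
    prob = freq.get(word[0], 0.0)
    for a, b in zip(word, word[1:]):
        prob *= trans.get((a, b), 0.0)
    return (n - len(word) + 1) * prob

def observed_count(word, seq):
    """Number of (possibly overlapping) occurrences of word in seq."""
    return sum(1 for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i))

# Toy sequence, not genomic data.
seq = "ACGTACGTTACGATCGACGT" * 5
freq, trans = fit_markov1(seq)
word = "ACG"
print(observed_count(word, seq), round(expected_count(word, len(seq), freq, trans), 2))
```

    Every factor in the expected count is itself an estimate, and a model of order m has on the order of 4^m times as many transition probabilities to estimate from the same data, which is one way to see why higher orders make the resulting statistics more fragile.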

  18. Pattern statistics on Markov chains and sensitivity to parameter estimation.

    PubMed

    Nuel, Grégory

    2006-10-17

    In order to compute pattern statistics in computational biology a Markov model is commonly used to take into account the sequence composition. Usually its parameter must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what are the consequences of this variability on pattern studies (finding the most over-represented words in a genome, the most significant common words to a set of sequences,...). In the particular case where pattern statistics (overlap counting only) are computed through binomial approximations we use the delta-method to give an explicit expression of sigma, the standard deviation of a pattern statistic. This result is validated using simulations and a simple pattern study is also considered. We establish that the use of high-order Markov models could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation.

  19. Ensuring Positiveness of the Scaled Difference Chi-Square Test Statistic

    ERIC Educational Resources Information Center

    Satorra, Albert; Bentler, Peter M.

    2010-01-01

    A scaled difference test statistic T[tilde][subscript d] that can be computed from standard software of structural equation models (SEM) by hand calculations was proposed in Satorra and Bentler (Psychometrika 66:507-514, 2001). The statistic T[tilde][subscript d] is asymptotically equivalent to the scaled difference test statistic T[bar][subscript…

  20. A Role for Chunk Formation in Statistical Learning of Second Language Syntax

    ERIC Educational Resources Information Center

    Hamrick, Phillip

    2014-01-01

    Humans are remarkably sensitive to the statistical structure of language. However, different mechanisms have been proposed to account for such statistical sensitivities. The present study compared adult learning of syntax and the ability of two models of statistical learning to simulate human performance: Simple Recurrent Networks, which learn by…

  1. 10 CFR 431.445 - Determination of small electric motor efficiency.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... statistical analysis, computer simulation or modeling, or other analytic evaluation of performance data. (3... statistical analysis, computer simulation or modeling, and other analytic evaluation of performance data on.... (ii) If requested by the Department, the manufacturer shall conduct simulations to predict the...

  2. New robust statistical procedures for the polytomous logistic regression models.

    PubMed

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  3. Evaluating measurement models in clinical research: covariance structure analysis of latent variable models of self-conception.

    PubMed

    Hoyle, R H

    1991-02-01

    Indirect measures of psychological constructs are vital to clinical research. On occasion, however, the meaning of indirect measures of psychological constructs is obfuscated by statistical procedures that do not account for the complex relations between items and latent variables and among latent variables. Covariance structure analysis (CSA) is a statistical procedure for testing hypotheses about the relations among items that indirectly measure a psychological construct and relations among psychological constructs. This article introduces clinical researchers to the strengths and limitations of CSA as a statistical procedure for conceiving and testing structural hypotheses that are not tested adequately with other statistical procedures. The article is organized around two empirical examples that illustrate the use of CSA for evaluating measurement models with correlated error terms, higher-order factors, and measured and latent variables.

  4. Statistical Models for Averaging of the Pump–Probe Traces: Example of Denoising in Terahertz Time-Domain Spectroscopy

    NASA Astrophysics Data System (ADS)

    Skorobogatiy, Maksim; Sadasivan, Jayesh; Guerboukha, Hichem

    2018-05-01

    In this paper, we first discuss the main types of noise in a typical pump-probe system, and then focus specifically on terahertz time domain spectroscopy (THz-TDS) setups. We then introduce four statistical models for the noisy pulses obtained in such systems, and detail rigorous mathematical algorithms to de-noise such traces, find the proper averages and characterise various types of experimental noise. Finally, we perform a comparative analysis of the performance, advantages and limitations of the algorithms by testing them on the experimental data collected using a particular THz-TDS system available in our laboratories. We conclude that using advanced statistical models for trace averaging results in the fitting errors that are significantly smaller than those obtained when only a simple statistical average is used.

  5. Statistical methods for investigating quiescence and other temporal seismicity patterns

    USGS Publications Warehouse

    Matthews, M.V.; Reasenberg, P.A.

    1988-01-01

    We propose a statistical model and a technique for objective recognition of one of the most commonly cited seismicity patterns: microearthquake quiescence. We use a Poisson process model for seismicity and define a process with quiescence as one with a particular type of piece-wise constant intensity function. From this model, we derive a statistic for testing stationarity against a 'quiescence' alternative. The large-sample null distribution of this statistic is approximated from simulated distributions of appropriate functionals applied to Brownian bridge processes. We point out the restrictiveness of the particular model we propose and of the quiescence idea in general. The fact that there are many point processes which have neither constant nor quiescent rate functions underscores the need to test for and describe nonuniformity thoroughly. We advocate the use of the quiescence test in conjunction with various other tests for nonuniformity and with graphical methods such as density estimation. Ideally these methods may promote accurate description of temporal seismicity distributions and useful characterizations of interesting patterns. © 1988 Birkhäuser Verlag.

  6. Assessment of credit risk based on fuzzy relations

    NASA Astrophysics Data System (ADS)

    Tsabadze, Teimuraz

    2017-06-01

    The purpose of this paper is to develop a new approach for the assessment of credit risk to corporate borrowers. There are different models for borrowers' risk assessment. These models are divided into two groups: statistical and theoretical. When assessing the credit risk for corporate borrowers, a statistical model is unacceptable due to the lack of a sufficiently large history of defaults. At the same time, we cannot use some theoretical models due to the lack of a stock exchange. In those cases, when studying a particular borrower for which no statistical base exists, the decision-making process is always of an expert nature. The paper describes a new approach that may be used in group decision-making. An example of the application of the proposed approach is given.

  7. Statistical models of lunar rocks and regolith

    NASA Technical Reports Server (NTRS)

    Marcus, A. H.

    1973-01-01

    The mathematical, statistical, and computational approaches used in the investigation of the interrelationship of lunar fragmental material, regolith, lunar rocks, and lunar craters are described. The first two phases of the work explored the sensitivity of the production model of fragmental material to mathematical assumptions, and then completed earlier studies on the survival of lunar surface rocks with respect to competing processes. The third phase combined earlier work into a detailed statistical analysis and probabilistic model of regolith formation by lithologically distinct layers, interpreted as modified crater ejecta blankets. The fourth phase of the work dealt with problems encountered in combining the results of the entire project into a comprehensive, multipurpose computer simulation model for the craters and regolith. Highlights of each phase of research are given.

  8. Quantification of model uncertainty in aerosol optical thickness retrieval from Ozone Monitoring Instrument (OMI) measurements

    NASA Astrophysics Data System (ADS)

    Määttä, A.; Laine, M.; Tamminen, J.; Veefkind, J. P.

    2013-09-01

    We study uncertainty quantification in remote sensing of aerosols in the atmosphere with top of the atmosphere reflectance measurements from the nadir-viewing Ozone Monitoring Instrument (OMI). Focus is on the uncertainty in aerosol model selection of pre-calculated aerosol models and on the statistical modelling of the model inadequacies. The aim is to apply statistical methodologies that improve the uncertainty estimates of the aerosol optical thickness (AOT) retrieval by propagating model selection and model error related uncertainties more realistically. We utilise Bayesian model selection and model averaging methods for the model selection problem and use Gaussian processes to model the smooth systematic discrepancies from the modelled to observed reflectance. The systematic model error is learned from an ensemble of operational retrievals. The operational OMI multi-wavelength aerosol retrieval algorithm OMAERO is used for cloud free, over land pixels of the OMI instrument with the additional Bayesian model selection and model discrepancy techniques. The method is demonstrated with four examples with different aerosol properties: weakly absorbing aerosols, forest fires over Greece and Russia, and Sahara desert dust. The presented statistical methodology is general; it is not restricted to this particular satellite retrieval application.

  9. Evaluating model accuracy for model-based reasoning

    NASA Technical Reports Server (NTRS)

    Chien, Steve; Roden, Joseph

    1992-01-01

    Described here is an approach to automatically assessing the accuracy of various components of a model. In this approach, actual data from the operation of a target system is used to drive statistical measures to evaluate the prediction accuracy of various portions of the model. We describe how these statistical measures of model accuracy can be used in model-based reasoning for monitoring and design. We then describe the application of these techniques to the monitoring and design of the water recovery system of the Environmental Control and Life Support System (ECLSS) of Space Station Freedom.

  10. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning.

    PubMed

    Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

    2016-06-17

    Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults.

  11. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning

    PubMed Central

    Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

    2016-01-01

    Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults. PMID:27322273

  12. Identifying the Source of Misfit in Item Response Theory Models.

    PubMed

    Liu, Yang; Maydeu-Olivares, Alberto

    2014-01-01

    When an item response theory model fails to fit adequately, the items for which the model provides a good fit and those for which it does not must be determined. To this end, we compare the performance of several fit statistics for item pairs with known asymptotic distributions under maximum likelihood estimation of the item parameters: (a) a mean and variance adjustment to bivariate Pearson's X², (b) a bivariate subtable analog to Reiser's (1996) overall goodness-of-fit test, (c) a z statistic for the bivariate residual cross product, and (d) Maydeu-Olivares and Joe's (2006) M2 statistic applied to bivariate subtables. The unadjusted Pearson's X² with heuristically determined degrees of freedom is also included in the comparison. For binary and ordinal data, our simulation results suggest that the z statistic has the best Type I error and power behavior among all the statistics under investigation when the observed information matrix is used in its computation. However, if one has to use the cross-product information, the mean and variance adjusted X² is recommended. We illustrate the use of pairwise fit statistics in 2 real-data examples and discuss possible extensions of the current research in various directions.

  13. Statistical dielectronic recombination rates for multielectron ions in plasma

    NASA Astrophysics Data System (ADS)

    Demura, A. V.; Leont'iev, D. S.; Lisitsa, V. S.; Shurygin, V. A.

    2017-10-01

    We describe the general analytic derivation of the dielectronic recombination (DR) rate coefficient for multielectron ions in a plasma based on the statistical theory of an atom in terms of the spatial distribution of the atomic electron density. The dielectronic recombination rates for complex multielectron tungsten ions are calculated numerically in a wide range of variation of the plasma temperature, which is important for modern nuclear fusion studies. The results of statistical theory are compared with the data obtained using level-by-level codes ADPAK, FAC, HULLAC, and experimental results. We consider different statistical DR models based on the Thomas-Fermi distribution, viz., integral and differential with respect to the orbital angular momenta of the ion core and the trapped electron, as well as the Rost model, which is an analog of the Franck-Condon model as applied to atomic structures. In view of its universality and relative simplicity, the statistical approach can be used for obtaining express estimates of the dielectronic recombination rate coefficients in complex calculations of the parameters of the thermonuclear plasmas. The application of statistical methods also provides information for the dielectronic recombination rates with much smaller computer time expenditures as compared to available level-by-level codes.

  14. Landslide susceptibility mapping using GIS-based statistical models and Remote sensing data in tropical environment

    PubMed Central

    Shahabi, Himan; Hashim, Mazlan

    2015-01-01

    This research presents the results of the GIS-based statistical models for generation of landslide susceptibility mapping using geographic information system (GIS) and remote-sensing data for Cameron Highlands area in Malaysia. Ten factors including slope, aspect, soil, lithology, NDVI, land cover, distance to drainage, precipitation, distance to fault, and distance to road were extracted from SAR data, SPOT 5 and WorldView-1 images. The relationships between the detected landslide locations and these ten related factors were identified by using GIS-based statistical models including analytical hierarchy process (AHP), weighted linear combination (WLC) and spatial multi-criteria evaluation (SMCE) models. The landslide inventory map which has a total of 92 landslide locations was created based on numerous resources such as digital aerial photographs, AIRSAR data, WorldView-1 images, and field surveys. Then, 80% of the landslide inventory was used for training the statistical models and the remaining 20% was used for validation purpose. The validation results using the Relative landslide density index (R-index) and Receiver operating characteristic (ROC) demonstrated that the SMCE model (accuracy is 96%) is better in prediction than AHP (accuracy is 91%) and WLC (accuracy is 89%) models. These landslide susceptibility maps would be useful for hazard mitigation purpose and regional planning. PMID:25898919

  15. Statistical shear lag model - unraveling the size effect in hierarchical composites.

    PubMed

    Wei, Xiaoding; Filleter, Tobin; Espinosa, Horacio D

    2015-05-01

    Numerous experimental and computational studies have established that the hierarchical structures encountered in natural materials, such as the brick-and-mortar structure observed in sea shells, are essential for achieving defect tolerance. Due to this hierarchy, the mechanical properties of natural materials have a different size dependence compared to that of typical engineered materials. This study aimed to explore size effects on the strength of bio-inspired staggered hierarchical composites and to define the influence of the geometry of constituents in their outstanding defect tolerance capability. A statistical shear lag model is derived by extending the classical shear lag model to account for the statistics of the constituents' strength. A general solution emerges from rigorous mathematical derivations, unifying the various empirical formulations for the fundamental link length used in previous statistical models. The model shows that the staggered arrangement of constituents grants composites a unique size effect on mechanical strength in contrast to homogenous continuous materials. The model is applied to hierarchical yarns consisting of double-walled carbon nanotube bundles to assess its predictive capabilities for novel synthetic materials. Interestingly, the model predicts that yarn gauge length does not significantly influence the yarn strength, in close agreement with experimental observations. Copyright © 2015 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.

  16. Landslide susceptibility mapping using GIS-based statistical models and Remote sensing data in tropical environment.

    PubMed

    Shahabi, Himan; Hashim, Mazlan

    2015-04-22

    This research presents the results of the GIS-based statistical models for generation of landslide susceptibility mapping using geographic information system (GIS) and remote-sensing data for Cameron Highlands area in Malaysia. Ten factors including slope, aspect, soil, lithology, NDVI, land cover, distance to drainage, precipitation, distance to fault, and distance to road were extracted from SAR data, SPOT 5 and WorldView-1 images. The relationships between the detected landslide locations and these ten related factors were identified by using GIS-based statistical models including analytical hierarchy process (AHP), weighted linear combination (WLC) and spatial multi-criteria evaluation (SMCE) models. The landslide inventory map which has a total of 92 landslide locations was created based on numerous resources such as digital aerial photographs, AIRSAR data, WorldView-1 images, and field surveys. Then, 80% of the landslide inventory was used for training the statistical models and the remaining 20% was used for validation purpose. The validation results using the Relative landslide density index (R-index) and Receiver operating characteristic (ROC) demonstrated that the SMCE model (accuracy is 96%) is better in prediction than AHP (accuracy is 91%) and WLC (accuracy is 89%) models. These landslide susceptibility maps would be useful for hazard mitigation purpose and regional planning.

  17. Towards bridging the gap between climate change projections and maize producers in South Africa

    NASA Astrophysics Data System (ADS)

    Landman, Willem A.; Engelbrecht, Francois; Hewitson, Bruce; Malherbe, Johan; van der Merwe, Jacobus

    2018-05-01

    Multi-decadal regional projections of future climate change are introduced into a linear statistical model in order to produce an ensemble of austral mid-summer maximum temperature simulations for southern Africa. The statistical model uses atmospheric thickness fields from a high-resolution (0.5° × 0.5°) reanalysis-forced simulation as predictors in order to develop a linear recalibration model which represents the relationship between atmospheric thickness fields and gridded maximum temperatures across the region. The regional climate model, the conformal-cubic atmospheric model (CCAM), projects maximum temperatures increases over southern Africa to be in the order of 4 °C under low mitigation towards the end of the century or even higher. The statistical recalibration model is able to replicate these increasing temperatures, and the atmospheric thickness-maximum temperature relationship is shown to be stable under future climate conditions. Since dry land crop yields are not explicitly simulated by climate models but are sensitive to maximum temperature extremes, the effect of projected maximum temperature change on dry land crops of the Witbank maize production district of South Africa, assuming other factors remain unchanged, is then assessed by employing a statistical approach similar to the one used for maximum temperature projections.

  18. OPR-PPR, a Computer Program for Assessing Data Importance to Model Predictions Using Linear Statistics

    USGS Publications Warehouse

    Tonkin, Matthew J.; Tiedeman, Claire; Ely, D. Matthew; Hill, Mary C.

    2007-01-01

    The OPR-PPR program calculates the Observation-Prediction (OPR) and Parameter-Prediction (PPR) statistics that can be used to evaluate the relative importance of various kinds of data to simulated predictions. The data considered fall into three categories: (1) existing observations, (2) potential observations, and (3) potential information about parameters. The first two are addressed by the OPR statistic; the third is addressed by the PPR statistic. The statistics are based on linear theory and measure the leverage of the data, which depends on the location, the type, and possibly the time of the data being considered. For example, in a ground-water system the type of data might be a head measurement at a particular location and time. As a measure of leverage, the statistics do not take into account the value of the measurement. As linear measures, the OPR and PPR statistics require minimal computational effort once sensitivities have been calculated. Sensitivities need to be calculated for only one set of parameter values; commonly these are the values estimated through model calibration. OPR-PPR can calculate the OPR and PPR statistics for any mathematical model that produces the necessary OPR-PPR input files. In this report, OPR-PPR capabilities are presented in the context of using the ground-water model MODFLOW-2000 and the universal inverse program UCODE_2005. The method used to calculate the OPR and PPR statistics is based on the linear equation for prediction standard deviation. Using sensitivities and other information, OPR-PPR calculates (a) the percent increase in the prediction standard deviation that results when one or more existing observations are omitted from the calibration data set; (b) the percent decrease in the prediction standard deviation that results when one or more potential observations are added to the calibration data set; or (c) the percent decrease in the prediction standard deviation that results when potential information on one or more parameters is added.
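
    Case (a) above can be illustrated with a minimal sketch, assuming a two-parameter model, a diagonal weight matrix, and the standard linear-theory expression for prediction variance, s_z^2 = s^2 z^T (X^T W X)^-1 z. The function names, the toy sensitivities, and the choice to hold the error variance s^2 fixed when observations are dropped are assumptions made for illustration; this is not the OPR-PPR program's code or input format.

```python
import math

def prediction_sd(sens, weights, z, s2=1.0):
    """Linear-theory prediction standard deviation for a two-parameter model.

    sens    : list of (dy/db1, dy/db2) observation sensitivities (rows of X)
    weights : one weight per observation (diagonal of W)
    z       : (dz/db1, dz/db2) sensitivities of the prediction
    s2      : calculated error variance (assumed fixed here)
    """
    a = sum(w * x[0] * x[0] for x, w in zip(sens, weights))
    b = sum(w * x[0] * x[1] for x, w in zip(sens, weights))
    d = sum(w * x[1] * x[1] for x, w in zip(sens, weights))
    det = a * d - b * b
    if det <= 0.0:
        raise ValueError("X^T W X is singular; parameters are not identifiable")
    # z^T inverse([[a, b], [b, d]]) z, written out for the 2x2 case
    quad = (z[0] * z[0] * d - 2.0 * z[0] * z[1] * b + z[1] * z[1] * a) / det
    return math.sqrt(s2 * quad)

def percent_increase_when_omitted(sens, weights, z, omit):
    """Percent increase in prediction SD when the observations in `omit` are dropped."""
    base = prediction_sd(sens, weights, z)
    kept = [i for i in range(len(sens)) if i not in omit]
    reduced = prediction_sd([sens[i] for i in kept], [weights[i] for i in kept], z)
    return 100.0 * (reduced / base - 1.0)

# Toy example: two observations inform parameter 1, two inform parameter 2.
toy_sens = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
toy_weights = [1.0, 1.0, 1.0, 1.0]
toy_z = (1.0, 1.0)
increase = percent_increase_when_omitted(toy_sens, toy_weights, toy_z, omit={0})
```

    Because only sensitivities and weights enter the calculation, the measured value of an observation plays no role, which is the sense in which the statistic measures leverage.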

  19. On the Determination of Poisson Statistics for Haystack Radar Observations of Orbital Debris

    NASA Technical Reports Server (NTRS)

    Stokely, Christopher L.; Benbrook, James R.; Horstman, Matt

    2007-01-01

    A convenient and powerful method is used to determine if radar detections of orbital debris are observed according to Poisson statistics. This is done by analyzing the time interval between detection events. For Poisson statistics, the probability distribution of the time interval between events is shown to be an exponential distribution. This distribution is a special case of the Erlang distribution that is used in estimating traffic loads on telecommunication networks. Poisson statistics form the basis of many orbital debris models but the statistical basis of these models has not been clearly demonstrated empirically until now. Interestingly, during the fiscal year 2003 observations with the Haystack radar in a fixed staring mode, there are no statistically significant deviations observed from that expected with Poisson statistics, either independent or dependent of altitude or inclination. One would potentially expect some significant clustering of events in time as a result of satellite breakups, but the presence of Poisson statistics indicates that such debris disperse rapidly with respect to Haystack's very narrow radar beam. An exception to Poisson statistics is observed in the months following the intentional breakup of the Fengyun satellite in January 2007.
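
    The interval-based check described above can be sketched as follows. This is a minimal illustration on synthetic event times, not the authors' analysis: the rate of 2.0 events per unit time, the sample size, and the function names are assumptions. The sketch measures the Kolmogorov-Smirnov distance between the observed time gaps and an exponential distribution with the same mean. Because the rate is estimated from the same data, standard Kolmogorov-Smirnov critical values would not apply directly.

```python
import math
import random

def interarrival_times(event_times):
    """Return the gaps between consecutive event times (sorted first)."""
    ordered = sorted(event_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def ks_distance_exponential(gaps):
    """Largest gap between the empirical CDF of `gaps` and an exponential CDF
    whose rate is the reciprocal of the mean gap."""
    n = len(gaps)
    rate = n / sum(gaps)
    distance = 0.0
    for i, gap in enumerate(sorted(gaps)):
        cdf = 1.0 - math.exp(-rate * gap)
        # Compare against the empirical CDF just before and just after this point.
        distance = max(distance, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return distance

# Synthetic Poisson process: exponential gaps with an assumed rate of 2.0.
rng = random.Random(0)
clock = 0.0
poisson_events = []
for _ in range(2000):
    clock += rng.expovariate(2.0)
    poisson_events.append(clock)

# Perfectly regular events for contrast: every gap equals 0.5.
regular_events = [0.5 * k for k in range(2001)]

d_poisson = ks_distance_exponential(interarrival_times(poisson_events))
d_regular = ks_distance_exponential(interarrival_times(regular_events))
```

    A small distance is consistent with Poisson statistics, while clustered or regularly spaced detections give a visibly larger one.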

  20. Response statistics of rotating shaft with non-linear elastic restoring forces by path integration

    NASA Astrophysics Data System (ADS)

    Gaidai, Oleg; Naess, Arvid; Dimentberg, Michael

    2017-07-01

    Extreme statistics of random vibrations are studied for a Jeffcott rotor under uniaxial white noise excitation. The restoring force is modelled as elastic and non-linear; a comparison is made with a linearized restoring force to see the effect of force non-linearity on the response statistics. While analytical solutions and stability conditions are available for the linear model, this is not generally the case for the non-linear system, except in some special cases. The statistics of the non-linear case are studied by applying the path integration (PI) method, which is based on the Markov property of the coupled dynamic system. The Jeffcott rotor response statistics can be obtained by solving the Fokker-Planck (FP) equation of the 4D dynamic system. An efficient implementation of the PI algorithm is applied: the fast Fourier transform (FFT) is used to simulate the additive noise of the dynamic system, which significantly reduces computational time compared to classical PI. Excitation is modelled as Gaussian white noise; however, white noise with any kind of distribution can be implemented with the same PI technique. Multidirectional Markov noise can also be modelled with PI in the same way as unidirectional noise. PI is accelerated by using a Monte Carlo (MC) estimate of the joint probability density function (PDF) as initial input. The symmetry of the dynamic system was utilized to afford higher mesh resolution. Both internal (rotating) and external damping are included in the mechanical model of the rotor. The main advantage of using PI rather than MC is that PI offers high accuracy in the tail of the probability distribution, which is of critical importance for, e.g., extreme value statistics, system reliability, and first passage probability.

  1. The impact on midlevel vision of statistically optimal divisive normalization in V1.

    PubMed

    Coen-Cagli, Ruben; Schwartz, Odelia

    2013-07-15

    The first two areas of the primate visual cortex (V1, V2) provide a paradigmatic example of hierarchical computation in the brain. However, neither the functional properties of V2 nor the interactions between the two areas are well understood. One key aspect is that the statistics of the inputs received by V2 depend on the nonlinear response properties of V1. Here, we focused on divisive normalization, a canonical nonlinear computation that is observed in many neural areas and modalities. We simulated V1 responses with (and without) different forms of surround normalization derived from statistical models of natural scenes, including canonical normalization and a statistically optimal extension that accounted for image nonhomogeneities. The statistics of the V1 population responses differed markedly across models. We then addressed how V2 receptive fields pool the responses of V1 model units with different tuning. We assumed this is achieved by learning without supervision a linear representation that removes correlations, which could be accomplished with principal component analysis. This approach revealed V2-like feature selectivity when we used the optimal normalization and, to a lesser extent, the canonical one but not in the absence of both. We compared the resulting two-stage models on two perceptual tasks; while models encompassing V1 surround normalization performed better at object recognition, only statistically optimal normalization provided systematic advantages in a task more closely matched to midlevel vision, namely figure/ground judgment. Our results suggest that experiments probing midlevel areas might benefit from using stimuli designed to engage the computations that characterize V1 optimality.

  2. Statistical procedures for evaluating daily and monthly hydrologic model predictions

    USGS Publications Warehouse

    Coffey, M.E.; Workman, S.R.; Taraba, J.L.; Fogle, A.W.

    2004-01-01

    The overall study objective was to evaluate the applicability of different qualitative and quantitative methods for comparing daily and monthly SWAT computer model hydrologic streamflow predictions to observed data, and to recommend statistical methods for use in future model evaluations. Statistical methods were tested using daily streamflows and monthly equivalent runoff depths. The statistical techniques included linear regression, Nash-Sutcliffe efficiency, nonparametric tests, t-test, objective functions, autocorrelation, and cross-correlation. None of the methods specifically applied to the non-normal distribution and dependence between data points for the daily predicted and observed data. Of the tested methods, median objective functions, sign test, autocorrelation, and cross-correlation were most applicable for the daily data. The robust coefficient of determination (CD*) and robust modeling efficiency (EF*) objective functions were the preferred methods for daily model results due to the ease of comparing these values with a fixed ideal reference value of one. Predicted and observed monthly totals were more normally distributed, and there was less dependence between individual monthly totals than was observed for the corresponding predicted and observed daily values. More statistical methods were available for comparing SWAT model-predicted and observed monthly totals. The 1995 monthly SWAT model predictions and observed data had a regression r² of 0.70, a Nash-Sutcliffe efficiency of 0.41, and the t-test failed to reject the equal data means hypothesis. The Nash-Sutcliffe coefficient and the r² coefficient were the preferred methods for monthly results due to the ability to compare these coefficients to a set ideal value of one.
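
    Two of the monthly measures named above, the Nash-Sutcliffe efficiency and the regression r², can be computed as in the following minimal sketch. The toy series are invented for illustration and are not SWAT output; the function names are assumptions. Both measures have an ideal value of one, but they respond differently to bias.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squares about the observed mean)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov * cov / (var_obs * var_pred)

# Invented example: predictions are exactly twice the observations.
toy_observed = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_predicted = [2.0, 4.0, 6.0, 8.0, 10.0]

toy_r2 = r_squared(toy_observed, toy_predicted)        # 1.0: perfectly correlated
toy_nse = nash_sutcliffe(toy_observed, toy_predicted)  # -4.5: penalized for the bias
```

    The example shows why the two coefficients are worth reporting together: r² is insensitive to a systematic scaling error, whereas the Nash-Sutcliffe efficiency drops below zero when predictions are worse than the observed mean.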

  3. The proposed 'concordance-statistic for benefit' provided a useful metric when modeling heterogeneous treatment effects.

    PubMed

    van Klaveren, David; Steyerberg, Ewout W; Serruys, Patrick W; Kent, David M

    2018-02-01

    Clinical prediction models that support treatment decisions are usually evaluated for their ability to predict the risk of an outcome rather than treatment benefit, that is, the difference between outcome risk with vs. without therapy. We aimed to define performance metrics for a model's ability to predict treatment benefit. We analyzed data of the Synergy between Percutaneous Coronary Intervention with Taxus and Cardiac Surgery (SYNTAX) trial and of three recombinant tissue plasminogen activator trials. We assessed alternative prediction models with a conventional risk concordance-statistic (c-statistic) and a novel c-statistic for benefit. We defined observed treatment benefit by the outcomes in pairs of patients matched on predicted benefit but discordant for treatment assignment. The 'c-for-benefit' represents the probability that from two randomly chosen matched patient pairs with unequal observed benefit, the pair with greater observed benefit also has a higher predicted benefit. Compared to a model without treatment interactions, the SYNTAX score II had improved ability to discriminate treatment benefit (c-for-benefit 0.590 vs. 0.552), despite having similar risk discrimination (c-statistic 0.725 vs. 0.719). However, for the simplified stroke-thrombolytic predictive instrument (TPI) vs. the original stroke-TPI, the c-for-benefit (0.584 vs. 0.578) was similar. The proposed methodology has the potential to measure a model's ability to predict treatment benefit not captured with conventional performance metrics. Copyright © 2017 Elsevier Inc. All rights reserved.
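
    The definition of the c-for-benefit given above can be sketched in a few lines. The details below are assumptions made for illustration and may differ from the authors' procedure: patients are matched one-to-one by rank of predicted benefit, the arms are of equal size, the predicted benefit of a pair is the average of its two members, outcomes are coded 1 for an adverse event, and tied predictions count as half-concordant.

```python
def match_pairs(treated, control):
    """Pair treated and control patients by rank of predicted benefit.

    Each patient is (predicted_benefit, outcome), with outcome 1 = adverse event.
    Returns (pair_predicted_benefit, observed_benefit) tuples, where observed
    benefit is control outcome minus treated outcome: -1, 0, or 1.
    """
    if len(treated) != len(control):
        raise ValueError("this sketch assumes equally sized arms")
    pairs = []
    for (t_pred, t_out), (c_pred, c_out) in zip(sorted(treated), sorted(control)):
        pairs.append(((t_pred + c_pred) / 2.0, c_out - t_out))
    return pairs

def c_for_benefit(pairs):
    """Share of pair-of-pairs with unequal observed benefit in which the pair
    with greater observed benefit also has the higher predicted benefit."""
    concordant = 0.0
    comparable = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            pred_i, obs_i = pairs[i]
            pred_j, obs_j = pairs[j]
            if obs_i == obs_j:
                continue  # equal observed benefit carries no ordering information
            comparable += 1
            if pred_i == pred_j:
                concordant += 0.5
            elif (pred_i > pred_j) == (obs_i > obs_j):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no pairs with unequal observed benefit")
    return concordant / comparable

# Invented patients: (predicted benefit, outcome).
toy_treated = [(0.30, 0), (0.10, 1), (0.20, 0)]
toy_control = [(0.28, 1), (0.12, 0), (0.22, 0)]
toy_c = c_for_benefit(match_pairs(toy_treated, toy_control))
```

    In this invented example the observed benefit rises with predicted benefit in every comparable pair of pairs, so the statistic reaches its maximum of one; a value of 0.5 would indicate no ability to discriminate benefit.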

  4. Differential gene expression detection and sample classification using penalized linear regression models.

    PubMed

    Wu, Baolin

    2006-02-15

    Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the L1 penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the L1 penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
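
    The link between an L1 penalty and shrinkage can be seen in the simplest case: minimizing 0.5 * (z - b)^2 + lam * |b| over b gives the soft-thresholding rule, which moves an estimate toward zero and sets small ones exactly to zero. The sketch below illustrates that general rule only; the per-gene differences and the penalty value are invented, and it is not the penalized t/F-statistic derived in the paper.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5 * (z - b)**2 + lam * abs(b) over b, for lam >= 0."""
    if lam < 0:
        raise ValueError("penalty must be non-negative")
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def penalized_objective(z, b, lam):
    """The L1-penalized least-squares objective for a single coefficient."""
    return 0.5 * (z - b) ** 2 + lam * abs(b)

# Invented per-gene mean differences between two classes.
toy_differences = [3.0, -0.5, 0.8, -2.5]
toy_penalty = 1.0
shrunken = [soft_threshold(d, toy_penalty) for d in toy_differences]
# Genes whose difference is smaller than the penalty are shrunk to exactly zero,
# so they drop out of the classifier.
selected = [i for i, s in enumerate(shrunken) if s != 0.0]
```

    A larger penalty zeroes out more genes, which is how this kind of shrinkage counteracts over-fitting when p is much larger than n.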

  5. Pitfalls in statistical landslide susceptibility modelling

    NASA Astrophysics Data System (ADS)

    Schröder, Boris; Vorpahl, Peter; Märker, Michael; Elsenbeer, Helmut

    2010-05-01

    The use of statistical methods is a well-established approach to predict landslide occurrence probabilities and to assess landslide susceptibility. This is achieved by applying statistical methods relating historical landslide inventories to topographic indices as predictor variables. In our contribution, we compare several new and powerful methods developed in machine learning and well-established in landscape ecology and macroecology for predicting the distribution of shallow landslides in tropical mountain rainforests in southern Ecuador (among others: boosted regression trees, multivariate adaptive regression splines, maximum entropy). Although these methods are powerful, we think it is necessary to follow a basic set of guidelines to avoid some pitfalls regarding data sampling, predictor selection, and model quality assessment, especially if a comparison of different models is contemplated. We therefore suggest applying a novel toolbox to evaluate approaches to the statistical modelling of landslide susceptibility. Additionally, we propose some methods to open the "black box" as an inherent part of machine learning methods in order to achieve further explanatory insights into preparatory factors that control landslides. Sampling of training data should be guided by hypotheses regarding processes that lead to slope failure taking into account their respective spatial scales. This approach leads to the selection of a set of candidate predictor variables considered on adequate spatial scales. This set should be checked for multicollinearity in order to facilitate model response curve interpretation. Model quality assesses how well a model is able to reproduce independent observations of its response variable. This includes criteria to evaluate different aspects of model performance, i.e. model discrimination, model calibration, and model refinement. In order to assess a possible violation of the assumption of independence in the training samples or a possible lack of explanatory information in the chosen set of predictor variables, the model residuals need to be checked for spatial autocorrelation. Therefore, we calculate spline correlograms. In addition to this, we investigate partial dependency plots and bivariate interaction plots considering possible interactions between predictors to improve model interpretation. Aiming at presenting this toolbox for model quality assessment, we investigate the influence of strategies in the construction of training datasets for statistical models on model quality.

  6. Obscure phenomena in statistical analysis of quantitative structure-activity relationships. Part 1: Multicollinearity of physicochemical descriptors.

    PubMed

    Mager, P P; Rothe, H

    1990-10-01

    Multicollinearity of physicochemical descriptors leads to serious consequences in quantitative structure-activity relationship (QSAR) analysis, such as incorrect estimators and test statistics of regression coefficients of the ordinary least-squares (OLS) model usually applied to QSARs. Besides the diagnosis of the known simple collinearity, principal component regression analysis (PCRA) also allows the diagnosis of various types of multicollinearity. The effects of multicollinearity can be circumvented only if the absolute values of PCRA estimators are order statistics that decrease monotonically. Otherwise, obscure phenomena may be observed, such as good data recognition but low predictive power of a QSAR model.
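
    The "simple collinearity" case mentioned above, two descriptors that are strongly correlated with each other, can be diagnosed with a variance inflation factor. The sketch below covers only that two-descriptor case, where VIF = 1 / (1 - r^2); it is a generic illustration with invented descriptor values and does not reproduce the PCRA diagnostics of the paper.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def vif_two_descriptors(x1, x2):
    """Variance inflation factor when a model has exactly two descriptors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Invented descriptor values: nearly collinear versus uncorrelated.
collinear_vif = vif_two_descriptors([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0, 6.0])
orthogonal_vif = vif_two_descriptors([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0])
```

    A factor near one means the OLS coefficient variances are not inflated; the nearly collinear pair here gives a factor of about 37, so the corresponding coefficient estimates and their test statistics would be unreliable.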

  7. Calculation of precise firing statistics in a neural network model

    NASA Astrophysics Data System (ADS)

    Cho, Myoung Won

    2017-08-01

    A precise prediction of neural firing dynamics is required to understand the function of, and the learning process in, a biological neural network that depends on exact spike timings. The prediction of firing statistics is a delicate many-body problem because the firing probability of a neuron at a given time is determined by the summation over all effects from past firing states. A neural network model with the Feynman path integral formulation was recently introduced. In this paper, we present several methods to calculate firing statistics in the model. We apply the methods to some cases and compare the theoretical predictions with simulation results.

  8. Estimating procedure times for surgeries by determining location parameters for the lognormal model.

    PubMed

    Spangler, William E; Strum, David P; Vargas, Luis G; May, Jerrold H

    2004-05-01

    We present an empirical study of methods for estimating the location parameter of the lognormal distribution. Our results identify the best order statistic to use, and indicate that using the best order statistic instead of the median may lead to less frequent incorrect rejection of the lognormal model, more accurate critical value estimates, and higher goodness-of-fit. Using simulation data, we constructed and compared two models for identifying the best order statistic, one based on conventional nonlinear regression and the other using a data mining/machine learning technique. Better surgical procedure time estimates may lead to improved surgical operations.

  9. The beta distribution: A statistical model for world cloud cover

    NASA Technical Reports Server (NTRS)

    Falls, L. W.

    1973-01-01

    Much work has been performed in developing empirical global cloud cover models. This investigation was made to determine an underlying theoretical statistical distribution to represent worldwide cloud cover. The beta distribution with probability density function is given to represent the variability of this random variable. It is shown that the beta distribution possesses the versatile statistical characteristics necessary to assume the wide variety of shapes exhibited by cloud cover. A total of 160 representative empirical cloud cover distributions were investigated and the conclusion was reached that this study provides sufficient statistical evidence to accept the beta probability distribution as the underlying model for world cloud cover.
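
    One common way to fit a beta distribution to fractional data such as cloud cover is the method of moments, sketched below. The report's own fitting procedure is not described in this abstract, so the estimator, the function name, and the synthetic sample with shape parameters 2 and 5 are assumptions made for illustration.

```python
import random
import statistics

def beta_method_of_moments(mean, variance):
    """Estimate beta shape parameters (alpha, beta) from a mean and variance.

    Uses mean = a / (a + b) and variance = a * b / ((a + b)**2 * (a + b + 1)).
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    limit = mean * (1.0 - mean)
    if not 0.0 < variance < limit:
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    common = limit / variance - 1.0  # equals alpha + beta
    return mean * common, (1.0 - mean) * common

# Synthetic "cloud cover fractions" drawn from an assumed beta(2, 5).
rng = random.Random(1)
sample = [rng.betavariate(2.0, 5.0) for _ in range(20000)]
alpha_hat, beta_hat = beta_method_of_moments(
    statistics.mean(sample), statistics.pvariance(sample)
)
```

    Depending on the two shape parameters the fitted density can be U-shaped, J-shaped, unimodal, or flat, which is the versatility the abstract refers to.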

  10. ``Models'' CAVEAT EMPTOR!!!: ``Toy Models Too-Often Yield Toy-Results''!!!: Statistics, Polls, Politics, Economics, Elections!!!: GRAPH/Network-Physics: ``Equal-Distribution for All'' TRUMP-ED BEC ``Winner-Take-All'' ``Doctor Livingston I Presume?''

    NASA Astrophysics Data System (ADS)

    Preibus-Norquist, R. N. C.-Grover; Bush-Romney, G. W.-Willard-Mitt; Dimon, J. P.; Adelson-Koch, Sheldon-Charles-David-Sheldon; Krugman-Axelrod, Paul-David; Siegel, Edward Carl-Ludwig; D. N. C./O. F. P./''47''%/50% Collaboration; R. N. C./G. O. P./''53''%/49% Collaboration; Nyt/Wp/Cnn/Msnbc/Pbs/Npr/Ft Collaboration; Ftn/Fnc/Fox/Wsj/Fbn Collaboration; Lb/Jpmc/Bs/Boa/Ml/Wamu/S&P/Fitch/Moodys/Nmis Collaboration

    2013-03-01

    ``Models''? CAVEAT EMPTOR!!!: ``Toy Models Too-Often Yield Toy-Results''!!!: Goldenfeld[``The Role of Models in Physics'', in Lects.on Phase-Transitions & R.-G.(92)-p.32-33!!!]: statistics(Silver{[NYTimes; Bensinger, ``Math-Geerks Clearly-Defeated Pundits'', LATimes, (11/9/12)])}, polls, politics, economics, elections!!!: GRAPH/network/net/...-PHYSICS Barabasi-Albert[RMP (02)] (r,t)-space VERSUS(???) [Where's the Inverse/ Dual/Integral-Transform???] (Benjamin)Franklin(1795)-Fourier(1795; 1897;1822)-Laplace(1850)-Mellin (1902) Brillouin(1922)-...(k,)-space, {Hubbard [The World According to Wavelets,Peters (96)-p.14!!!/p.246: refs.-F2!!!]},and then (2) Albert-Barabasi[]Bose-Einstein quantum-statistics(BEQS) Bose-Einstein CONDENSATION (BEC) versus Bianconi[pvt.-comm.; arXiv:cond-mat/0204506; ...] -Barabasi [???] Fermi-Dirac

  11. Statistical methodology for the analysis of dye-switch microarray experiments

    PubMed Central

    Mary-Huard, Tristan; Aubert, Julie; Mansouri-Attia, Nadera; Sandra, Olivier; Daudin, Jean-Jacques

    2008-01-01

    Background: In individually dye-balanced microarray designs, each biological sample is hybridized on two different slides, once with Cy3 and once with Cy5. While this strategy ensures an automatic correction of the gene-specific labelling bias, it also induces dependencies between log-ratio measurements that must be taken into account in the statistical analysis. Results: We present two original statistical procedures for the statistical analysis of individually balanced designs. These procedures are compared with the usual ML and REML mixed model procedures proposed in most statistical toolboxes, on both simulated and real data. Conclusion: The UP procedure we propose as an alternative to usual mixed model procedures is more efficient and significantly faster to compute. This result provides some useful guidelines for the analysis of complex designs. PMID:18271965

  12. Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.

    PubMed

    Mørk, Søren; Holmes, Ian

    2012-03-01

    Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures is the best performing in terms of statistical information criteria or prediction performance, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.

  13. Risk prediction model: Statistical and artificial neural network approach

    NASA Astrophysics Data System (ADS)

    Paiman, Nuur Azreen; Hariri, Azian; Masood, Ibrahim

    2017-04-01

    Prediction models are increasingly gaining popularity and have been used in numerous areas of study to complement and fulfill clinical reasoning and decision making. The adoption of such models assists physicians' decision making and individuals' behavior, and consequently improves individual outcomes and the cost-effectiveness of care. The objective of this paper is to review articles related to risk prediction models in order to understand the suitable approach, development and validation process of a risk prediction model. A qualitative review was done of the aims, methods and significant main outcomes of nineteen published articles that developed risk prediction models in numerous fields. This paper also reviews how researchers develop and validate risk prediction models based on statistical and artificial neural network approaches. From the review, some methodological recommendations for developing and validating prediction models are highlighted. According to the studies reviewed, the artificial neural network approach to developing prediction models was more accurate than the statistical approach. However, only limited published literature currently discusses which approach is more accurate for risk prediction model development.

  14. Mathematical and Statistical Techniques for Systems Medicine: The Wnt Signaling Pathway as a Case Study.

    PubMed

    MacLean, Adam L; Harrington, Heather A; Stumpf, Michael P H; Byrne, Helen M

    2016-01-01

    The last decade has seen an explosion in models that describe phenomena in systems medicine. Such models are especially useful for studying signaling pathways, such as the Wnt pathway. In this chapter we use the Wnt pathway to showcase current mathematical and statistical techniques that enable modelers to gain insight into (models of) gene regulation and generate testable predictions. We introduce a range of modeling frameworks, but focus on ordinary differential equation (ODE) models since they remain the most widely used approach in systems biology and medicine and continue to offer great potential. We present methods for the analysis of a single model, comprising applications of standard dynamical systems approaches such as nondimensionalization, steady state, asymptotic and sensitivity analysis, and more recent statistical and algebraic approaches to compare models with data. We present parameter estimation and model comparison techniques, focusing on Bayesian analysis and coplanarity via algebraic geometry. Our intention is that this (non-exhaustive) review may serve as a useful starting point for the analysis of models in systems medicine.

  15. Cardiac arrest risk standardization using administrative data compared to registry data.

    PubMed

    Grossestreuer, Anne V; Gaieski, David F; Donnino, Michael W; Nelson, Joshua I M; Mutter, Eric L; Carr, Brendan G; Abella, Benjamin S; Wiebe, Douglas J

    2017-01-01

    Methods for comparing hospitals regarding cardiac arrest (CA) outcomes, vital for improving resuscitation performance, rely on data collected by cardiac arrest registries. However, most CA patients are treated at hospitals that do not participate in such registries. This study aimed to determine whether CA risk standardization modeling based on administrative data could perform as well as that based on registry data. Two risk standardization logistic regression models were developed using 2453 patients treated from 2000-2015 at three hospitals in an academic health system. Registry and administrative data were accessed for all patients. The outcome was death at hospital discharge. The registry model was considered the "gold standard" with which to compare the administrative model, using metrics including comparing areas under the curve, calibration curves, and Bland-Altman plots. The administrative risk standardization model had a c-statistic of 0.891 (95% CI: 0.876-0.905) compared to a registry c-statistic of 0.907 (95% CI: 0.895-0.919). When limited to only non-modifiable factors, the administrative model had a c-statistic of 0.818 (95% CI: 0.799-0.838) compared to a registry c-statistic of 0.810 (95% CI: 0.788-0.831). All models were well-calibrated. There was no significant difference between c-statistics of the models, providing evidence that valid risk standardization can be performed using administrative data. Risk standardization using administrative data performs comparably to standardization using registry data. This methodology represents a new tool that can enable opportunities to compare hospital performance in specific hospital systems or across the entire US in terms of survival after CA.
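
    The c-statistic used above to compare the two models is the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor. A minimal sketch of that definition follows; the function name and the toy outcomes and risks are invented for illustration and have nothing to do with the study's data.

```python
def c_statistic(outcomes, risks):
    """Concordance (c) statistic for binary outcomes (1 = event, 0 = no event).

    Counts, over all (event, non-event) pairs, how often the event case has
    the higher predicted risk; tied risks count as one half.
    """
    events = [r for y, r in zip(outcomes, risks) if y == 1]
    non_events = [r for y, r in zip(outcomes, risks) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))

# Toy example: 3 deaths and 4 survivors with made-up predicted risks.
died = [1, 1, 1, 0, 0, 0, 0]
risk = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.4]
print(c_statistic(died, risk))  # 10.5 concordant pairs out of 12 -> 0.875
```

    The pairwise loop is quadratic in sample size, which is fine for a sketch. A rank-based formula gives the same number faster on large cohorts.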

  16. Cardiac arrest risk standardization using administrative data compared to registry data

    PubMed Central

    Gaieski, David F.; Donnino, Michael W.; Nelson, Joshua I. M.; Mutter, Eric L.; Carr, Brendan G.; Abella, Benjamin S.; Wiebe, Douglas J.

    2017-01-01

    Background: Methods for comparing hospitals regarding cardiac arrest (CA) outcomes, vital for improving resuscitation performance, rely on data collected by cardiac arrest registries. However, most CA patients are treated at hospitals that do not participate in such registries. This study aimed to determine whether CA risk standardization modeling based on administrative data could perform as well as that based on registry data. Methods and results: Two risk standardization logistic regression models were developed using 2453 patients treated from 2000–2015 at three hospitals in an academic health system. Registry and administrative data were accessed for all patients. The outcome was death at hospital discharge. The registry model was considered the “gold standard” with which to compare the administrative model, using metrics including comparing areas under the curve, calibration curves, and Bland-Altman plots. The administrative risk standardization model had a c-statistic of 0.891 (95% CI: 0.876–0.905) compared to a registry c-statistic of 0.907 (95% CI: 0.895–0.919). When limited to only non-modifiable factors, the administrative model had a c-statistic of 0.818 (95% CI: 0.799–0.838) compared to a registry c-statistic of 0.810 (95% CI: 0.788–0.831). All models were well-calibrated. There was no significant difference between c-statistics of the models, providing evidence that valid risk standardization can be performed using administrative data. Conclusions: Risk standardization using administrative data performs comparably to standardization using registry data. This methodology represents a new tool that can enable opportunities to compare hospital performance in specific hospital systems or across the entire US in terms of survival after CA. PMID:28783754

  17. Stochastic modeling of sunshine number data

    NASA Astrophysics Data System (ADS)

    Brabec, Marek; Paulescu, Marius; Badescu, Viorel

    2013-11-01

    In this paper, we will present a unified statistical modeling framework for estimating and forecasting sunshine number (SSN) data. Sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it was shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been a challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We will show how its transition probabilities can be efficiently estimated within a logistic regression framework. In fact, our logistic Markovian model can be relatively easily fitted via a maximum likelihood approach. This is optimal in many respects and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test various hypotheses of practical interest, etc. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, the logistic Markov model class allows us to test hypotheses about how SSN depends on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using a generalized additive model (GAM) approach, we can fit and compare models of various complexity while insisting on keeping the physical interpretation of the statistical model and its parts. After introducing the Markovian model and general approach for identification of its parameters, we will illustrate its use and performance on high resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.
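
    To make the link between a Markov chain and logistic regression concrete, the sketch below treats the simplest case only: a homogeneous first-order chain with no covariates, written as P(X_t = 1 | X_{t-1}) = logistic(b0 + b1 * X_{t-1}). In that saturated case the maximum likelihood estimates are just the empirical transition frequencies mapped through the logit. This is an assumption-laden toy, not the authors' model, which adds covariates and GAM terms; the function names and the binary series are invented.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

def fit_first_order_chain(series):
    """ML fit of P(X_t = 1 | X_{t-1}) = logistic(b0 + b1 * X_{t-1}).

    Returns (p01, p11, b0, b1), where p01 = P(1 | previous 0) and
    p11 = P(1 | previous 1). Every transition type must occur at least
    once, otherwise a logit is infinite and the estimate does not exist.
    """
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, cur in zip(series, series[1:]):
        counts[(prev, cur)] += 1
    p01 = counts[(0, 1)] / (counts[(0, 0)] + counts[(0, 1)])
    p11 = counts[(1, 1)] / (counts[(1, 0)] + counts[(1, 1)])
    b0 = logit(p01)
    b1 = logit(p11) - b0
    return p01, p11, b0, b1

# Invented binary record: 1 = sun not covered by cloud, 0 = covered.
ssn = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
p01, p11, b0, b1 = fit_first_order_chain(ssn)
print(p01, p11, b0, b1)
```

    A positive b1 means that a sunny instant raises the odds of the next instant being sunny, which is the persistence a Markov model is meant to capture. Adding covariates such as elevation angle would require an iterative logistic regression fit rather than this closed form.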

  18. Stochastic modeling of sunshine number data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brabec, Marek, E-mail: mbrabec@cs.cas.cz; Paulescu, Marius; Badescu, Viorel

    2013-11-13

    In this paper, we will present a unified statistical modeling framework for estimating and forecasting sunshine number (SSN) data. Sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it was shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been a challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We will show how its transition probabilities can be efficiently estimated within a logistic regression framework. In fact, our logistic Markovian model can be relatively easily fitted via a maximum likelihood approach. This is optimal in many respects and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test various hypotheses of practical interest, etc. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, the logistic Markov model class allows us to test hypotheses about how SSN depends on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using a generalized additive model (GAM) approach, we can fit and compare models of various complexity while insisting on keeping the physical interpretation of the statistical model and its parts. After introducing the Markovian model and general approach for identification of its parameters, we will illustrate its use and performance on high resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.

  19. Managing Clustered Data Using Hierarchical Linear Modeling

    ERIC Educational Resources Information Center

    Warne, Russell T.; Li, Yan; McKyer, E. Lisako J.; Condie, Rachel; Diep, Cassandra S.; Murano, Peter S.

    2012-01-01

    Researchers in nutrition research often use cluster or multistage sampling to gather participants for their studies. These sampling methods often produce violations of the assumption of data independence that most traditional statistics share. Hierarchical linear modeling is a statistical method that can overcome violations of the independence…

  20. Functional status predicts acute care readmission in the traumatic spinal cord injury population.

    PubMed

    Huang, Donna; Slocum, Chloe; Silver, Julie K; Morgan, James W; Goldstein, Richard; Zafonte, Ross; Schneider, Jeffrey C

    2018-03-29

    Context/objective: Acute care readmission has been identified as an important marker of healthcare quality. Most previous models assessing risk prediction of readmission incorporate variables for medical comorbidity. We hypothesized that functional status is a more robust predictor of readmission in the spinal cord injury population than medical comorbidities. Design: Retrospective cross-sectional analysis. Setting: Inpatient rehabilitation facilities, Uniform Data System for Medical Rehabilitation data from 2002 to 2012. Participants: Traumatic spinal cord injury patients. Outcome measures: A logistic regression model for predicting acute care readmission based on demographic variables and functional status (Functional Model) was compared with models incorporating demographics, functional status, and medical comorbidities (Functional-Plus) or models including demographics and medical comorbidities (Demographic-Comorbidity). The primary outcomes were 3- and 30-day readmission, and the primary measure of model performance was the c-statistic. Results: There were a total of 68,395 patients with 1,469 (2.15%) readmitted at 3 days and 7,081 (10.35%) readmitted at 30 days. The c-statistics for the Functional Model were 0.703 and 0.654 for 3 and 30 days. The Functional Model outperformed Demographic-Comorbidity models at 3 days (c-statistic difference: 0.066-0.096) and outperformed two of the three Demographic-Comorbidity models at 30 days (c-statistic difference: 0.029-0.056). The Functional-Plus models exhibited negligible improvements (0.002-0.010) in model performance compared to the Functional models. Conclusion: Readmissions are used as a marker of hospital performance. Function-based readmission models in the spinal cord injury population outperform models incorporating medical comorbidities. Readmission risk models for this population would benefit from the inclusion of functional status.

  1. A systematic review of Bayesian articles in psychology: The last 25 years.

    PubMed

    van de Schoot, Rens; Winter, Sonja D; Ryan, Oisín; Zondervan-Zwijnenburg, Mariëlle; Depaoli, Sarah

    2017-06-01

    Although the statistical tools most often used by researchers in the field of psychology over the last 25 years are based on frequentist statistics, it is often claimed that the alternative Bayesian approach to statistics is gaining in popularity. In the current article, we investigated this claim by performing the very first systematic review of Bayesian psychological articles published between 1990 and 2015 (n = 1,579). We aim to provide a thorough presentation of the role Bayesian statistics plays in psychology. This historical assessment allows us to identify trends and see how Bayesian methods have been integrated into psychological research in the context of different statistical frameworks (e.g., hypothesis testing, cognitive models, IRT, SEM, etc.). We also describe take-home messages and provide "big-picture" recommendations to the field as Bayesian statistics becomes more popular. Our review indicated that Bayesian statistics is used in a variety of contexts across subfields of psychology and related disciplines. There are many different reasons why one might choose to use Bayes (e.g., the use of priors, estimating otherwise intractable models, modeling uncertainty, etc.). We found in this review that the use of Bayes has increased and broadened in the sense that this methodology can be used in a flexible manner to tackle many different forms of questions. We hope this presentation opens the door for a larger discussion regarding the current state of Bayesian statistics, as well as future trends. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  2. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

    PubMed

    Rivas, Elena; Lang, Raymond; Eddy, Sean R

    2012-02-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.

  3. A statistical shape model of the human second cervical vertebra.

    PubMed

    Clogenson, Marine; Duff, John M; Luethi, Marcel; Levivier, Marc; Meuli, Reto; Baur, Charles; Henein, Simon

    2015-07-01

    Statistical shape and appearance models play an important role in reducing the segmentation processing time of a vertebra and in improving results for 3D model development. Here, we describe the different steps in generating a statistical shape model (SSM) of the second cervical vertebra (C2) and provide the shape model for general use by the scientific community. The main difficulties in its construction are the morphological complexity of the C2 and its variability in the population. The input dataset is composed of manually segmented anonymized patient computerized tomography (CT) scans. The different datasets are aligned with Procrustes alignment on surface models, and the registration is then cast as a model-fitting problem using a Gaussian process. A principal component analysis (PCA)-based model is generated which includes the variability of the C2. The SSM was generated using 92 CT scans. The resulting SSM was evaluated for specificity, compactness and generalization ability. The SSM of the C2 is freely available to the scientific community in Slicer (open source software for image analysis and scientific visualization) with a module created to visualize the SSM using Statismo, a framework for statistical shape modeling. The SSM of the vertebra allows the shape variability of the C2 to be represented. Moreover, the SSM will enable semi-automatic segmentation and 3D model generation of the vertebra, which would greatly benefit surgery planning.

  4. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more

    PubMed Central

    Rivas, Elena; Lang, Raymond; Eddy, Sean R.

    2012-01-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308

  5. Statistical Evaluation of CRM-Simulated Cloud and Precipitation Structures Using Multi-sensor TRMM Measurements and Retrievals

    NASA Astrophysics Data System (ADS)

    Posselt, D.; L'Ecuyer, T.; Matsui, T.

    2009-05-01

    Cloud resolving models are typically used to examine the characteristics of clouds and precipitation and their relationship to radiation and the large-scale circulation. As such, they are not required to reproduce the exact location of each observed convective system, much less each individual cloud. Some of the most relevant information about clouds and precipitation is provided by instruments located on polar-orbiting satellite platforms, but these observations are intermittent "snapshots" in time, making assessment of model performance challenging. In contrast to direct comparison, model results can be evaluated statistically. This avoids the requirement for the model to reproduce the observed systems, while returning valuable information on the performance of the model in a climate-relevant sense. The focus of this talk is a model evaluation study, in which updates to the microphysics scheme used in a three-dimensional version of the Goddard Cumulus Ensemble (GCE) model are evaluated using statistics of observed clouds, precipitation, and radiation. We present the results of multiday (non-equilibrium) simulations of organized deep convection using single- and double-moment versions of the model's cloud microphysical scheme. Statistics of TRMM multi-sensor derived clouds, precipitation, and radiative fluxes are used to evaluate the GCE results, as are simulated TRMM measurements obtained using a sophisticated instrument simulator suite. We present advantages and disadvantages of performing model comparisons in retrieval and measurement space and conclude by motivating the use of data assimilation techniques for analyzing and improving model parameterizations.

  6. Statistical properties of superimposed stationary spike trains.

    PubMed

    Deger, Moritz; Helias, Moritz; Boucsein, Clemens; Rotter, Stefan

    2012-06-01

    The Poisson process is an often employed model for the activity of neuronal populations. It is known, though, that superpositions of realistic, non-Poisson spike trains are not in general Poisson processes, not even for large numbers of superimposed processes. Here we construct superimposed spike trains from intracellular in vivo recordings from rat neocortex neurons and compare their statistics to specific point process models. The constructed superimposed spike trains reveal strong deviations from the Poisson model. We find that superpositions of model spike trains that take the effective refractoriness of the neurons into account yield a much better description. A minimal model of this kind is the Poisson process with dead-time (PPD). For this process, and for superpositions thereof, we obtain analytical expressions for some second-order statistical quantities, such as the count variability, inter-spike interval (ISI) variability and ISI correlations, and demonstrate the match with the in vivo data. We conclude that effective refractoriness is the key property that shapes the statistical properties of the superposition spike trains. We present new, efficient algorithms to generate superpositions of PPDs and of gamma processes that can be used to provide more realistic background input in simulations of networks of spiking neurons. Using these generators, we show in simulations that neurons which receive superimposed spike trains as input are highly sensitive to the statistical effects induced by neuronal refractoriness.
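
    A Poisson process with dead-time can be simulated by drawing each inter-spike interval as a fixed dead time plus an exponential waiting time. The sketch below does this naively and merges several trains; it is not the efficient generator the abstract announces. The function names, the parameter values and the choice to start each train as if a spike had occurred at time zero are illustrative assumptions.

```python
import heapq
import math
import random

def ppd_spike_train(hazard, dead_time, duration, rng):
    """Spike times on [0, duration) with ISI = dead_time + Exp(hazard)."""
    times = []
    t = 0.0  # simplification: behave as if a spike occurred at time 0
    while True:
        t += dead_time + rng.expovariate(hazard)
        if t >= duration:
            return times
        times.append(t)

def superpose(trains):
    """Merge already sorted spike trains into one sorted train."""
    return list(heapq.merge(*trains))

def isi_cv(times):
    """Coefficient of variation of the inter-spike intervals."""
    isis = [b - a for a, b in zip(times, times[1:])]
    mean = sum(isis) / len(isis)
    var = sum((x - mean) ** 2 for x in isis) / (len(isis) - 1)
    return math.sqrt(var) / mean

rng = random.Random(1)
hazard, dead_time = 50.0, 0.005  # hypothetical values: 1/s and s
trains = [ppd_spike_train(hazard, dead_time, 100.0, rng) for _ in range(3)]
print("single-train CV:", isi_cv(trains[0]))
print("theoretical CV :", 1.0 / (1.0 + hazard * dead_time))
print("pooled-train CV:", isi_cv(superpose(trains)))
```

    For a single PPD the interval has mean dead_time + 1/hazard and standard deviation 1/hazard, so its coefficient of variation is 1 / (1 + hazard * dead_time), which is below the value of 1 that a Poisson process would give.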

  7. Towards a General Turbulence Model for Planetary Boundary Layers Based on Direct Statistical Simulation

    NASA Astrophysics Data System (ADS)

    Skitka, J.; Marston, B.; Fox-Kemper, B.

    2016-02-01

    Sub-grid turbulence models for planetary boundary layers are typically constructed additively, starting with local flow properties and including non-local (KPP) or higher order (Mellor-Yamada) parameters until a desired level of predictive capacity is achieved or a manageable threshold of complexity is surpassed. Such approaches are necessarily limited in general circumstances, like global circulation models, by their being optimized for particular flow phenomena. By building a model reductively, starting with the infinite hierarchy of turbulence statistics, truncating at a given order, and stripping degrees of freedom from the flow, we offer the prospect of a turbulence model and investigative tool that is equally applicable to all flow types and able to take full advantage of the wealth of nonlocal information in any flow. Direct statistical simulation (DSS) that is based upon expansion in equal-time cumulants can be used to compute flow statistics of arbitrary order. We investigate the feasibility of a second-order closure (CE2) by performing simulations of the ocean boundary layer in a quasi-linear approximation for which CE2 is exact. As oceanographic examples, wind-driven Langmuir turbulence and thermal convection are studied by comparison of the quasi-linear and fully nonlinear statistics. We also characterize the computational advantages and physical uncertainties of CE2 defined on a reduced basis determined via proper orthogonal decomposition (POD) of the flow fields.

  8. Comments on statistical issues in numerical modeling for underground nuclear test monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nicholson, W.L.; Anderson, K.K.

    1993-03-01

    The Symposium concluded with prepared summaries by four experts in the involved disciplines. These experts made no mention of statistics and/or the statistical content of issues. The first author contributed an extemporaneous statement at the Symposium because there are important issues associated with conducting and evaluating numerical modeling that are familiar to statisticians and often treated successfully by them. This note expands upon these extemporaneous remarks. Statistical ideas may be helpful in resolving some numerical modeling issues. Specifically, we comment first on the role of statistical design/analysis in the quantification process to answer the question "what do we know about the numerical modeling of underground nuclear tests?" and second on the peculiar nature of uncertainty analysis for situations involving numerical modeling. The simulations described in the workshop, though associated with topic areas, were basically sets of examples. Each simulation was tuned towards agreeing with either empirical evidence or an expert's opinion of what empirical evidence would be. While the discussions were reasonable, whether the embellishments were correct or a forced fitting of reality is unclear and illustrates that "simulation is easy." We also suggest that these examples of simulation are typical and the questions concerning the legitimacy and the role of knowing the reality are fair, in general, with respect to simulation. The answers will help us understand why "prediction is difficult."

  9. Mapping irrigated lands at 250-m scale by merging MODIS data and National Agricultural Statistics

    USGS Publications Warehouse

    Pervez, Md Shahriar; Brown, Jesslyn F.

    2010-01-01

    Accurate geospatial information on the extent of irrigated land improves our understanding of agricultural water use, local land surface processes, conservation or depletion of water resources, and components of the hydrologic budget. We have developed a method in a geospatial modeling framework that assimilates irrigation statistics with remotely sensed parameters describing vegetation growth conditions in areas with agricultural land cover to spatially identify irrigated lands at 250-m cell size across the conterminous United States for 2002. The geospatial model result, known as the Moderate Resolution Imaging Spectroradiometer (MODIS) Irrigated Agriculture Dataset (MIrAD-US), identified irrigated lands with reasonable accuracy in California and semiarid Great Plains states with overall accuracies of 92% and 75% and kappa statistics of 0.75 and 0.51, respectively. A quantitative accuracy assessment of MIrAD-US for the eastern region has not yet been conducted, and qualitative assessment shows that model improvements are needed for the humid eastern regions where the distinction in annual peak NDVI between irrigated and non-irrigated crops is minimal and county sizes are relatively small. This modeling approach enables consistent mapping of irrigated lands based upon USDA irrigation statistics and should lead to better understanding of spatial trends in irrigated lands across the conterminous United States. An improved version of the model with revised datasets is planned and will employ 2007 USDA irrigation statistics.
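
    The kappa statistic quoted alongside overall accuracy corrects the observed agreement for the agreement expected by chance. The sketch below computes both from a confusion matrix using Cohen's kappa; the function name and the matrix are invented for illustration and are not the study's validation data.

```python
def cohen_kappa(matrix):
    """Overall accuracy and Cohen's kappa for a square confusion matrix.

    matrix[i][j] counts reference class i mapped as class j.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return observed, (observed - expected) / (1 - expected)

# Hypothetical 2x2 matrix: rows = reference (irrigated, non-irrigated),
# columns = mapped (irrigated, non-irrigated).
confusion = [[45, 5],
             [10, 40]]
accuracy, kappa = cohen_kappa(confusion)
print(accuracy, kappa)  # accuracy 0.85; kappa is about 0.70
```

    Because chance agreement is subtracted, kappa is normally lower than overall accuracy, which is consistent with the pattern of the figures reported above (92% with 0.75, and 75% with 0.51).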

  10. Statistical Mechanics of Prion Diseases

    NASA Astrophysics Data System (ADS)

    Slepoy, A.; Singh, R. R.; Pázmándi, F.; Kulkarni, R. V.; Cox, D. L.

    2001-07-01

    We present a two-dimensional, lattice-based, protein-level statistical mechanical model for prion diseases (e.g., mad cow disease) with concomitant prion protein misfolding and aggregation. Our studies lead us to the hypothesis that the observed broad incubation time distribution in epidemiological data reflects fluctuation-dominated growth seeded by a few nanometer-scale aggregates, while much narrower incubation time distributions for inoculated lab animals arise from statistical self-averaging. We model "species barriers" to prion infection and assess a related treatment protocol.

  11. Binomial test statistics using Psi functions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bowman, Kimiko O.

    2007-01-01

    For the negative binomial model (probability generating function (p + 1 - pt)^{-k}), a logarithmic derivative is the Psi function difference ψ(k + x) - ψ(k); this and its derivatives lead to a test statistic to decide on the validity of a specified model. The test statistic uses a data base so there exists a comparison available between theory and application. Note that the test function is not dominated by outliers. Applications to (i) Fisher's tick data, (ii) accidents data, (iii) Weldon's dice data are included.
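
    The Psi function difference arises because the negative binomial probabilities contain the factor Γ(k + x)/Γ(k), whose logarithmic derivative with respect to k is ψ(k + x) - ψ(k). For integer x this equals the finite sum 1/k + 1/(k + 1) + ... + 1/(k + x - 1). The sketch below checks that identity numerically; the function names are illustrative, and it does not reproduce the author's test statistic.

```python
import math


def psi_difference(k, x):
    """psi(k + x) - psi(k) for real k > 0 and integer x >= 0 (digamma recurrence)."""
    return sum(1.0 / (k + j) for j in range(x))


def log_gamma_ratio(k, x):
    """log of Gamma(k + x) / Gamma(k)."""
    return math.lgamma(k + x) - math.lgamma(k)


def numeric_derivative(k, x, h=1e-6):
    """Central-difference derivative of log_gamma_ratio with respect to k."""
    return (log_gamma_ratio(k + h, x) - log_gamma_ratio(k - h, x)) / (2 * h)


exact = psi_difference(2.0, 3)        # 1/2 + 1/3 + 1/4
approx = numeric_derivative(2.0, 3)   # should agree to several decimals
print(exact, approx)
```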

  12. Group Influences on Young Adult Warfighters’ Risk Taking

    DTIC Science & Technology

    2016-12-01

    Statistical Analysis Latent linear growth models were fitted using the maximum likelihood estimation method in Mplus (version 7.0; Muthen & Muthen...condition had a higher net score than those in the alone condition (b = 20.53, SE = 6.29, p < .001). Results of the relevant statistical analyses are...8.56 110.86*** 22.01 158.25*** 29.91 Model fit statistics BIC 4004.50 5302.539 5540.58 Chi-square (df) 41.51*** (16) 38.10** (20) 42.19** (20

  13. Time Series Model Identification by Estimating Information.

    DTIC Science & Technology

    1982-11-01

    principle, Applications of Statistics, P. R. Krishnaiah, ed., North-Holland: Amsterdam, 27-41. Anderson, T. W. (1971). The Statistical Analysis of Time Series...E. (1969). Multiple Time Series Modeling, Multivariate Analysis II, edited by P. Krishnaiah, Academic Press: New York, 389-409. Parzen, E. (1981...Newton, H. J. (1980). Multiple Time Series Modeling, II Multivariate Analysis - V, edited by P. Krishnaiah, North Holland: Amsterdam, 181-197. Shibata, R

  14. Statistical modeling of space shuttle environmental data

    NASA Technical Reports Server (NTRS)

    Tubbs, J. D.; Brewer, D. W.

    1983-01-01

    Statistical models which use a class of bivariate gamma distributions are examined. Topics discussed include: (1) the ratio of positively correlated gamma variates; (2) a method to determine if unequal shape parameters are necessary in a bivariate gamma distribution; (3) differential equations for modal location of a family of bivariate gamma distributions; and (4) analysis of some wind gust data using the analytical results developed for modeling application.

  15. Statistical Signal Models and Algorithms for Image Analysis

    DTIC Science & Technology

    1984-10-25

    In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction

  16. The Effectiveness of CPS-ALM Model in Enhancing Statistical Literacy Ability and Self Concept of Elementary School Student Teacher

    ERIC Educational Resources Information Center

    Takaria, J.; Rumahlatu, D.

    2016-01-01

    The focus of this study is to examine comprehensively the enhancement of statistical literacy and self-concept among elementary school student teachers through the CPS-BML model, where this enhancement is measured through N-gain. The results of the study indicate that the use of the Collaborative Problem Solving Model assisted by literacy media (CPS-ALM) model…
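
    The abstract does not state how N-gain is computed. A minimal sketch, assuming the commonly used normalized-gain definition (gain achieved divided by the maximum gain possible), is given below; the function name, the score scale, and the example scores are hypothetical.

```python
def n_gain(pre, post, max_score=100.0):
    """Normalized gain: (post - pre) / (max_score - pre).

    Assumes pre < max_score; the 0-100 scale is an illustrative default.
    """
    if pre >= max_score:
        raise ValueError("pre-test score must be below the maximum score")
    return (post - pre) / (max_score - pre)


# Hypothetical pre-test and post-test scores, for illustration only.
gain = n_gain(pre=40.0, post=70.0)
print(gain)
```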

  17. Climate change and water resources in a tropical island system: propagation of uncertainty from statistically downscaled climate models to hydrologic models

    Treesearch

    Ashley E. Van Beusekom; William A. Gould; Adam J. Terando; Jaime A. Collazo

    2015-01-01

    Many tropical islands have limited water resources with historically increasing demand, all potentially affected by a changing climate. The effects of climate change on island hydrology are difficult to model due to steep local precipitation gradients and sparse data. This work uses 10 statistically downscaled general circulation models (GCMs) under two greenhouse gas...

  18. An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data.

    PubMed

    Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John

    2018-03-07

    DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis are employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied to single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrates a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
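
    The two information-theoretic quantities named in this abstract can be illustrated on small discrete distributions. The sketch below assumes base-2 logarithms and takes the Jensen-Shannon distance as the square root of the Jensen-Shannon divergence; the authors' exact normalization is not given here, and the toy distributions are hypothetical stand-ins for the Ising-model-derived methylation distributions.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of a discrete distribution p (zero terms skipped)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2), a value in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Hypothetical distributions over methylation levels, for illustration only.
reference = [0.7, 0.2, 0.1]
test = [0.1, 0.2, 0.7]
print(shannon_entropy(reference), jensen_shannon_distance(reference, test))
```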

  19. The use of imputed sibling genotypes in sibship-based association analysis: on modeling alternatives, power and model misspecification.

    PubMed

    Minică, Camelia C; Dolan, Conor V; Hottenga, Jouke-Jan; Willemsen, Gonneke; Vink, Jacqueline M; Boomsma, Dorret I

    2013-05-01

    When phenotypic but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of two statistical approaches suitable to model imputed genotype data: the mixture approach, which involves the full distribution of the imputed genotypes, and the dosage approach, where the mean of the conditional distribution features as the imputed genotype. Simulations were run by varying sibship size, size of the phenotypic correlations among siblings, imputation accuracy and minor allele frequency of the causal SNP. Furthermore, as imputing sibling data and extending the model to include sibships of size two or greater requires modeling the familial covariance matrix, we inquired whether model misspecification affects power. Finally, the results obtained via simulations were empirically verified in two datasets with continuous phenotype data (height) and with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and the dosage approach are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears to be robust to the considered misspecification in the background covariance structure, given low to moderate phenotypic correlations among siblings. Empirical results show that the inclusion in association analysis of imputed sibling genotypes does not always result in a larger test statistic. The actual test statistic may drop in value due to small effect sizes. That is, if the power benefit is small, so that the change in the distribution of the test statistic under the alternative is relatively small, the probability of obtaining a smaller test statistic is greater. As the genetic effects are typically hypothesized to be small, in practice, the decision on whether family-based imputation could be used as a means to increase power should be informed by prior power calculations and by the consideration of the background correlation.
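
    The dosage approach described above replaces the full conditional distribution of an imputed genotype with its mean. A minimal sketch follows, assuming genotypes are coded as 0, 1 or 2 copies of the minor allele; the coding, the function name, and the probabilities are illustrative assumptions rather than details from the study.

```python
def dosage(genotype_probs, tol=1e-9):
    """Expected allele count from conditional probabilities (P(0), P(1), P(2))."""
    p0, p1, p2 = genotype_probs
    if abs(p0 + p1 + p2 - 1.0) > tol:
        raise ValueError("genotype probabilities must sum to 1")
    # Mean of the conditional distribution: 0*p0 + 1*p1 + 2*p2.
    return p1 + 2.0 * p2


# Hypothetical conditional genotype distribution for an ungenotyped sibling.
imputed = (0.25, 0.50, 0.25)
print(dosage(imputed))
```

    The mixture approach would instead keep all three probabilities and weight the likelihood by them, rather than collapsing them to a single number.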

  20. A statistical approach to quasi-extinction forecasting.

    PubMed

    Holmes, Elizabeth Eli; Sabo, John L; Viscido, Steven Vincent; Fagan, William Fredric

    2007-12-01

    Forecasting population decline to a certain critical threshold (the quasi-extinction risk) is one of the central objectives of population viability analysis (PVA), and such predictions figure prominently in the decisions of major conservation organizations. In this paper, we argue that accurate forecasting of a population's quasi-extinction risk does not necessarily require knowledge of the underlying biological mechanisms. Because of the stochastic and multiplicative nature of population growth, the ensemble behaviour of population trajectories converges to common statistical forms across a wide variety of stochastic population processes. This paper provides a theoretical basis for this argument. We show that the quasi-extinction surfaces of a variety of complex stochastic population processes (including age-structured, density-dependent and spatially structured populations) can be modelled by a simple stochastic approximation: the stochastic exponential growth process overlaid with Gaussian errors. Using simulated and real data, we show that this model can be estimated with 20-30 years of data and can provide relatively unbiased quasi-extinction risk with confidence intervals considerably smaller than (0,1). This was found to be true even for simulated data derived from some of the noisiest population processes (density-dependent feedback, species interactions and strong age-structure cycling). A key advantage of statistical models is that their parameters and the uncertainty of those parameters can be estimated from time series data using standard statistical methods. In contrast, for most species of conservation concern, biologically realistic models must often be specified rather than estimated because of the limited data available for all the various parameters. Biologically realistic models will always have a prominent place in PVA for evaluating specific management options which affect a single segment of a population, a single demographic rate, or different geographic areas. However, for forecasting quasi-extinction risk, statistical models that are based on the convergent statistical properties of population processes offer many advantages over biologically realistic models.
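
    To make the idea of a quasi-extinction risk concrete, the sketch below simulates a stochastic exponential growth process on the log scale and estimates by Monte Carlo the probability of falling to a threshold within a time horizon. It assumes Gaussian process noise only and omits the estimation and observation-error steps of the paper; all names and parameter values are hypothetical.

```python
import math
import random


def quasi_extinction_risk(n0, threshold, mu, sigma, years, n_sims=2000, seed=0):
    """Monte Carlo probability that the population drops to `threshold` within `years`.

    Model (an assumption of this sketch): log N(t+1) = log N(t) + Normal(mu, sigma).
    """
    rng = random.Random(seed)
    log_threshold = math.log(threshold)
    hits = 0
    for _ in range(n_sims):
        log_n = math.log(n0)
        for _ in range(years):
            log_n += rng.gauss(mu, sigma)
            if log_n <= log_threshold:
                hits += 1
                break  # count each trajectory at most once
    return hits / n_sims


# Hypothetical parameters: slow mean decline with moderate year-to-year variability.
risk = quasi_extinction_risk(n0=100, threshold=10, mu=-0.05, sigma=0.2, years=50)
print(risk)
```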
