Sample records for statistical model fitting

  1. A global goodness-of-fit statistic for Cox regression models.

    PubMed

    Parzen, M; Lipsitz, S R

    1999-06-01

    In this paper, a global goodness-of-fit test statistic for a Cox regression model, which has an approximate chi-squared distribution when the model has been correctly specified, is proposed. Our goodness-of-fit statistic is global and has power to detect if interactions or higher order powers of covariates in the model are needed. The proposed statistic is similar to the Hosmer and Lemeshow (1980, Communications in Statistics A10, 1043-1069) goodness-of-fit statistic for binary data as well as Schoenfeld's (1980, Biometrika 67, 145-153) statistic for the Cox model. The methods are illustrated using data from a Mayo Clinic trial in primary billiary cirrhosis of the liver (Fleming and Harrington, 1991, Counting Processes and Survival Analysis), in which the outcome is the time until liver transplantation or death. The are 17 possible covariates. Two Cox proportional hazards models are fit to the data, and the proposed goodness-of-fit statistic is applied to the fitted models.

  2. A critique of Rasch residual fit statistics.

    PubMed

    Karabatsos, G

    2000-01-01

    In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained, of course, depends on the degree to which the data fit the Rasch model. It is therefore important to utilize fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that there is far more emphasis placed in the objective measurement of person and items than there is in the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics do not allow either a convenient or a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, the use of a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics that are based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.

  3. Sample Size and Statistical Conclusions from Tests of Fit to the Rasch Model According to the Rasch Unidimensional Measurement Model (Rumm) Program in Health Outcome Measurement.

    PubMed

    Hagell, Peter; Westergren, Albert

    Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Data with simulated Rasch model fitting 25-item dichotomous scales and sample sizes ranging from N = 50 to N = 2500 were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less then or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signal misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).

  4. The l z ( p ) * Person-Fit Statistic in an Unfolding Model Context.

    PubMed

    Tendeiro, Jorge N

    2017-01-01

    Although person-fit analysis has a long-standing tradition within item response theory, it has been applied in combination with dominance response models almost exclusively. In this article, a popular log likelihood-based parametric person-fit statistic under the framework of the generalized graded unfolding model is used. Results from a simulation study indicate that the person-fit statistic performed relatively well in detecting midpoint response style patterns and not so well in detecting extreme response style patterns.

  5. Evaluating Item Fit for Multidimensional Item Response Models

    ERIC Educational Resources Information Center

    Zhang, Bo; Stone, Clement A.

    2008-01-01

    This research examines the utility of the s-x[superscript 2] statistic proposed by Orlando and Thissen (2000) in evaluating item fit for multidimensional item response models. Monte Carlo simulation was conducted to investigate both the Type I error and statistical power of this fit statistic in analyzing two kinds of multidimensional test…

  6. Model Fit and Item Factor Analysis: Overfactoring, Underfactoring, and a Program to Guide Interpretation.

    PubMed

    Clark, D Angus; Bowles, Ryan P

    2018-04-23

    In exploratory item factor analysis (IFA), researchers may use model fit statistics and commonly invoked fit thresholds to help determine the dimensionality of an assessment. However, these indices and thresholds may mislead as they were developed in a confirmatory framework for models with continuous, not categorical, indicators. The present study used Monte Carlo simulation methods to investigate the ability of popular model fit statistics (chi-square, root mean square error of approximation, the comparative fit index, and the Tucker-Lewis index) and their standard cutoff values to detect the optimal number of latent dimensions underlying sets of dichotomous items. Models were fit to data generated from three-factor population structures that varied in factor loading magnitude, factor intercorrelation magnitude, number of indicators, and whether cross loadings or minor factors were included. The effectiveness of the thresholds varied across fit statistics, and was conditional on many features of the underlying model. Together, results suggest that conventional fit thresholds offer questionable utility in the context of IFA.

  7. Analyzing longitudinal data with the linear mixed models procedure in SPSS.

    PubMed

    West, Brady T

    2009-09-01

    Many applied researchers analyzing longitudinal data share a common misconception: that specialized statistical software is necessary to fit hierarchical linear models (also known as linear mixed models [LMMs], or multilevel models) to longitudinal data sets. Although several specialized statistical software programs of high quality are available that allow researchers to fit these models to longitudinal data sets (e.g., HLM), rapid advances in general purpose statistical software packages have recently enabled analysts to fit these same models when using preferred packages that also enable other more common analyses. One of these general purpose statistical packages is SPSS, which includes a very flexible and powerful procedure for fitting LMMs to longitudinal data sets with continuous outcomes. This article aims to present readers with a practical discussion of how to analyze longitudinal data using the LMMs procedure in the SPSS statistical software package.

  8. Measured, modeled, and causal conceptions of fitness

    PubMed Central

    Abrams, Marshall

    2012-01-01

    This paper proposes partial answers to the following questions: in what senses can fitness differences plausibly be considered causes of evolution?What relationships are there between fitness concepts used in empirical research, modeling, and abstract theoretical proposals? How does the relevance of different fitness concepts depend on research questions and methodological constraints? The paper develops a novel taxonomy of fitness concepts, beginning with type fitness (a property of a genotype or phenotype), token fitness (a property of a particular individual), and purely mathematical fitness. Type fitness includes statistical type fitness, which can be measured from population data, and parametric type fitness, which is an underlying property estimated by statistical type fitnesses. Token fitness includes measurable token fitness, which can be measured on an individual, and tendential token fitness, which is assumed to be an underlying property of the individual in its environmental circumstances. Some of the paper's conclusions can be outlined as follows: claims that fitness differences do not cause evolution are reasonable when fitness is treated as statistical type fitness, measurable token fitness, or purely mathematical fitness. Some of the ways in which statistical methods are used in population genetics suggest that what natural selection involves are differences in parametric type fitnesses. Further, it's reasonable to think that differences in parametric type fitness can cause evolution. Tendential token fitnesses, however, are not themselves sufficient for natural selection. Though parametric type fitnesses are typically not directly measurable, they can be modeled with purely mathematical fitnesses and estimated by statistical type fitnesses, which in turn are defined in terms of measurable token fitnesses. The paper clarifies the ways in which fitnesses depend on pragmatic choices made by researchers. PMID:23112804

  9. Model-Free CUSUM Methods for Person Fit

    ERIC Educational Resources Information Center

    Armstrong, Ronald D.; Shi, Min

    2009-01-01

    This article demonstrates the use of a new class of model-free cumulative sum (CUSUM) statistics to detect person fit given the responses to a linear test. The fundamental statistic being accumulated is the likelihood ratio of two probabilities. The detection performance of this CUSUM scheme is compared to other model-free person-fit statistics…

  10. Comparing the Fit of Item Response Theory and Factor Analysis Models

    ERIC Educational Resources Information Center

    Maydeu-Olivares, Alberto; Cai, Li; Hernandez, Adolfo

    2011-01-01

    Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be…

  11. Rasch fit statistics and sample size considerations for polytomous data.

    PubMed

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-05-29

    Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire - 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges.

  12. Rasch fit statistics and sample size considerations for polytomous data

    PubMed Central

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-01-01

    Background Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Methods Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire – 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. Results The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. Conclusion It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges. PMID:18510722

  13. Modified Likelihood-Based Item Fit Statistics for the Generalized Graded Unfolding Model

    ERIC Educational Resources Information Center

    Roberts, James S.

    2008-01-01

    Orlando and Thissen (2000) developed an item fit statistic for binary item response theory (IRT) models known as S-X[superscript 2]. This article generalizes their statistic to polytomous unfolding models. Four alternative formulations of S-X[superscript 2] are developed for the generalized graded unfolding model (GGUM). The GGUM is a…

  14. Person-Fit Statistics for Joint Models for Accuracy and Speed

    ERIC Educational Resources Information Center

    Fox, Jean-Paul; Marianti, Sukaesi

    2017-01-01

    Response accuracy and response time data can be analyzed with a joint model to measure ability and speed of working, while accounting for relationships between item and person characteristics. In this study, person-fit statistics are proposed for joint models to detect aberrant response accuracy and/or response time patterns. The person-fit tests…

  15. Residuals and the Residual-Based Statistic for Testing Goodness of Fit of Structural Equation Models

    ERIC Educational Resources Information Center

    Foldnes, Njal; Foss, Tron; Olsson, Ulf Henning

    2012-01-01

    The residuals obtained from fitting a structural equation model are crucial ingredients in obtaining chi-square goodness-of-fit statistics for the model. The authors present a didactic discussion of the residuals, obtaining a geometrical interpretation by recognizing the residuals as the result of oblique projections. This sheds light on the…

  16. Nonlinear Curve-Fitting Program

    NASA Technical Reports Server (NTRS)

    Everhart, Joel L.; Badavi, Forooz F.

    1989-01-01

    Nonlinear optimization algorithm helps in finding best-fit curve. Nonlinear Curve Fitting Program, NLINEAR, interactive curve-fitting routine based on description of quadratic expansion of X(sup 2) statistic. Utilizes nonlinear optimization algorithm calculating best statistically weighted values of parameters of fitting function and X(sup 2) minimized. Provides user with such statistical information as goodness of fit and estimated values of parameters producing highest degree of correlation between experimental data and mathematical model. Written in FORTRAN 77.

  17. An Application of M[subscript 2] Statistic to Evaluate the Fit of Cognitive Diagnostic Models

    ERIC Educational Resources Information Center

    Liu, Yanlou; Tian, Wei; Xin, Tao

    2016-01-01

    The fit of cognitive diagnostic models (CDMs) to response data needs to be evaluated, since CDMs might yield misleading results when they do not fit the data well. Limited-information statistic M[subscript 2] and the associated root mean square error of approximation (RMSEA[subscript 2]) in item factor analysis were extended to evaluate the fit of…

  18. Evaluation of Model Fit in Cognitive Diagnosis Models

    ERIC Educational Resources Information Center

    Hu, Jinxiang; Miller, M. David; Huggins-Manley, Anne Corinne; Chen, Yi-Hsin

    2016-01-01

    Cognitive diagnosis models (CDMs) estimate student ability profiles using latent attributes. Model fit to the data needs to be ascertained in order to determine whether inferences from CDMs are valid. This study investigated the usefulness of some popular model fit statistics to detect CDM fit including relative fit indices (AIC, BIC, and CAIC),…

  19. Some Statistics for Assessing Person-Fit Based on Continuous-Response Models

    ERIC Educational Resources Information Center

    Ferrando, Pere Joan

    2010-01-01

    This article proposes several statistics for assessing individual fit based on two unidimensional models for continuous responses: linear factor analysis and Samejima's continuous response model. Both models are approached using a common framework based on underlying response variables and are formulated at the individual level as fixed regression…

  20. SPSS macros to compare any two fitted values from a regression model.

    PubMed

    Weaver, Bruce; Dubois, Sacha

    2012-12-01

    In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests-particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.

  1. Does Breast Cancer Drive the Building of Survival Probability Models among States? An Assessment of Goodness of Fit for Patient Data from SEER Registries

    PubMed

    Khan, Hafiz; Saxena, Anshul; Perisetti, Abhilash; Rafiq, Aamrin; Gabbidon, Kemesha; Mende, Sarah; Lyuksyutova, Maria; Quesada, Kandi; Blakely, Summre; Torres, Tiffany; Afesse, Mahlet

    2016-12-01

    Background: Breast cancer is a worldwide public health concern and is the most prevalent type of cancer in women in the United States. This study concerned the best fit of statistical probability models on the basis of survival times for nine state cancer registries: California, Connecticut, Georgia, Hawaii, Iowa, Michigan, New Mexico, Utah, and Washington. Materials and Methods: A probability random sampling method was applied to select and extract records of 2,000 breast cancer patients from the Surveillance Epidemiology and End Results (SEER) database for each of the nine state cancer registries used in this study. EasyFit software was utilized to identify the best probability models by using goodness of fit tests, and to estimate parameters for various statistical probability distributions that fit survival data. Results: Statistical analysis for the summary of statistics is reported for each of the states for the years 1973 to 2012. Kolmogorov-Smirnov, Anderson-Darling, and Chi-squared goodness of fit test values were used for survival data, the highest values of goodness of fit statistics being considered indicative of the best fit survival model for each state. Conclusions: It was found that California, Connecticut, Georgia, Iowa, New Mexico, and Washington followed the Burr probability distribution, while the Dagum probability distribution gave the best fit for Michigan and Utah, and Hawaii followed the Gamma probability distribution. These findings highlight differences between states through selected sociodemographic variables and also demonstrate probability modeling differences in breast cancer survival times. The results of this study can be used to guide healthcare providers and researchers for further investigations into social and environmental factors in order to reduce the occurrence of and mortality due to breast cancer. Creative Commons Attribution License

  2. Identifying the Source of Misfit in Item Response Theory Models.

    PubMed

    Liu, Yang; Maydeu-Olivares, Alberto

    2014-01-01

    When an item response theory model fails to fit adequately, the items for which the model provides a good fit and those for which it does not must be determined. To this end, we compare the performance of several fit statistics for item pairs with known asymptotic distributions under maximum likelihood estimation of the item parameters: (a) a mean and variance adjustment to bivariate Pearson's X(2), (b) a bivariate subtable analog to Reiser's (1996) overall goodness-of-fit test, (c) a z statistic for the bivariate residual cross product, and (d) Maydeu-Olivares and Joe's (2006) M2 statistic applied to bivariate subtables. The unadjusted Pearson's X(2) with heuristically determined degrees of freedom is also included in the comparison. For binary and ordinal data, our simulation results suggest that the z statistic has the best Type I error and power behavior among all the statistics under investigation when the observed information matrix is used in its computation. However, if one has to use the cross-product information, the mean and variance adjusted X(2) is recommended. We illustrate the use of pairwise fit statistics in 2 real-data examples and discuss possible extensions of the current research in various directions.

  3. An Investigation of Item Fit Statistics for Mixed IRT Models

    ERIC Educational Resources Information Center

    Chon, Kyong Hee

    2009-01-01

    The purpose of this study was to investigate procedures for assessing model fit of IRT models for mixed format data. In this study, various IRT model combinations were fitted to data containing both dichotomous and polytomous item responses, and the suitability of the chosen model mixtures was evaluated based on a number of model fit procedures.…

  4. Predicting future protection of respirator users: Statistical approaches and practical implications.

    PubMed

    Hu, Chengcheng; Harber, Philip; Su, Jing

    2016-01-01

    The purpose of this article is to describe a statistical approach for predicting a respirator user's fit factor in the future based upon results from initial tests. A statistical prediction model was developed based upon joint distribution of multiple fit factor measurements over time obtained from linear mixed effect models. The model accounts for within-subject correlation as well as short-term (within one day) and longer-term variability. As an example of applying this approach, model parameters were estimated from a research study in which volunteers were trained by three different modalities to use one of two types of respirators. They underwent two quantitative fit tests at the initial session and two on the same day approximately six months later. The fitted models demonstrated correlation and gave the estimated distribution of future fit test results conditional on past results for an individual worker. This approach can be applied to establishing a criterion value for passing an initial fit test to provide reasonable likelihood that a worker will be adequately protected in the future; and to optimizing the repeat fit factor test intervals individually for each user for cost-effective testing.

  5. Limited-information goodness-of-fit testing of diagnostic classification item response models.

    PubMed

    Hansen, Mark; Cai, Li; Monroe, Scott; Li, Zhen

    2016-11-01

    Despite the growing popularity of diagnostic classification models (e.g., Rupp et al., 2010, Diagnostic measurement: theory, methods, and applications, Guilford Press, New York, NY) in educational and psychological measurement, methods for testing their absolute goodness of fit to real data remain relatively underdeveloped. For tests of reasonable length and for realistic sample size, full-information test statistics such as Pearson's X 2 and the likelihood ratio statistic G 2 suffer from sparseness in the underlying contingency table from which they are computed. Recently, limited-information fit statistics such as Maydeu-Olivares and Joe's (2006, Psychometrika, 71, 713) M 2 have been found to be quite useful in testing the overall goodness of fit of item response theory models. In this study, we applied Maydeu-Olivares and Joe's (2006, Psychometrika, 71, 713) M 2 statistic to diagnostic classification models. Through a series of simulation studies, we found that M 2 is well calibrated across a wide range of diagnostic model structures and was sensitive to certain misspecifications of the item model (e.g., fitting disjunctive models to data generated according to a conjunctive model), errors in the Q-matrix (adding or omitting paths, omitting a latent variable), and violations of local item independence due to unmodelled testlet effects. On the other hand, M 2 was largely insensitive to misspecifications in the distribution of higher-order latent dimensions and to the specification of an extraneous attribute. To complement the analyses of the overall model goodness of fit using M 2 , we investigated the utility of the Chen and Thissen (1997, J. Educ. Behav. Stat., 22, 265) local dependence statistic XLD2 for characterizing sources of misfit, an important aspect of model appraisal often overlooked in favour of overall statements. The XLD2 statistic was found to be slightly conservative (with Type I error rates consistently below the nominal level) but still useful in pinpointing the sources of misfit. Patterns of local dependence arising due to specific model misspecifications are illustrated. Finally, we used the M 2 and XLD2 statistics to evaluate a diagnostic model fit to data from the Trends in Mathematics and Science Study, drawing upon analyses previously conducted by Lee et al., (2011, IJT, 11, 144). © 2016 The British Psychological Society.

  6. Statistical modelling for recurrent events: an application to sports injuries

    PubMed Central

    Ullah, Shahid; Gabbett, Tim J; Finch, Caroline F

    2014-01-01

    Background Injuries are often recurrent, with subsequent injuries influenced by previous occurrences and hence correlation between events needs to be taken into account when analysing such data. Objective This paper compares five different survival models (Cox proportional hazards (CoxPH) model and the following generalisations to recurrent event data: Andersen-Gill (A-G), frailty, Wei-Lin-Weissfeld total time (WLW-TT) marginal, Prentice-Williams-Peterson gap time (PWP-GT) conditional models) for the analysis of recurrent injury data. Methods Empirical evaluation and comparison of different models were performed using model selection criteria and goodness-of-fit statistics. Simulation studies assessed the size and power of each model fit. Results The modelling approach is demonstrated through direct application to Australian National Rugby League recurrent injury data collected over the 2008 playing season. Of the 35 players analysed, 14 (40%) players had more than 1 injury and 47 contact injuries were sustained over 29 matches. The CoxPH model provided the poorest fit to the recurrent sports injury data. The fit was improved with the A-G and frailty models, compared to WLW-TT and PWP-GT models. Conclusions Despite little difference in model fit between the A-G and frailty models, in the interest of fewer statistical assumptions it is recommended that, where relevant, future studies involving modelling of recurrent sports injury data use the frailty model in preference to the CoxPH model or its other generalisations. The paper provides a rationale for future statistical modelling approaches for recurrent sports injury. PMID:22872683

  7. An NCME Instructional Module on Item-Fit Statistics for Item Response Theory Models

    ERIC Educational Resources Information Center

    Ames, Allison J.; Penfield, Randall D.

    2015-01-01

    Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model-data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing…

  8. An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests

    ERIC Educational Resources Information Center

    Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N.

    2013-01-01

    Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included the PARSCALE's G[squared],…

  9. Statistical model to perform error analysis of curve fits of wind tunnel test data using the techniques of analysis of variance and regression analysis

    NASA Technical Reports Server (NTRS)

    Alston, D. W.

    1981-01-01

    The considered research had the objective to design a statistical model that could perform an error analysis of curve fits of wind tunnel test data using analysis of variance and regression analysis techniques. Four related subproblems were defined, and by solving each of these a solution to the general research problem was obtained. The capabilities of the evolved true statistical model are considered. The least squares fit is used to determine the nature of the force, moment, and pressure data. The order of the curve fit is increased in order to delete the quadratic effect in the residuals. The analysis of variance is used to determine the magnitude and effect of the error factor associated with the experimental data.

  10. Goodness-of-Fit Assessment of Item Response Theory Models

    ERIC Educational Resources Information Center

    Maydeu-Olivares, Alberto

    2013-01-01

    The article provides an overview of goodness-of-fit assessment methods for item response theory (IRT) models. It is now possible to obtain accurate "p"-values of the overall fit of the model if bivariate information statistics are used. Several alternative approaches are described. As the validity of inferences drawn on the fitted model…

  11. Linearised and non-linearised isotherm models optimization analysis by error functions and statistical means

    PubMed Central

    2014-01-01

    In adsorption study, to describe sorption process and evaluation of best-fitting isotherm model is a key analysis to investigate the theoretical hypothesis. Hence, numerous statistically analysis have been extensively used to estimate validity of the experimental equilibrium adsorption values with the predicted equilibrium values. Several statistical error analysis were carried out. In the present study, the following statistical analysis were carried out to evaluate the adsorption isotherm model fitness, like the Pearson correlation, the coefficient of determination and the Chi-square test, have been used. The ANOVA test was carried out for evaluating significance of various error functions and also coefficient of dispersion were evaluated for linearised and non-linearised models. The adsorption of phenol onto natural soil (Local name Kalathur soil) was carried out, in batch mode at 30 ± 20 C. For estimating the isotherm parameters, to get a holistic view of the analysis the models were compared between linear and non-linear isotherm models. The result reveled that, among above mentioned error functions and statistical functions were designed to determine the best fitting isotherm. PMID:25018878

  12. Discrete time rescaling theorem: determining goodness of fit for discrete time statistical models of neural spiking.

    PubMed

    Haslinger, Robert; Pipa, Gordon; Brown, Emery

    2010-10-01

    One approach for understanding the encoding of information by spike trains is to fit statistical models and then test their goodness of fit. The time-rescaling theorem provides a goodness-of-fit test consistent with the point process nature of spike trains. The interspike intervals (ISIs) are rescaled (as a function of the model's spike probability) to be independent and exponentially distributed if the model is accurate. A Kolmogorov-Smirnov (KS) test between the rescaled ISIs and the exponential distribution is then used to check goodness of fit. This rescaling relies on assumptions of continuously defined time and instantaneous events. However, spikes have finite width, and statistical models of spike trains almost always discretize time into bins. Here we demonstrate that finite temporal resolution of discrete time models prevents their rescaled ISIs from being exponentially distributed. Poor goodness of fit may be erroneously indicated even if the model is exactly correct. We present two adaptations of the time-rescaling theorem to discrete time models. In the first we propose that instead of assuming the rescaled times to be exponential, the reference distribution be estimated through direct simulation by the fitted model. In the second, we prove a discrete time version of the time-rescaling theorem that analytically corrects for the effects of finite resolution. This allows us to define a rescaled time that is exponentially distributed, even at arbitrary temporal discretizations. We demonstrate the efficacy of both techniques by fitting generalized linear models to both simulated spike trains and spike trains recorded experimentally in monkey V1 cortex. Both techniques give nearly identical results, reducing the false-positive rate of the KS test and greatly increasing the reliability of model evaluation based on the time-rescaling theorem.

  13. An astronomer's guide to period searching

    NASA Astrophysics Data System (ADS)

    Schwarzenberg-Czerny, A.

    2003-03-01

    We concentrate on analysis of unevenly sampled time series, interrupted by periodic gaps, as often encountered in astronomy. While some of our conclusions may appear surprising, all are based on classical statistical principles of Fisher & successors. Except for discussion of the resolution issues, it is best for the reader to forget temporarily about Fourier transforms and to concentrate on problems of fitting of a time series with a model curve. According to their statistical content we divide the issues into several sections, consisting of: (ii) statistical numerical aspects of model fitting, (iii) evaluation of fitted models as hypotheses testing, (iv) the role of the orthogonal models in signal detection (v) conditions for equivalence of periodograms (vi) rating sensitivity by test power. An experienced observer working with individual objects would benefit little from formalized statistical approach. However, we demonstrate the usefulness of this approach in evaluation of performance of periodograms and in quantitative design of large variability surveys.

  14. Discrete Time Rescaling Theorem: Determining Goodness of Fit for Discrete Time Statistical Models of Neural Spiking

    PubMed Central

    Haslinger, Robert; Pipa, Gordon; Brown, Emery

    2010-01-01

    One approach for understanding the encoding of information by spike trains is to fit statistical models and then test their goodness of fit. The time rescaling theorem provides a goodness of fit test consistent with the point process nature of spike trains. The interspike intervals (ISIs) are rescaled (as a function of the model’s spike probability) to be independent and exponentially distributed if the model is accurate. A Kolmogorov Smirnov (KS) test between the rescaled ISIs and the exponential distribution is then used to check goodness of fit. This rescaling relies upon assumptions of continuously defined time and instantaneous events. However spikes have finite width and statistical models of spike trains almost always discretize time into bins. Here we demonstrate that finite temporal resolution of discrete time models prevents their rescaled ISIs from being exponentially distributed. Poor goodness of fit may be erroneously indicated even if the model is exactly correct. We present two adaptations of the time rescaling theorem to discrete time models. In the first we propose that instead of assuming the rescaled times to be exponential, the reference distribution be estimated through direct simulation by the fitted model. In the second, we prove a discrete time version of the time rescaling theorem which analytically corrects for the effects of finite resolution. This allows us to define a rescaled time which is exponentially distributed, even at arbitrary temporal discretizations. We demonstrate the efficacy of both techniques by fitting Generalized Linear Models (GLMs) to both simulated spike trains and spike trains recorded experimentally in monkey V1 cortex. Both techniques give nearly identical results, reducing the false positive rate of the KS test and greatly increasing the reliability of model evaluation based upon the time rescaling theorem. PMID:20608868

  15. How Good Are Statistical Models at Approximating Complex Fitness Landscapes?

    PubMed Central

    du Plessis, Louis; Leventhal, Gabriel E.; Bonhoeffer, Sebastian

    2016-01-01

    Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far only offered limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore revert to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape, and on the other hand systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations. PMID:27189564

  16. Estimating order statistics of network degrees

    NASA Astrophysics Data System (ADS)

    Chu, J.; Nadarajah, S.

    2018-01-01

    We model the order statistics of network degrees of big data sets by a range of generalised beta distributions. A three parameter beta distribution due to Libby and Novick (1982) is shown to give the best overall fit for at least four big data sets. The fit of this distribution is significantly better than the fit suggested by Olhede and Wolfe (2012) across the whole range of order statistics for all four data sets.

  17. Sensitivity of Fit Indices to Misspecification in Growth Curve Models

    ERIC Educational Resources Information Center

    Wu, Wei; West, Stephen G.

    2010-01-01

    This study investigated the sensitivity of fit indices to model misspecification in within-individual covariance structure, between-individual covariance structure, and marginal mean structure in growth curve models. Five commonly used fit indices were examined, including the likelihood ratio test statistic, root mean square error of…

  18. Model fit evaluation in multilevel structural equation models

    PubMed Central

    Ryu, Ehri

    2014-01-01

    Assessing goodness of model fit is one of the key questions in structural equation modeling (SEM). Goodness of fit is the extent to which the hypothesized model reproduces the multivariate structure underlying the set of variables. During the earlier development of multilevel structural equation models, the “standard” approach was to evaluate the goodness of fit for the entire model across all levels simultaneously. The model fit statistics produced by the standard approach have a potential problem in detecting lack of fit in the higher-level model for which the effective sample size is much smaller. Also when the standard approach results in poor model fit, it is not clear at which level the model does not fit well. This article reviews two alternative approaches that have been proposed to overcome the limitations of the standard approach. One is a two-step procedure which first produces estimates of saturated covariance matrices at each level and then performs single-level analysis at each level with the estimated covariance matrices as input (Yuan and Bentler, 2007). The other level-specific approach utilizes partially saturated models to obtain test statistics and fit indices for each level separately (Ryu and West, 2009). Simulation studies (e.g., Yuan and Bentler, 2007; Ryu and West, 2009) have consistently shown that both alternative approaches performed well in detecting lack of fit at any level, whereas the standard approach failed to detect lack of fit at the higher level. It is recommended that the alternative approaches are used to assess the model fit in multilevel structural equation model. Advantages and disadvantages of the two alternative approaches are discussed. The alternative approaches are demonstrated in an empirical example. PMID:24550882

  19. A Comparison of Four Estimators of a Population Measure of Model Fit in Covariance Structure Analysis

    ERIC Educational Resources Information Center

    Zhang, Wei

    2008-01-01

    A major issue in the utilization of covariance structure analysis is model fit evaluation. Recent years have witnessed increasing interest in various test statistics and so-called fit indexes, most of which are actually based on or closely related to F[subscript 0], a measure of model fit in the population. This study aims to provide a systematic…

  20. Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.

    PubMed

    Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R

    2018-06-01

    Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the amount of diagnostic statistics, in comparison to their parametric counter-parts, is small. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the HSIC. If dependence exists, the model does not capture all the variability in the outcome associated with the covariates, otherwise the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.

  1. Curve fitting and modeling with splines using statistical variable selection techniques

    NASA Technical Reports Server (NTRS)

    Smith, P. L.

    1982-01-01

    The successful application of statistical variable selection techniques to fit splines is demonstrated. Major emphasis is given to knot selection, but order determination is also discussed. Two FORTRAN backward elimination programs, using the B-spline basis, were developed. The program for knot elimination is compared in detail with two other spline-fitting methods and several statistical software packages. An example is also given for the two-variable case using a tensor product basis, with a theoretical discussion of the difficulties of their use.

  2. Statistical Compression of Wind Speed Data

    NASA Astrophysics Data System (ADS)

    Tagle, F.; Castruccio, S.; Crippa, P.; Genton, M.

    2017-12-01

    In this work we introduce a lossy compression approach that utilizes a stochastic wind generator based on a non-Gaussian distribution to reproduce the internal climate variability of daily wind speed as represented by the CESM Large Ensemble over Saudi Arabia. Stochastic wind generators, and stochastic weather generators more generally, are statistical models that aim to match certain statistical properties of the data on which they are trained. They have been used extensively in applications ranging from agricultural models to climate impact studies. In this novel context, the parameters of the fitted model can be interpreted as encoding the information contained in the original uncompressed data. The statistical model is fit to only 3 of the 30 ensemble members and it adequately captures the variability of the ensemble in terms of seasonal internannual variability of daily wind speed. To deal with such a large spatial domain, it is partitioned into 9 region, and the model is fit independently to each of these. We further discuss a recent refinement of the model, which relaxes this assumption of regional independence, by introducing a large-scale component that interacts with the fine-scale regional effects.

  3. Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.

    ERIC Educational Resources Information Center

    Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas

    2002-01-01

    Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…

  4. Group Influences on Young Adult Warfighters’ Risk Taking

    DTIC Science & Technology

    2016-12-01

    Statistical Analysis Latent linear growth models were fitted using the maximum likelihood estimation method in Mplus (version 7.0; Muthen & Muthen...condition had a higher net score than those in the alone condition (b = 20.53, SE = 6.29, p < .001). Results of the relevant statistical analyses are...8.56 110.86*** 22.01 158.25*** 29.91 Model fit statistics BIC 4004.50 5302.539 5540.58 Chi-square (df) 41.51*** (16) 38.10** (20) 42.19** (20

  5. A Comparison of Item Fit Statistics for Mixed IRT Models

    ERIC Educational Resources Information Center

    Chon, Kyong Hee; Lee, Won-Chan; Dunbar, Stephen B.

    2010-01-01

    In this study we examined procedures for assessing model-data fit of item response theory (IRT) models for mixed format data. The model fit indices used in this study include PARSCALE's G[superscript 2], Orlando and Thissen's S-X[superscript 2] and S-G[superscript 2], and Stone's chi[superscript 2*] and G[superscript 2*]. To investigate the…

  6. Statistical Power of Alternative Structural Models for Comparative Effectiveness Research: Advantages of Modeling Unreliability.

    PubMed

    Coman, Emil N; Iordache, Eugen; Dierker, Lisa; Fifield, Judith; Schensul, Jean J; Suggs, Suzanne; Barbour, Russell

    2014-05-01

    The advantages of modeling the unreliability of outcomes when evaluating the comparative effectiveness of health interventions is illustrated. Adding an action-research intervention component to a regular summer job program for youth was expected to help in preventing risk behaviors. A series of simple two-group alternative structural equation models are compared to test the effect of the intervention on one key attitudinal outcome in terms of model fit and statistical power with Monte Carlo simulations. Some models presuming parameters equal across the intervention and comparison groups were underpowered to detect the intervention effect, yet modeling the unreliability of the outcome measure increased their statistical power and helped in the detection of the hypothesized effect. Comparative Effectiveness Research (CER) could benefit from flexible multi-group alternative structural models organized in decision trees, and modeling unreliability of measures can be of tremendous help for both the fit of statistical models to the data and their statistical power.

  7. Person-Fit and the Rasch Model, with an Application to Knowledge of Logical Quantors.

    ERIC Educational Resources Information Center

    Molenaar, Ivo W.; Hoijtink, Herbert

    1996-01-01

    Some specific person-fit results for the Rasch model are presented, followed by an application to a test measuring knowledge of reasoning with logical quantors. Some issues are relevant to all attempts to use person-fit statistics in research, but the special role of the Rasch model is highlighted. (SLD)

  8. How Should We Assess the Fit of Rasch-Type Models? Approximating the Power of Goodness-of-Fit Statistics in Categorical Data Analysis

    ERIC Educational Resources Information Center

    Maydeu-Olivares, Alberto; Montano, Rosa

    2013-01-01

    We investigate the performance of three statistics, R [subscript 1], R [subscript 2] (Glas in "Psychometrika" 53:525-546, 1988), and M [subscript 2] (Maydeu-Olivares & Joe in "J. Am. Stat. Assoc." 100:1009-1020, 2005, "Psychometrika" 71:713-732, 2006) to assess the overall fit of a one-parameter logistic model…

  9. Adjusting the Adjusted X[superscript 2]/df Ratio Statistic for Dichotomous Item Response Theory Analyses: Does the Model Fit?

    ERIC Educational Resources Information Center

    Tay, Louis; Drasgow, Fritz

    2012-01-01

    Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted X[superscript 2]/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…

  10. Summary goodness-of-fit statistics for binary generalized linear models with noncanonical link functions.

    PubMed

    Canary, Jana D; Blizzard, Leigh; Barry, Ronald P; Hosmer, David W; Quinn, Stephen J

    2016-05-01

    Generalized linear models (GLM) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for logistic GLM. Their properties make the development of GOF statistics relatively straightforward, but it can be more difficult under noncanonical links. Although GOF tests for logistic GLM with continuous covariates (GLMCC) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of link function chosen. We generalize the Tsiatis GOF statistic originally developed for logistic GLMCCs, (TG), so that it can be applied under any link function. Further, we show that the algebraically related Hosmer-Lemeshow (HL) and Pigeon-Heyse (J(2) ) statistics can be applied directly. In a simulation study, TG, HL, and J(2) were used to evaluate the fit of probit, log-log, complementary log-log, and log models, all calculated with a common grouping method. The TG statistic consistently maintained Type I error rates, while those of HL and J(2) were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC. In this case, TG had more power than HL or J(2) . © 2015 John Wiley & Sons Ltd/London School of Economics.

  11. Student Background, School Climate, School Disorder, and Student Achievement: An Empirical Study of New York City's Middle Schools

    ERIC Educational Resources Information Center

    Chen, Greg; Weikart, Lynne A.

    2008-01-01

    This study develops and tests a school disorder and student achievement model based upon the school climate framework. The model was fitted to 212 New York City middle schools using the Structural Equations Modeling Analysis method. The analysis shows that the model fits the data well based upon test statistics and goodness of fit indices. The…

  12. Regression Models for Identifying Noise Sources in Magnetic Resonance Images

    PubMed Central

    Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.

    2009-01-01

    Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478

  13. Statistical evaluation of the metallurgical test data in the ORR-PSF-PVS irradiation experiment. [PWR; BWR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stallmann, F.W.

    1984-08-01

    A statistical analysis of Charpy test results of the two-year Pressure Vessel Simulation metallurgical irradiation experiment was performed. Determination of transition temperature and upper shelf energy derived from computer fits compare well with eyeball fits. Uncertainties for all results can be obtained with computer fits. The results were compared with predictions in Regulatory Guide 1.99 and other irradiation damage models.

  14. Measuring the Processes of Change From the Transtheoretical Model for Physical Activity and Exercise in Overweight and Obese Adults.

    PubMed

    Romain, Ahmed Jerôme; Bernard, Paquito; Hokayem, Marie; Gernigon, Christophe; Avignon, Antoine

    2016-03-01

    This study aimed to test three factorial structures conceptualizing the processes of change (POC) from the transtheoretical model and to examine the relationships between the POC and stages of change (SOC) among overweight and obese adults. Cross-sectional study. This study was conducted at the University Hospital of Montpellier, France. A sample of 289 overweight or obese participants (199 women) was enrolled in the study. Participants completed the POC and SOC questionnaires during a 5-day hospitalization for weight management. Structural equation modeling was used to compare the different factorial structures. The unweighted least-squares method was used to identify the best-fit indices for the five fully correlated model (goodness-of-fit statistic = .96; adjusted goodness-of-fit statistic = .95; standardized root mean residual = .062; normed-fit index = .95; parsimonious normed-fit index = .83; parsimonious goodness-of-fit statistic = .78). The multivariate analysis of variance was significant (p < .001). A post hoc test showed that individuals in advanced SOC used more of both experiential and behavioral POC than those in preaction stages, with effect sizes ranging from .06 to .29. This study supports the validity of the factorial structure of POC concerning physical activity and confirms the assumption that, in this context, people with excess weight use both experiential and behavioral processes. These preliminary results should be confirmed in a longitudinal study. © The Author(s) 2016.

  15. Curve fitting methods for solar radiation data modeling

    NASA Astrophysics Data System (ADS)

    Karim, Samsul Ariffin Abdul; Singh, Balbir Singh Mahinder

    2014-10-01

    This paper studies the use of several type of curve fitting method to smooth the global solar radiation data. After the data have been fitted by using curve fitting method, the mathematical model of global solar radiation will be developed. The error measurement was calculated by using goodness-fit statistics such as root mean square error (RMSE) and the value of R2. The best fitting methods will be used as a starting point for the construction of mathematical modeling of solar radiation received in Universiti Teknologi PETRONAS (UTP) Malaysia. Numerical results indicated that Gaussian fitting and sine fitting (both with two terms) gives better results as compare with the other fitting methods.

  16. Curve fitting methods for solar radiation data modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karim, Samsul Ariffin Abdul, E-mail: samsul-ariffin@petronas.com.my, E-mail: balbir@petronas.com.my; Singh, Balbir Singh Mahinder, E-mail: samsul-ariffin@petronas.com.my, E-mail: balbir@petronas.com.my

    2014-10-24

    This paper studies the use of several type of curve fitting method to smooth the global solar radiation data. After the data have been fitted by using curve fitting method, the mathematical model of global solar radiation will be developed. The error measurement was calculated by using goodness-fit statistics such as root mean square error (RMSE) and the value of R{sup 2}. The best fitting methods will be used as a starting point for the construction of mathematical modeling of solar radiation received in Universiti Teknologi PETRONAS (UTP) Malaysia. Numerical results indicated that Gaussian fitting and sine fitting (both withmore » two terms) gives better results as compare with the other fitting methods.« less

  17. Statistical aspects of modeling the labor curve.

    PubMed

    Zhang, Jun; Troendle, James; Grantz, Katherine L; Reddy, Uma M

    2015-06-01

    In a recent review by Cohen and Friedman, several statistical questions on modeling labor curves were raised. This article illustrates that asking data to fit a preconceived model or letting a sufficiently flexible model fit observed data is the main difference in principles of statistical modeling between the original Friedman curve and our average labor curve. An evidence-based approach to construct a labor curve and establish normal values should allow the statistical model to fit observed data. In addition, the presence of the deceleration phase in the active phase of an average labor curve was questioned. Forcing a deceleration phase to be part of the labor curve may have artificially raised the speed of progression in the active phase with a particularly large impact on earlier labor between 4 and 6 cm. Finally, any labor curve is illustrative and may not be instructive in managing labor because of variations in individual labor pattern and large errors in measuring cervical dilation. With the tools commonly available, it may be more productive to establish a new partogram that takes the physiology of labor and contemporary obstetric population into account. Copyright © 2015 Elsevier Inc. All rights reserved.

  18. Performance of the Generalized S-X[Superscript 2] Item Fit Index for Polytomous IRT Models

    ERIC Educational Resources Information Center

    Kang, Taehoon; Chen, Troy T.

    2008-01-01

    Orlando and Thissen's S-X[superscript 2] item fit index has performed better than traditional item fit statistics such as Yen' s Q[subscript 1] and McKinley and Mill' s G[superscript 2] for dichotomous item response theory (IRT) models. This study extends the utility of S-X[superscript 2] to polytomous IRT models, including the generalized partial…

  19. Regression Analysis as a Cost Estimation Model for Unexploded Ordnance Cleanup at Former Military Installations

    DTIC Science & Technology

    2002-06-01

    fits our actual data . To determine the goodness of fit, statisticians typically use the following four measures: R2 Statistic. The R2 statistic...reviewing instruction, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of...mathematical model is developed to better estimate cleanup costs using historical cost data that could be used by the Defense Department prior to placing

  20. Statistical Methodology for the Analysis of Repeated Duration Data in Behavioral Studies.

    PubMed

    Letué, Frédérique; Martinez, Marie-José; Samson, Adeline; Vilain, Anne; Vilain, Coriandre

    2018-03-15

    Repeated duration data are frequently used in behavioral studies. Classical linear or log-linear mixed models are often inadequate to analyze such data, because they usually consist of nonnegative and skew-distributed variables. Therefore, we recommend use of a statistical methodology specific to duration data. We propose a methodology based on Cox mixed models and written under the R language. This semiparametric model is indeed flexible enough to fit duration data. To compare log-linear and Cox mixed models in terms of goodness-of-fit on real data sets, we also provide a procedure based on simulations and quantile-quantile plots. We present two examples from a data set of speech and gesture interactions, which illustrate the limitations of linear and log-linear mixed models, as compared to Cox models. The linear models are not validated on our data, whereas Cox models are. Moreover, in the second example, the Cox model exhibits a significant effect that the linear model does not. We provide methods to select the best-fitting models for repeated duration data and to compare statistical methodologies. In this study, we show that Cox models are best suited to the analysis of our data set.

  1. Covariance Structure Model Fit Testing under Missing Data: An Application of the Supplemented EM Algorithm

    ERIC Educational Resources Information Center

    Cai, Li; Lee, Taehun

    2009-01-01

    We apply the Supplemented EM algorithm (Meng & Rubin, 1991) to address a chronic problem with the "two-stage" fitting of covariance structure models in the presence of ignorable missing data: the lack of an asymptotically chi-square distributed goodness-of-fit statistic. We show that the Supplemented EM algorithm provides a…

  2. Results of the Verification of the Statistical Distribution Model of Microseismicity Emission Characteristics

    NASA Astrophysics Data System (ADS)

    Cianciara, Aleksander

    2016-09-01

    The paper presents the results of research aimed at verifying the hypothesis that the Weibull distribution is an appropriate statistical distribution model of microseismicity emission characteristics, namely: energy of phenomena and inter-event time. It is understood that the emission under consideration is induced by the natural rock mass fracturing. Because the recorded emission contain noise, therefore, it is subjected to an appropriate filtering. The study has been conducted using the method of statistical verification of null hypothesis that the Weibull distribution fits the empirical cumulative distribution function. As the model describing the cumulative distribution function is given in an analytical form, its verification may be performed using the Kolmogorov-Smirnov goodness-of-fit test. Interpretations by means of probabilistic methods require specifying the correct model describing the statistical distribution of data. Because in these methods measurement data are not used directly, but their statistical distributions, e.g., in the method based on the hazard analysis, or in that that uses maximum value statistics.

  3. Use of Selected Goodness-of-Fit Statistics to Assess the Accuracy of a Model of Henry Hagg Lake, Oregon

    NASA Astrophysics Data System (ADS)

    Rounds, S. A.; Sullivan, A. B.

    2004-12-01

    Assessing a model's ability to reproduce field data is a critical step in the modeling process. For any model, some method of determining goodness-of-fit to measured data is needed to aid in calibration and to evaluate model performance. Visualizations and graphical comparisons of model output are an excellent way to begin that assessment. At some point, however, model performance must be quantified. Goodness-of-fit statistics, including the mean error (ME), mean absolute error (MAE), root mean square error, and coefficient of determination, typically are used to measure model accuracy. Statistical tools such as the sign test or Wilcoxon test can be used to test for model bias. The runs test can detect phase errors in simulated time series. Each statistic is useful, but each has its limitations. None provides a complete quantification of model accuracy. In this study, a suite of goodness-of-fit statistics was applied to a model of Henry Hagg Lake in northwest Oregon. Hagg Lake is a man-made reservoir on Scoggins Creek, a tributary to the Tualatin River. Located on the west side of the Portland metropolitan area, the Tualatin Basin is home to more than 450,000 people. Stored water in Hagg Lake helps to meet the agricultural and municipal water needs of that population. Future water demands have caused water managers to plan for a potential expansion of Hagg Lake, doubling its storage to roughly 115,000 acre-feet. A model of the lake was constructed to evaluate the lake's water quality and estimate how that quality might change after raising the dam. The laterally averaged, two-dimensional, U.S. Army Corps of Engineers model CE-QUAL-W2 was used to construct the Hagg Lake model. Calibrated for the years 2000 and 2001 and confirmed with data from 2002 and 2003, modeled parameters included water temperature, ammonia, nitrate, phosphorus, algae, zooplankton, and dissolved oxygen. Several goodness-of-fit statistics were used to quantify model accuracy and bias. Model performance was judged to be excellent for water temperature (annual ME: -0.22 to 0.05 ° C; annual MAE: 0.62 to 0.68 ° C) and dissolved oxygen (annual ME: -0.28 to 0.18 mg/L; annual MAE: 0.43 to 0.92 mg/L), showing that the model is sufficiently accurate for future water resources planning and management.

  4. Comparison of statistical models to estimate parasite growth rate in the induced blood stage malaria model.

    PubMed

    Wockner, Leesa F; Hoffmann, Isabell; O'Rourke, Peter; McCarthy, James S; Marquart, Louise

    2017-08-25

    The efficacy of vaccines aimed at inhibiting the growth of malaria parasites in the blood can be assessed by comparing the growth rate of parasitaemia in the blood of subjects treated with a test vaccine compared to controls. In studies using induced blood stage malaria (IBSM), a type of controlled human malaria infection, parasite growth rate has been measured using models with the intercept on the y-axis fixed to the inoculum size. A set of statistical models was evaluated to determine an optimal methodology to estimate parasite growth rate in IBSM studies. Parasite growth rates were estimated using data from 40 subjects published in three IBSM studies. Data was fitted using 12 statistical models: log-linear, sine-wave with the period either fixed to 48 h or not fixed; these models were fitted with the intercept either fixed to the inoculum size or not fixed. All models were fitted by individual, and overall by study using a mixed effects model with a random effect for the individual. Log-linear models and sine-wave models, with the period fixed or not fixed, resulted in similar parasite growth rate estimates (within 0.05 log 10 parasites per mL/day). Average parasite growth rate estimates for models fitted by individual with the intercept fixed to the inoculum size were substantially lower by an average of 0.17 log 10 parasites per mL/day (range 0.06-0.24) compared with non-fixed intercept models. Variability of parasite growth rate estimates across the three studies analysed was substantially higher (3.5 times) for fixed-intercept models compared with non-fixed intercept models. The same tendency was observed in models fitted overall by study. Modelling data by individual or overall by study had minimal effect on parasite growth estimates. The analyses presented in this report confirm that fixing the intercept to the inoculum size influences parasite growth estimates. The most appropriate statistical model to estimate the growth rate of blood-stage parasites in IBSM studies appears to be a log-linear model fitted by individual and with the intercept estimated in the log-linear regression. Future studies should use this model to estimate parasite growth rates.

  5. The Many Null Distributions of Person Fit Indices.

    ERIC Educational Resources Information Center

    Molenaar, Ivo W.; Hoijtink, Herbert

    1990-01-01

    Statistical properties of person fit indices are reviewed as indicators of the extent to which a person's score pattern is in agreement with a measurement model. Distribution of a fit index and ability-free fit evaluation are discussed. The null distribution was simulated for a test of 20 items. (SLD)

  6. Ionospheric scintillation studies

    NASA Technical Reports Server (NTRS)

    Rino, C. L.; Freemouw, E. J.

    1973-01-01

    The diffracted field of a monochromatic plane wave was characterized by two complex correlation functions. For a Gaussian complex field, these quantities suffice to completely define the statistics of the field. Thus, one can in principle calculate the statistics of any measurable quantity in terms of the model parameters. The best data fits were achieved for intensity statistics derived under the Gaussian statistics hypothesis. The signal structure that achieved the best fit was nearly invariant with scintillation level and irregularity source (ionosphere or solar wind). It was characterized by the fact that more than 80% of the scattered signal power is in phase quadrature with the undeviated or coherent signal component. Thus, the Gaussian-statistics hypothesis is both convenient and accurate for channel modeling work.

  7. Comparative analysis through probability distributions of a data set

    NASA Astrophysics Data System (ADS)

    Cristea, Gabriel; Constantinescu, Dan Mihai

    2018-02-01

    In practice, probability distributions are applied in such diverse fields as risk analysis, reliability engineering, chemical engineering, hydrology, image processing, physics, market research, business and economic research, customer support, medicine, sociology, demography etc. This article highlights important aspects of fitting probability distributions to data and applying the analysis results to make informed decisions. There are a number of statistical methods available which can help us to select the best fitting model. Some of the graphs display both input data and fitted distributions at the same time, as probability density and cumulative distribution. The goodness of fit tests can be used to determine whether a certain distribution is a good fit. The main used idea is to measure the "distance" between the data and the tested distribution, and compare that distance to some threshold values. Calculating the goodness of fit statistics also enables us to order the fitted distributions accordingly to how good they fit to data. This particular feature is very helpful for comparing the fitted models. The paper presents a comparison of most commonly used goodness of fit tests as: Kolmogorov-Smirnov, Anderson-Darling, and Chi-Squared. A large set of data is analyzed and conclusions are drawn by visualizing the data, comparing multiple fitted distributions and selecting the best model. These graphs should be viewed as an addition to the goodness of fit tests.

  8. A Model Fit Statistic for Generalized Partial Credit Model

    ERIC Educational Resources Information Center

    Liang, Tie; Wells, Craig S.

    2009-01-01

    Investigating the fit of a parametric model is an important part of the measurement process when implementing item response theory (IRT), but research examining it is limited. A general nonparametric approach for detecting model misfit, introduced by J. Douglas and A. S. Cohen (2001), has exhibited promising results for the two-parameter logistic…

  9. A New Statistic for Evaluating Item Response Theory Models for Ordinal Data. CRESST Report 839

    ERIC Educational Resources Information Center

    Cai, Li; Monroe, Scott

    2014-01-01

    We propose a new limited-information goodness of fit test statistic C[subscript 2] for ordinal IRT models. The construction of the new statistic lies formally between the M[subscript 2] statistic of Maydeu-Olivares and Joe (2006), which utilizes first and second order marginal probabilities, and the M*[subscript 2] statistic of Cai and Hansen…

  10. The Evaluation and Selection of Adequate Causal Models: A Compensatory Education Example.

    ERIC Educational Resources Information Center

    Tanaka, Jeffrey S.

    1982-01-01

    Implications of model evaluation (using traditional chi square goodness of fit statistics, incremental fit indices for covariance structure models, and latent variable coefficients of determination) on substantive conclusions are illustrated with an example examining the effects of participation in a compensatory education program on posttreatment…

  11. Modified Distribution-Free Goodness-of-Fit Test Statistic.

    PubMed

    Chun, So Yeon; Browne, Michael W; Shapiro, Alexander

    2018-03-01

    Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.

  12. Modeling of adsorption isotherms of water vapor on Tunisian olive leaves using statistical mechanical formulation

    NASA Astrophysics Data System (ADS)

    Knani, S.; Aouaini, F.; Bahloul, N.; Khalfaoui, M.; Hachicha, M. A.; Ben Lamine, A.; Kechaou, N.

    2014-04-01

    Analytical expression for modeling water adsorption isotherms of food or agricultural products is developed using the statistical mechanics formalism. The model developed in this paper is further used to fit and interpret the isotherms of four varieties of Tunisian olive leaves called “Chemlali, Chemchali, Chetoui and Zarrazi”. The parameters involved in the model such as the number of adsorbed water molecules per site, n, the receptor sites density, NM, and the energetic parameters, a1 and a2, were determined by fitting the experimental adsorption isotherms at temperatures ranging from 303 to 323 K. We interpret the results of fitting. After that, the model is further applied to calculate thermodynamic functions which govern the adsorption mechanism such as entropy, the free enthalpy of Gibbs and the internal energy.

  13. Blueprint XAS: a Matlab-based toolbox for the fitting and analysis of XAS spectra.

    PubMed

    Delgado-Jaime, Mario Ulises; Mewis, Craig Philip; Kennepohl, Pierre

    2010-01-01

    Blueprint XAS is a new Matlab-based program developed to fit and analyse X-ray absorption spectroscopy (XAS) data, most specifically in the near-edge region of the spectrum. The program is based on a methodology that introduces a novel background model into the complete fit model and that is capable of generating any number of independent fits with minimal introduction of user bias [Delgado-Jaime & Kennepohl (2010), J. Synchrotron Rad. 17, 119-128]. The functions and settings on the five panels of its graphical user interface are designed to suit the needs of near-edge XAS data analyzers. A batch function allows for the setting of multiple jobs to be run with Matlab in the background. A unique statistics panel allows the user to analyse a family of independent fits, to evaluate fit models and to draw statistically supported conclusions. The version introduced here (v0.2) is currently a toolbox for Matlab. Future stand-alone versions of the program will also incorporate several other new features to create a full package of tools for XAS data processing.

  14. Stability of INFIT and OUTFIT Compared to Simulated Estimates in Applied Setting.

    PubMed

    Hodge, Kari J; Morgan, Grant B

    Residual-based fit statistics are commonly used as an indication of the extent to which the item response data fit the Rash model. Fit statistic estimates are influenced by sample size and rules-of thumb estimates may result in incorrect conclusions about the extent to which the model fits the data. Estimates obtained in this analysis were compared to 250 simulated data sets to examine the stability of the estimates. All INFIT estimates were within the rule-of-thumb range of 0.7 to 1.3. However, only 82% of the INFIT estimates fell within the 2.5th and 97.5th percentile of the simulated item's INFIT distributions using this 95% confidence-like interval. This is a 18 percentage point difference in items that were classified as acceptable. Fourty-eight percent of OUTFIT estimates fell within the 0.7 to 1.3 rule- of-thumb range. Whereas 34% of OUTFIT estimates fell within the 2.5th and 97.5th percentile of the simulated item's OUTFIT distributions. This is a 13 percentage point difference in items that were classified as acceptable. When using the rule-of- thumb ranges for fit estimates the magnitude of misfit was smaller than with the 95% confidence interval of the simulated distribution. The findings indicate that the use of confidence intervals as critical values for fit statistics leads to different model data fit conclusions than traditional rule of thumb critical values.

  15. Robustness of fit indices to outliers and leverage observations in structural equation modeling.

    PubMed

    Yuan, Ke-Hai; Zhong, Xiaoling

    2013-06-01

    Normal-distribution-based maximum likelihood (NML) is the most widely used method in structural equation modeling (SEM), although practical data tend to be nonnormally distributed. The effect of nonnormally distributed data or data contamination on the normal-distribution-based likelihood ratio (LR) statistic is well understood due to many analytical and empirical studies. In SEM, fit indices are used as widely as the LR statistic. In addition to NML, robust procedures have been developed for more efficient and less biased parameter estimates with practical data. This article studies the effect of outliers and leverage observations on fit indices following NML and two robust methods. Analysis and empirical results indicate that good leverage observations following NML and one of the robust methods lead most fit indices to give more support to the substantive model. While outliers tend to make a good model superficially bad according to many fit indices following NML, they have little effect on those following the two robust procedures. Implications of the results to data analysis are discussed, and recommendations are provided regarding the use of estimation methods and interpretation of fit indices. (PsycINFO Database Record (c) 2013 APA, all rights reserved).

  16. A stochastic fractional dynamics model of space-time variability of rain

    NASA Astrophysics Data System (ADS)

    Kundu, Prasun K.; Travis, James E.

    2013-09-01

    varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, which allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and on the Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to fit the second moment statistics of radar data at the smaller spatiotemporal scales. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well at these scales without any further adjustment.

  17. Epistasis and the Structure of Fitness Landscapes: Are Experimental Fitness Landscapes Compatible with Fisher’s Geometric Model?

    PubMed Central

    Blanquart, François; Bataillon, Thomas

    2016-01-01

    The fitness landscape defines the relationship between genotypes and fitness in a given environment and underlies fundamental quantities such as the distribution of selection coefficient and the magnitude and type of epistasis. A better understanding of variation in landscape structure across species and environments is thus necessary to understand and predict how populations will adapt. An increasing number of experiments investigate the properties of fitness landscapes by identifying mutations, constructing genotypes with combinations of these mutations, and measuring the fitness of these genotypes. Yet these empirical landscapes represent a very small sample of the vast space of all possible genotypes, and this sample is often biased by the protocol used to identify mutations. Here we develop a rigorous statistical framework based on Approximate Bayesian Computation to address these concerns and use this flexible framework to fit a broad class of phenotypic fitness models (including Fisher’s model) to 26 empirical landscapes representing nine diverse biological systems. Despite uncertainty owing to the small size of most published empirical landscapes, the inferred landscapes have similar structure in similar biological systems. Surprisingly, goodness-of-fit tests reveal that this class of phenotypic models, which has been successful so far in interpreting experimental data, is a plausible in only three of nine biological systems. More precisely, although Fisher’s model was able to explain several statistical properties of the landscapes—including the mean and SD of selection and epistasis coefficients—it was often unable to explain the full structure of fitness landscapes. PMID:27052568

  18. Particle size distributions by transmission electron microscopy: an interlaboratory comparison case study

    PubMed Central

    Rice, Stephen B; Chan, Christopher; Brown, Scott C; Eschbach, Peter; Han, Li; Ensor, David S; Stefaniak, Aleksandr B; Bonevich, John; Vladár, András E; Hight Walker, Angela R; Zheng, Jiwen; Starnes, Catherine; Stromberg, Arnold; Ye, Jia; Grulke, Eric A

    2015-01-01

    This paper reports an interlaboratory comparison that evaluated a protocol for measuring and analysing the particle size distribution of discrete, metallic, spheroidal nanoparticles using transmission electron microscopy (TEM). The study was focused on automated image capture and automated particle analysis. NIST RM8012 gold nanoparticles (30 nm nominal diameter) were measured for area-equivalent diameter distributions by eight laboratories. Statistical analysis was used to (1) assess the data quality without using size distribution reference models, (2) determine reference model parameters for different size distribution reference models and non-linear regression fitting methods and (3) assess the measurement uncertainty of a size distribution parameter by using its coefficient of variation. The interlaboratory area-equivalent diameter mean, 27.6 nm ± 2.4 nm (computed based on a normal distribution), was quite similar to the area-equivalent diameter, 27.6 nm, assigned to NIST RM8012. The lognormal reference model was the preferred choice for these particle size distributions as, for all laboratories, its parameters had lower relative standard errors (RSEs) than the other size distribution reference models tested (normal, Weibull and Rosin–Rammler–Bennett). The RSEs for the fitted standard deviations were two orders of magnitude higher than those for the fitted means, suggesting that most of the parameter estimate errors were associated with estimating the breadth of the distributions. The coefficients of variation for the interlaboratory statistics also confirmed the lognormal reference model as the preferred choice. From quasi-linear plots, the typical range for good fits between the model and cumulative number-based distributions was 1.9 fitted standard deviations less than the mean to 2.3 fitted standard deviations above the mean. Automated image capture, automated particle analysis and statistical evaluation of the data and fitting coefficients provide a framework for assessing nanoparticle size distributions using TEM for image acquisition. PMID:26361398

  19. Using the Modification Index and Standardized Expected Parameter Change for Model Modification

    ERIC Educational Resources Information Center

    Whittaker, Tiffany A.

    2012-01-01

    Model modification is oftentimes conducted after discovering a badly fitting structural equation model. During the modification process, the modification index (MI) and the standardized expected parameter change (SEPC) are 2 statistics that may be used to aid in the selection of parameters to add to a model to improve the fit. The purpose of this…

  20. A comparison of methods of fitting several models to nutritional response data.

    PubMed

    Vedenov, D; Pesti, G M

    2008-02-01

    A variety of models have been proposed to fit nutritional input-output response data. The models are typically nonlinear; therefore, fitting the models usually requires sophisticated statistical software and training to use it. An alternative tool for fitting nutritional response models was developed by using widely available and easier-to-use Microsoft Excel software. The tool, implemented as an Excel workbook (NRM.xls), allows simultaneous fitting and side-by-side comparisons of several popular models. This study compared the results produced by the tool we developed and PROC NLIN of SAS. The models compared were the broken line (ascending linear and quadratic segments), saturation kinetics, 4-parameter logistics, sigmoidal, and exponential models. The NRM.xls workbook provided results nearly identical to those of PROC NLIN. Furthermore, the workbook successfully fit several models that failed to converge in PROC NLIN. Two data sets were used as examples to compare fits by the different models. The results suggest that no particular nonlinear model is necessarily best for all nutritional response data.

  1. Statistical Tools for Fitting Models of the Population Consequences of Acoustic Disturbance to Data from Marine Mammal Populations (PCAD Tools II)

    DTIC Science & Technology

    2014-09-30

    Consequences of Acoustic Disturbance to Data from Marine Mammal Populations (PCAD Tools II) Len Thomas, John Harwood, Catriona Harris, and Robert S... mammals changes over time. This project will develop statistical tools to allow mathematical models of the population consequences of acoustic...disturbance to be fitted to data from marine mammal populations. We will work closely with Phase II of the ONR PCAD Working Group, and will provide

  2. Evaluating Structural Equation Models for Categorical Outcomes: A New Test Statistic and a Practical Challenge of Interpretation.

    PubMed

    Monroe, Scott; Cai, Li

    2015-01-01

    This research is concerned with two topics in assessing model fit for categorical data analysis. The first topic involves the application of a limited-information overall test, introduced in the item response theory literature, to structural equation modeling (SEM) of categorical outcome variables. Most popular SEM test statistics assess how well the model reproduces estimated polychoric correlations. In contrast, limited-information test statistics assess how well the underlying categorical data are reproduced. Here, the recently introduced C2 statistic of Cai and Monroe (2014) is applied. The second topic concerns how the root mean square error of approximation (RMSEA) fit index can be affected by the number of categories in the outcome variable. This relationship creates challenges for interpreting RMSEA. While the two topics initially appear unrelated, they may conveniently be studied in tandem since RMSEA is based on an overall test statistic, such as C2. The results are illustrated with an empirical application to data from a large-scale educational survey.

  3. Fitting and Modeling in the ASC Data Analysis Environment

    NASA Astrophysics Data System (ADS)

    Doe, S.; Siemiginowska, A.; Joye, W.; McDowell, J.

    As part of the AXAF Science Center (ASC) Data Analysis Environment, we will provide to the astronomical community a Fitting Application. We present a design of the application in this paper. Our design goal is to give the user the flexibility to use a variety of optimization techniques (Levenberg-Marquardt, maximum entropy, Monte Carlo, Powell, downhill simplex, CERN-Minuit, and simulated annealing) and fit statistics (chi (2) , Cash, variance, and maximum likelihood); our modular design allows the user easily to add their own optimization techniques and/or fit statistics. We also present a comparison of the optimization techniques to be provided by the Application. The high spatial and spectral resolutions that will be obtained with AXAF instruments require a sophisticated data modeling capability. We will provide not only a suite of astronomical spatial and spectral source models, but also the capability of combining these models into source models of up to four data dimensions (i.e., into source functions f(E,x,y,t)). We will also provide tools to create instrument response models appropriate for each observation.

  4. Genetic algorithm dynamics on a rugged landscape

    NASA Astrophysics Data System (ADS)

    Bornholdt, Stefan

    1998-04-01

    The genetic algorithm is an optimization procedure motivated by biological evolution and is successfully applied to optimization problems in different areas. A statistical mechanics model for its dynamics is proposed based on the parent-child fitness correlation of the genetic operators, making it applicable to general fitness landscapes. It is compared to a recent model based on a maximum entropy ansatz. Finally it is applied to modeling the dynamics of a genetic algorithm on the rugged fitness landscape of the NK model.

  5. Maximum-likelihood curve-fitting scheme for experiments with pulsed lasers subject to intensity fluctuations.

    PubMed

    Metz, Thomas; Walewski, Joachim; Kaminski, Clemens F

    2003-03-20

    Evaluation schemes, e.g., least-squares fitting, are not generally applicable to any types of experiments. If the evaluation schemes were not derived from a measurement model that properly described the experiment to be evaluated, poorer precision or accuracy than attainable from the measured data could result. We outline ways in which statistical data evaluation schemes should be derived for all types of experiment, and we demonstrate them for laser-spectroscopic experiments, in which pulse-to-pulse fluctuations of the laser power cause correlated variations of laser intensity and generated signal intensity. The method of maximum likelihood is demonstrated in the derivation of an appropriate fitting scheme for this type of experiment. Statistical data evaluation contains the following steps. First, one has to provide a measurement model that considers statistical variation of all enclosed variables. Second, an evaluation scheme applicable to this particular model has to be derived or provided. Third, the scheme has to be characterized in terms of accuracy and precision. A criterion for accepting an evaluation scheme is that it have accuracy and precision as close as possible to the theoretical limit. The fitting scheme derived for experiments with pulsed lasers is compared to well-established schemes in terms of fitting power and rational functions. The precision is found to be as much as three timesbetter than for simple least-squares fitting. Our scheme also suppresses the bias on the estimated model parameters that other methods may exhibit if they are applied in an uncritical fashion. We focus on experiments in nonlinear spectroscopy, but the fitting scheme derived is applicable in many scientific disciplines.

  6. [Study on the effect of different impression methods on the marginal fit of all-ceramic crowns].

    PubMed

    Zhan, Lilin; Zeng, Liwei; Chen, Ping; Liao, Lan; Li, Shiyue; Liu, Renying

    2015-08-01

    To investigate the effect of three different impression methods on the marginal fit of all-ceramic crowns. The three methods include scanning silicone rubber impression, cast models, and direct optical impression. The polymethyl methacrylate (PMMA) material of a mandibular first molar in standard model was prepared with 16 models duplicated. The all-ceramic crowns were prepared using three different impression methods. Accurate impressions were made using silicone rubber, and the cast models were obtained. The PMMA models, silicone rubber impressions, and cast models were scanned, and digital models of three groups were obtained to produce 48 zirconia all-ceramic crowns with computer aided design/computer aided manufacture. The marginal fit of these groups was measured by silicone rubber gap impression. Statistical analysis was performed with SPSS 17.0 software. The marginal fit of direct optical impression groups, silicone rubber impression groups, cast model groups was (69.18±9.47), (81.04±10.88), (84.42±9.96) µm. A significant difference was observed in the marginal fit of the direct optical impression groups and the other groups (P<0.05). No statistically significant difference was observed in the marginal fit of the silicone rubber impression groups and the cast model groups (P>0.05). All marginal measurement sites are clinically acceptable by the three different impression scanning methods. The silicone rubber impression scanning method can be used for all-ceramic restorations.

  7. vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments

    PubMed Central

    2010-01-01

    Background The replication rate (or fitness) between viral variants has been investigated in vivo and in vitro for human immunodeficiency virus (HIV). HIV fitness plays an important role in the development and persistence of drug resistance. The accurate estimation of viral fitness relies on complicated computations based on statistical methods. This calls for tools that are easy to access and intuitive to use for various experiments of viral fitness. Results Based on a mathematical model and several statistical methods (least-squares approach and measurement error models), a Web-based computing tool has been developed for improving estimation of virus fitness in growth competition assays of human immunodeficiency virus type 1 (HIV-1). Conclusions Unlike the two-point calculation used in previous studies, the estimation here uses linear regression methods with all observed data in the competition experiment to more accurately estimate relative viral fitness parameters. The dilution factor is introduced for making the computational tool more flexible to accommodate various experimental conditions. This Web-based tool is implemented in C# language with Microsoft ASP.NET, and is publicly available on the Web at http://bis.urmc.rochester.edu/vFitness/. PMID:20482791

  8. vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments.

    PubMed

    Ma, Jingming; Dykes, Carrie; Wu, Tao; Huang, Yangxin; Demeter, Lisa; Wu, Hulin

    2010-05-18

    The replication rate (or fitness) between viral variants has been investigated in vivo and in vitro for human immunodeficiency virus (HIV). HIV fitness plays an important role in the development and persistence of drug resistance. The accurate estimation of viral fitness relies on complicated computations based on statistical methods. This calls for tools that are easy to access and intuitive to use for various experiments of viral fitness. Based on a mathematical model and several statistical methods (least-squares approach and measurement error models), a Web-based computing tool has been developed for improving estimation of virus fitness in growth competition assays of human immunodeficiency virus type 1 (HIV-1). Unlike the two-point calculation used in previous studies, the estimation here uses linear regression methods with all observed data in the competition experiment to more accurately estimate relative viral fitness parameters. The dilution factor is introduced for making the computational tool more flexible to accommodate various experimental conditions. This Web-based tool is implemented in C# language with Microsoft ASP.NET, and is publicly available on the Web at http://bis.urmc.rochester.edu/vFitness/.

  9. Right-Sizing Statistical Models for Longitudinal Data

    PubMed Central

    Wood, Phillip K.; Steinley, Douglas; Jackson, Kristina M.

    2015-01-01

    Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to “right-size” the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting overly parsimonious models to more complex better fitting alternatives, and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically under-identified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A three-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation/covariation patterns. The orthogonal, free-curve slope-intercept (FCSI) growth model is considered as a general model which includes, as special cases, many models including the Factor Mean model (FM, McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, Hierarchical Linear Models (HLM), Repeated Measures MANOVA, and the Linear Slope Intercept (LinearSI) Growth Model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparison of several candidate parametric growth and chronometric models in a Monte Carlo study. PMID:26237507

  10. Right-sizing statistical models for longitudinal data.

    PubMed

    Wood, Phillip K; Steinley, Douglas; Jackson, Kristina M

    2015-12-01

    Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to "right-size" the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically underidentified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A 3-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation-covariation patterns. The orthogonal free curve slope intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the factor mean (FM) model (McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, hierarchical linear models (HLMs), repeated-measures multivariate analysis of variance (MANOVA), and the linear slope intercept (linearSI) growth model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. (c) 2015 APA, all rights reserved).

  11. Structure of Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition Criteria for Obsessive–Compulsive Personality Disorder in Patients With Binge Eating Disorder

    PubMed Central

    Ansell, Emily B; Pinto, Anthony; Edelen, Maria Orlando; Grilo, Carlos M

    2013-01-01

    Objective To examine 1-, 2-, and 3-factor model structures through confirmatory analytic procedures for Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) obsessive–compulsive personality disorder (OCPD) criteria in patients with binge eating disorder (BED). Method Participants were consecutive outpatients (n = 263) with binge eating disorder and were assessed with semi-structured interviews. The 8 OCPD criteria were submitted to confirmatory factor analyses in Mplus Version 4.2 (Los Angeles, CA) in which previously identified factor models of OCPD were compared for fit, theoretical relevance, and parsimony. Nested models were compared for significant improvements in model fit. Results Evaluation of indices of fit in combination with theoretical considerations suggest a multifactorial model is a significant improvement in fit over the current DSM-IV single-factor model of OCPD. Though the data support both 2- and 3-factor models, the 3-factor model is hindered by an underspecified third factor. Conclusion A multifactorial model of OCPD incorporating the factors perfectionism and rigidity represents the best compromise of fit and theory in modelling the structure of OCPD in patients with BED. A third factor representing miserliness may be relevant in BED populations but needs further development. The perfectionism and rigidity factors may represent distinct intrapersonal and interpersonal attempts at control and may have implications for the assessment of OCPD. PMID:19087485

  12. Structure of diagnostic and statistical manual of mental disorders, fourth edition criteria for obsessive-compulsive personality disorder in patients with binge eating disorder.

    PubMed

    Ansell, Emily B; Pinto, Anthony; Edelen, Maria Orlando; Grilo, Carlos M

    2008-12-01

    To examine 1-, 2-, and 3-factor model structures through confirmatory analytic procedures for Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) obsessive-compulsive personality disorder (OCPD) criteria in patients with binge eating disorder (BED). Participants were consecutive outpatients (n = 263) with binge eating disorder and were assessed with semi-structured interviews. The 8 OCPD criteria were submitted to confirmatory factor analyses in Mplus Version 4.2 (Los Angeles, CA) in which previously identified factor models of OCPD were compared for fit, theoretical relevance, and parsimony. Nested models were compared for significant improvements in model fit. Evaluation of indices of fit in combination with theoretical considerations suggest a multifactorial model is a significant improvement in fit over the current DSM-IV single- factor model of OCPD. Though the data support both 2- and 3-factor models, the 3-factor model is hindered by an underspecified third factor. A multifactorial model of OCPD incorporating the factors perfectionism and rigidity represents the best compromise of fit and theory in modelling the structure of OCPD in patients with BED. A third factor representing miserliness may be relevant in BED populations but needs further development. The perfectionism and rigidity factors may represent distinct intrapersonal and interpersonal attempts at control and may have implications for the assessment of OCPD.

  13. The Impact of Model Misspecification on Parameter Estimation and Item-Fit Assessment in Log-Linear Diagnostic Classification Models

    ERIC Educational Resources Information Center

    Kunina-Habenicht, Olga; Rupp, Andre A.; Wilhelm, Oliver

    2012-01-01

    Using a complex simulation study we investigated parameter recovery, classification accuracy, and performance of two item-fit statistics for correct and misspecified diagnostic classification models within a log-linear modeling framework. The basic manipulated test design factors included the number of respondents (1,000 vs. 10,000), attributes (3…

  14. An evaluation of the structural validity of the shoulder pain and disability index (SPADI) using the Rasch model.

    PubMed

    Jerosch-Herold, Christina; Chester, Rachel; Shepstone, Lee; Vincent, Joshua I; MacDermid, Joy C

    2018-02-01

    The shoulder pain and disability index (SPADI) has been extensively evaluated for its psychometric properties using classical test theory (CTT). The purpose of this study was to evaluate its structural validity using Rasch model analysis. Responses to the SPADI from 1030 patients referred for physiotherapy with shoulder pain and enrolled in a prospective cohort study were available for Rasch model analysis. Overall fit, individual person and item fit, response format, dependence, unidimensionality, targeting, reliability and differential item functioning (DIF) were examined. The SPADI pain subscale initially demonstrated a misfit due to DIF by age and gender. After iterative analysis it showed good fit to the Rasch model with acceptable targeting and unidimensionality (overall fit Chi-square statistic 57.2, p = 0.1; mean item fit residual 0.19 (1.5) and mean person fit residual 0.44 (1.1); person separation index (PSI) of 0.83. The disability subscale however shows significant misfit due to uniform DIF even after iterative analyses were used to explore different solutions to the sources of misfit (overall fit (Chi-square statistic 57.2, p = 0.1); mean item fit residual 0.54 (1.26) and mean person fit residual 0.38 (1.0); PSI 0.84). Rasch Model analysis of the SPADI has identified some strengths and limitations not previously observed using CTT methods. The SPADI should be treated as two separate subscales. The SPADI is a widely used outcome measure in clinical practice and research; however, the scores derived from it must be interpreted with caution. The pain subscale fits the Rasch model expectations well. The disability subscale does not fit the Rasch model and its current format does not meet the criteria for true interval-level measurement required for use as a primary endpoint in clinical trials. Clinicians should therefore exercise caution when interpreting score changes on the disability subscale and attempt to compare their scores to age- and sex-stratified data.

  15. Model Robust Calibration: Method and Application to Electronically-Scanned Pressure Transducers

    NASA Technical Reports Server (NTRS)

    Walker, Eric L.; Starnes, B. Alden; Birch, Jeffery B.; Mays, James E.

    2010-01-01

    This article presents the application of a recently developed statistical regression method to the controlled instrument calibration problem. The statistical method of Model Robust Regression (MRR), developed by Mays, Birch, and Starnes, is shown to improve instrument calibration by reducing the reliance of the calibration on a predetermined parametric (e.g. polynomial, exponential, logarithmic) model. This is accomplished by allowing fits from the predetermined parametric model to be augmented by a certain portion of a fit to the residuals from the initial regression using a nonparametric (locally parametric) regression technique. The method is demonstrated for the absolute scale calibration of silicon-based pressure transducers.

  16. Assessment of corneal properties based on statistical modeling of OCT speckle.

    PubMed

    Jesus, Danilo A; Iskander, D Robert

    2017-01-01

    A new approach to assess the properties of the corneal micro-structure in vivo based on the statistical modeling of speckle obtained from Optical Coherence Tomography (OCT) is presented. A number of statistical models were proposed to fit the corneal speckle data obtained from OCT raw image. Short-term changes in corneal properties were studied by inducing corneal swelling whereas age-related changes were observed analyzing data of sixty-five subjects aged between twenty-four and seventy-three years. Generalized Gamma distribution has shown to be the best model, in terms of the Akaike's Information Criterion, to fit the OCT corneal speckle. Its parameters have shown statistically significant differences (Kruskal-Wallis, p < 0.001) for short and age-related corneal changes. In addition, it was observed that age-related changes influence the corneal biomechanical behaviour when corneal swelling is induced. This study shows that Generalized Gamma distribution can be utilized to modeling corneal speckle in OCT in vivo providing complementary quantified information where micro-structure of corneal tissue is of essence.

  17. Philosophy and the practice of Bayesian statistics

    PubMed Central

    Gelman, Andrew; Shalizi, Cosma Rohilla

    2015-01-01

    A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. PMID:22364575

  18. Philosophy and the practice of Bayesian statistics.

    PubMed

    Gelman, Andrew; Shalizi, Cosma Rohilla

    2013-02-01

    A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. © 2012 The British Psychological Society.

  19. An Optimization Principle for Deriving Nonequilibrium Statistical Models of Hamiltonian Dynamics

    NASA Astrophysics Data System (ADS)

    Turkington, Bruce

    2013-08-01

    A general method for deriving closed reduced models of Hamiltonian dynamical systems is developed using techniques from optimization and statistical estimation. Given a vector of resolved variables, selected to describe the macroscopic state of the system, a family of quasi-equilibrium probability densities on phase space corresponding to the resolved variables is employed as a statistical model, and the evolution of the mean resolved vector is estimated by optimizing over paths of these densities. Specifically, a cost function is constructed to quantify the lack-of-fit to the microscopic dynamics of any feasible path of densities from the statistical model; it is an ensemble-averaged, weighted, squared-norm of the residual that results from submitting the path of densities to the Liouville equation. The path that minimizes the time integral of the cost function determines the best-fit evolution of the mean resolved vector. The closed reduced equations satisfied by the optimal path are derived by Hamilton-Jacobi theory. When expressed in terms of the macroscopic variables, these equations have the generic structure of governing equations for nonequilibrium thermodynamics. In particular, the value function for the optimization principle coincides with the dissipation potential that defines the relation between thermodynamic forces and fluxes. The adjustable closure parameters in the best-fit reduced equations depend explicitly on the arbitrary weights that enter into the lack-of-fit cost function. Two particular model reductions are outlined to illustrate the general method. In each example the set of weights in the optimization principle contracts into a single effective closure parameter.

  20. Meta-analysis of Gaussian individual patient data: Two-stage or not two-stage?

    PubMed

    Morris, Tim P; Fisher, David J; Kenward, Michael G; Carpenter, James R

    2018-04-30

    Quantitative evidence synthesis through meta-analysis is central to evidence-based medicine. For well-documented reasons, the meta-analysis of individual patient data is held in higher regard than aggregate data. With access to individual patient data, the analysis is not restricted to a "two-stage" approach (combining estimates and standard errors) but can estimate parameters of interest by fitting a single model to all of the data, a so-called "one-stage" analysis. There has been debate about the merits of one- and two-stage analysis. Arguments for one-stage analysis have typically noted that a wider range of models can be fitted and overall estimates may be more precise. The two-stage side has emphasised that the models that can be fitted in two stages are sufficient to answer the relevant questions, with less scope for mistakes because there are fewer modelling choices to be made in the two-stage approach. For Gaussian data, we consider the statistical arguments for flexibility and precision in small-sample settings. Regarding flexibility, several of the models that can be fitted only in one stage may not be of serious interest to most meta-analysis practitioners. Regarding precision, we consider fixed- and random-effects meta-analysis and see that, for a model making certain assumptions, the number of stages used to fit this model is irrelevant; the precision will be approximately equal. Meta-analysts should choose modelling assumptions carefully. Sometimes relevant models can only be fitted in one stage. Otherwise, meta-analysts are free to use whichever procedure is most convenient to fit the identified model. © 2018 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  1. A study of finite mixture model: Bayesian approach on financial time series data

    NASA Astrophysics Data System (ADS)

    Phoong, Seuk-Yen; Ismail, Mohd Tahir

    2014-07-01

    Recently, statistician have emphasized on the fitting finite mixture model by using Bayesian method. Finite mixture model is a mixture of distributions in modeling a statistical distribution meanwhile Bayesian method is a statistical method that use to fit the mixture model. Bayesian method is being used widely because it has asymptotic properties which provide remarkable result. In addition, Bayesian method also shows consistency characteristic which means the parameter estimates are close to the predictive distributions. In the present paper, the number of components for mixture model is studied by using Bayesian Information Criterion. Identify the number of component is important because it may lead to an invalid result. Later, the Bayesian method is utilized to fit the k-component mixture model in order to explore the relationship between rubber price and stock market price for Malaysia, Thailand, Philippines and Indonesia. Lastly, the results showed that there is a negative effect among rubber price and stock market price for all selected countries.

  2. Assessing the fit of site-occupancy models

    USGS Publications Warehouse

    MacKenzie, D.I.; Bailey, L.L.

    2004-01-01

    Few species are likely to be so evident that they will always be detected at a site when present. Recently a model has been developed that enables estimation of the proportion of area occupied, when the target species is not detected with certainty. Here we apply this modeling approach to data collected on terrestrial salamanders in the Plethodon glutinosus complex in the Great Smoky Mountains National Park, USA, and wish to address the question 'how accurately does the fitted model represent the data?' The goodness-of-fit of the model needs to be assessed in order to make accurate inferences. This article presents a method where a simple Pearson chi-square statistic is calculated and a parametric bootstrap procedure is used to determine whether the observed statistic is unusually large. We found evidence that the most global model considered provides a poor fit to the data, hence estimated an overdispersion factor to adjust model selection procedures and inflate standard errors. Two hypothetical datasets with known assumption violations are also analyzed, illustrating that the method may be used to guide researchers to making appropriate inferences. The results of a simulation study are presented to provide a broader view of the methods properties.

  3. An Item Fit Statistic Based on Pseudocounts from the Generalized Graded Unfolding Model: A Preliminary Report.

    ERIC Educational Resources Information Center

    Roberts, James S.

    Stone and colleagues (C. Stone, R. Ankenman, S. Lane, and M. Liu, 1993; C. Stone, R. Mislevy and J. Mazzeo, 1994; C. Stone, 2000) have proposed a fit index that explicitly accounts for the measurement error inherent in an estimated theta value, here called chi squared superscript 2, subscript i*. The elements of this statistic are natural…

  4. Paleotemperature reconstruction from mammalian phosphate δ18O records - an alternative view on data processing

    NASA Astrophysics Data System (ADS)

    Skrzypek, Grzegorz; Sadler, Rohan; Wiśniewski, Andrzej

    2017-04-01

    The stable oxygen isotope composition of phosphates (δ18O) extracted from mammalian bone and teeth material is commonly used as a proxy for paleotemperature. Historically, several different analytical and statistical procedures for determining air paleotemperatures from the measured δ18O of phosphates have been applied. This inconsistency in both stable isotope data processing and the application of statistical procedures has led to large and unwanted differences between calculated results. This study presents the uncertainty associated with two of the most commonly used regression methods: least squares inverted fit and transposed fit. We assessed the performance of these methods by designing and applying calculation experiments to multiple real-life data sets, calculating in reverse temperatures, and comparing them with true recorded values. Our calculations clearly show that the mean absolute errors are always substantially higher for the inverted fit (a causal model), with the transposed fit (a predictive model) returning mean values closer to the measured values (Skrzypek et al. 2015). The predictive models always performed better than causal models, with 12-65% lower mean absolute errors. Moreover, the least-squares regression (LSM) model is more appropriate than Reduced Major Axis (RMA) regression for calculating the environmental water stable oxygen isotope composition from phosphate signatures, as well as for calculating air temperature from the δ18O value of environmental water. The transposed fit introduces a lower overall error than the inverted fit for both the δ18O of environmental water and Tair calculations; therefore, the predictive models are more statistically efficient than the causal models in this instance. The direct comparison of paleotemperature results from different laboratories and studies may only be achieved if a single method of calculation is applied. Reference Skrzypek G., Sadler R., Wiśniewski A., 2016. Reassessment of recommendations for processing mammal phosphate δ18O data for paleotemperature reconstruction. Palaeogeography, Palaeoclimatology, Palaeoecology 446, 162-167.

  5. Statistical Emulation of Climate Model Projections Based on Precomputed GCM Runs*

    DOE PAGES

    Castruccio, Stefano; McInerney, David J.; Stein, Michael L.; ...

    2014-02-24

    The authors describe a new approach for emulating the output of a fully coupled climate model under arbitrary forcing scenarios that is based on a small set of precomputed runs from the model. Temperature and precipitation are expressed as simple functions of the past trajectory of atmospheric CO 2 concentrations, and a statistical model is fit using a limited set of training runs. The approach is demonstrated to be a useful and computationally efficient alternative to pattern scaling and captures the nonlinear evolution of spatial patterns of climate anomalies inherent in transient climates. The approach does as well as patternmore » scaling in all circumstances and substantially better in many; it is not computationally demanding; and, once the statistical model is fit, it produces emulated climate output effectively instantaneously. In conclusion, it may therefore find wide application in climate impacts assessments and other policy analyses requiring rapid climate projections.« less

  6. 'Chain pooling' model selection as developed for the statistical analysis of a rotor burst protection experiment

    NASA Technical Reports Server (NTRS)

    Holms, A. G.

    1977-01-01

    A statistical decision procedure called chain pooling had been developed for model selection in fitting the results of a two-level fixed-effects full or fractional factorial experiment not having replication. The basic strategy included the use of one nominal level of significance for a preliminary test and a second nominal level of significance for the final test. The subject has been reexamined from the point of view of using as many as three successive statistical model deletion procedures in fitting the results of a single experiment. The investigation consisted of random number studies intended to simulate the results of a proposed aircraft turbine-engine rotor-burst-protection experiment. As a conservative approach, population model coefficients were chosen to represent a saturated 2 to the 4th power experiment with a distribution of parameter values unfavorable to the decision procedures. Three model selection strategies were developed.

  7. Comparison of hypertabastic survival model with other unimodal hazard rate functions using a goodness-of-fit test.

    PubMed

    Tahir, M Ramzan; Tran, Quang X; Nikulin, Mikhail S

    2017-05-30

    We studied the problem of testing a hypothesized distribution in survival regression models when the data is right censored and survival times are influenced by covariates. A modified chi-squared type test, known as Nikulin-Rao-Robson statistic, is applied for the comparison of accelerated failure time models. This statistic is used to test the goodness-of-fit for hypertabastic survival model and four other unimodal hazard rate functions. The results of simulation study showed that the hypertabastic distribution can be used as an alternative to log-logistic and log-normal distribution. In statistical modeling, because of its flexible shape of hazard functions, this distribution can also be used as a competitor of Birnbaum-Saunders and inverse Gaussian distributions. The results for the real data application are shown. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  8. Statistical distributions of extreme dry spell in Peninsular Malaysia

    NASA Astrophysics Data System (ADS)

    Zin, Wan Zawiah Wan; Jemain, Abdul Aziz

    2010-11-01

    Statistical distributions of annual extreme (AE) series and partial duration (PD) series for dry-spell event are analyzed for a database of daily rainfall records of 50 rain-gauge stations in Peninsular Malaysia, with recording period extending from 1975 to 2004. The three-parameter generalized extreme value (GEV) and generalized Pareto (GP) distributions are considered to model both series. In both cases, the parameters of these two distributions are fitted by means of the L-moments method, which provides a robust estimation of them. The goodness-of-fit (GOF) between empirical data and theoretical distributions are then evaluated by means of the L-moment ratio diagram and several goodness-of-fit tests for each of the 50 stations. It is found that for the majority of stations, the AE and PD series are well fitted by the GEV and GP models, respectively. Based on the models that have been identified, we can reasonably predict the risks associated with extreme dry spells for various return periods.

  9. α -induced reactions on 115In: Cross section measurements and statistical model analysis

    NASA Astrophysics Data System (ADS)

    Kiss, G. G.; Szücs, T.; Mohr, P.; Török, Zs.; Huszánk, R.; Gyürky, Gy.; Fülöp, Zs.

    2018-05-01

    Background: α -nucleus optical potentials are basic ingredients of statistical model calculations used in nucleosynthesis simulations. While the nucleon+nucleus optical potential is fairly well known, for the α +nucleus optical potential several different parameter sets exist and large deviations, reaching sometimes even an order of magnitude, are found between the cross section predictions calculated using different parameter sets. Purpose: A measurement of the radiative α -capture and the α -induced reaction cross sections on the nucleus 115In at low energies allows a stringent test of statistical model predictions. Since experimental data are scarce in this mass region, this measurement can be an important input to test the global applicability of α +nucleus optical model potentials and further ingredients of the statistical model. Methods: The reaction cross sections were measured by means of the activation method. The produced activities were determined by off-line detection of the γ rays and characteristic x rays emitted during the electron capture decay of the produced Sb isotopes. The 115In(α ,γ )119Sb and 115In(α ,n )Sb118m reaction cross sections were measured between Ec .m .=8.83 and 15.58 MeV, and the 115In(α ,n )Sb118g reaction was studied between Ec .m .=11.10 and 15.58 MeV. The theoretical analysis was performed within the statistical model. Results: The simultaneous measurement of the (α ,γ ) and (α ,n ) cross sections allowed us to determine a best-fit combination of all parameters for the statistical model. The α +nucleus optical potential is identified as the most important input for the statistical model. The best fit is obtained for the new Atomki-V1 potential, and good reproduction of the experimental data is also achieved for the first version of the Demetriou potentials and the simple McFadden-Satchler potential. The nucleon optical potential, the γ -ray strength function, and the level density parametrization are also constrained by the data although there is no unique best-fit combination. Conclusions: The best-fit calculations allow us to extrapolate the low-energy (α ,γ ) cross section of 115In to the astrophysical Gamow window with reasonable uncertainties. However, still further improvements of the α -nucleus potential are required for a global description of elastic (α ,α ) scattering and α -induced reactions in a wide range of masses and energies.

  10. Towards Accurate Modelling of Galaxy Clustering on Small Scales: Testing the Standard ΛCDM + Halo Model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-04-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter halos. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the "accurate" regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard ΛCDM + halo model against the clustering of SDSS DR7 galaxies. Specifically, we use the projected correlation function, group multiplicity function and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir halos) matches the clustering of low luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the "standard" halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  11. Statistical assessment of bi-exponential diffusion weighted imaging signal characteristics induced by intravoxel incoherent motion in malignant breast tumors

    PubMed Central

    Wong, Oi Lei; Lo, Gladys G.; Chan, Helen H. L.; Wong, Ting Ting; Cheung, Polly S. Y.

    2016-01-01

    Background The purpose of this study is to statistically assess whether bi-exponential intravoxel incoherent motion (IVIM) model better characterizes diffusion weighted imaging (DWI) signal of malignant breast tumor than mono-exponential Gaussian diffusion model. Methods 3 T DWI data of 29 malignant breast tumors were retrospectively included. Linear least-square mono-exponential fitting and segmented least-square bi-exponential fitting were used for apparent diffusion coefficient (ADC) and IVIM parameter quantification, respectively. F-test and Akaike Information Criterion (AIC) were used to statistically assess the preference of mono-exponential and bi-exponential model using region-of-interests (ROI)-averaged and voxel-wise analysis. Results For ROI-averaged analysis, 15 tumors were significantly better fitted by bi-exponential function and 14 tumors exhibited mono-exponential behavior. The calculated ADC, D (true diffusion coefficient) and f (pseudo-diffusion fraction) showed no significant differences between mono-exponential and bi-exponential preferable tumors. Voxel-wise analysis revealed that 27 tumors contained more voxels exhibiting mono-exponential DWI decay while only 2 tumors presented more bi-exponential decay voxels. ADC was consistently and significantly larger than D for both ROI-averaged and voxel-wise analysis. Conclusions Although the presence of IVIM effect in malignant breast tumors could be suggested, statistical assessment shows that bi-exponential fitting does not necessarily better represent the DWI signal decay in breast cancer under clinically typical acquisition protocol and signal-to-noise ratio (SNR). Our study indicates the importance to statistically examine the breast cancer DWI signal characteristics in practice. PMID:27709078

  12. An Investigation of the Performance of the Generalized S-X[superscript 2] Item-Fit Index for Polytomous IRT Models. ACT Research Report Series, 2007-1

    ERIC Educational Resources Information Center

    Kang, Taehoon; Chen, Troy T.

    2007-01-01

    Orlando and Thissen (2000, 2003) proposed an item-fit index, S-X[superscript 2], for dichotomous item response theory (IRT) models, which has performed better than traditional item-fit statistics such as Yen's (1981) Q[subscript 1] and McKinley and Mill's (1985) G[superscript 2]. This study extends the utility of S-X[superscript 2] to polytomous…

  13. Outlier Detection in High-Stakes Certification Testing. Research Report.

    ERIC Educational Resources Information Center

    Meijer, Rob R.

    Recent developments of person-fit analysis in computerized adaptive testing (CAT) are discussed. Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory (IRT) model in a CAT. Most person-fit research in CAT is restricted to…

  14. Snug as a Bug: Goodness of Fit and Quality of Models.

    PubMed

    Jupiter, Daniel C

    In elucidating risk factors, or attempting to make predictions about the behavior of subjects in our biomedical studies, we often build statistical models. These models are meant to capture some aspect of reality, or some real-world process underlying the phenomena we are examining. However, no model is perfect, and it is thus important to have tools to assess how accurate models are. In this commentary, we delve into the various roles that our models can play. Then we introduce the notion of the goodness of fit of models and lay the ground work for further study of diagnostic tests for assessing both the fidelity of our models and the statistical assumptions underlying them. Copyright © 2017 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

  15. Apparent cosmic acceleration from Type Ia supernovae

    NASA Astrophysics Data System (ADS)

    Dam, Lawrence H.; Heinesen, Asta; Wiltshire, David L.

    2017-11-01

    Parameters that quantify the acceleration of cosmic expansion are conventionally determined within the standard Friedmann-Lemaître-Robertson-Walker (FLRW) model, which fixes spatial curvature to be homogeneous. Generic averages of Einstein's equations in inhomogeneous cosmology lead to models with non-rigidly evolving average spatial curvature, and different parametrizations of apparent cosmic acceleration. The timescape cosmology is a viable example of such a model without dark energy. Using the largest available supernova data set, the JLA catalogue, we find that the timescape model fits the luminosity distance-redshift data with a likelihood that is statistically indistinguishable from the standard spatially flat Λ cold dark matter cosmology by Bayesian comparison. In the timescape case cosmic acceleration is non-zero but has a marginal amplitude, with best-fitting apparent deceleration parameter, q_{0}=-0.043^{+0.004}_{-0.000}. Systematic issues regarding standardization of supernova light curves are analysed. Cuts of data at the statistical homogeneity scale affect light-curve parameter fits independent of cosmology. A cosmological model dependence of empirical changes to the mean colour parameter is also found. Irrespective of which model ultimately fits better, we argue that as a competitive model with a non-FLRW expansion history, the timescape model may prove a useful diagnostic tool for disentangling selection effects and astrophysical systematics from the underlying expansion history.

  16. Bright high z SnIa: A challenge for ΛCDM

    NASA Astrophysics Data System (ADS)

    Perivolaropoulos, L.; Shafieloo, A.

    2009-06-01

    It has recently been pointed out by Kowalski et. al. [Astrophys. J. 686, 749 (2008).ASJOAB0004-637X10.1086/589937] that there is “an unexpected brightness of the SnIa data at z>1.” We quantify this statement by constructing a new statistic which is applicable directly on the type Ia supernova (SnIa) distance moduli. This statistic is designed to pick up systematic brightness trends of SnIa data points with respect to a best fit cosmological model at high redshifts. It is based on binning the normalized differences between the SnIa distance moduli and the corresponding best fit values in the context of a specific cosmological model (e.g. ΛCDM). These differences are normalized by the standard errors of the observed distance moduli. We then focus on the highest redshift bin and extend its size toward lower redshifts until the binned normalized difference (BND) changes sign (crosses 0) at a redshift zc (bin size Nc). The bin size Nc of this crossing (the statistical variable) is then compared with the corresponding crossing bin size Nmc for Monte Carlo data realizations based on the best fit model. We find that the crossing bin size Nc obtained from the Union08 and Gold06 data with respect to the best fit ΛCDM model is anomalously large compared to Nmc of the corresponding Monte Carlo data sets obtained from the best fit ΛCDM in each case. In particular, only 2.2% of the Monte Carlo ΛCDM data sets are consistent with the Gold06 value of Nc while the corresponding probability for the Union08 value of Nc is 5.3%. Thus, according to this statistic, the probability that the high redshift brightness bias of the Union08 and Gold06 data sets is realized in the context of a (w0,w1)=(-1,0) model (ΛCDM cosmology) is less than 6%. The corresponding realization probability in the context of a (w0,w1)=(-1.4,2) model is more than 30% for both the Union08 and the Gold06 data sets indicating a much better consistency for this model with respect to the BND statistic.

  17. Statistical inference for noisy nonlinear ecological dynamic systems.

    PubMed

    Wood, Simon N

    2010-08-26

    Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.

  18. Assessment of corneal properties based on statistical modeling of OCT speckle

    PubMed Central

    Jesus, Danilo A.; Iskander, D. Robert

    2016-01-01

    A new approach to assess the properties of the corneal micro-structure in vivo based on the statistical modeling of speckle obtained from Optical Coherence Tomography (OCT) is presented. A number of statistical models were proposed to fit the corneal speckle data obtained from OCT raw image. Short-term changes in corneal properties were studied by inducing corneal swelling whereas age-related changes were observed analyzing data of sixty-five subjects aged between twenty-four and seventy-three years. Generalized Gamma distribution has shown to be the best model, in terms of the Akaike’s Information Criterion, to fit the OCT corneal speckle. Its parameters have shown statistically significant differences (Kruskal-Wallis, p < 0.001) for short and age-related corneal changes. In addition, it was observed that age-related changes influence the corneal biomechanical behaviour when corneal swelling is induced. This study shows that Generalized Gamma distribution can be utilized to modeling corneal speckle in OCT in vivo providing complementary quantified information where micro-structure of corneal tissue is of essence. PMID:28101409

  19. Global, Local, and Graphical Person-Fit Analysis Using Person-Response Functions

    ERIC Educational Resources Information Center

    Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R.

    2005-01-01

    Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. The authors propose a comprehensive methodology for person-fit analysis in the…

  20. A smoothed residual based goodness-of-fit statistic for nest-survival models

    Treesearch

    Rodney X. Sturdivant; Jay J. Rotella; Robin E. Russell

    2008-01-01

    Estimating nest success and identifying important factors related to nest-survival rates is an essential goal for many wildlife researchers interested in understanding avian population dynamics. Advances in statistical methods have led to a number of estimation methods and approaches to modeling this problem. Recently developed models allow researchers to include a...

  1. Conditional statistical inference with multistage testing designs.

    PubMed

    Zwitser, Robert J; Maris, Gunter

    2015-03-01

    In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.

  2. Towards accurate modelling of galaxy clustering on small scales: testing the standard ΛCDM + halo model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-07-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter haloes. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the `accurate' regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard Λ cold dark matter (ΛCDM) + halo model against the clustering of Sloan Digital Sky Survey (SDSS) seventh data release (DR7) galaxies. Specifically, we use the projected correlation function, group multiplicity function, and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir haloes) matches the clustering of low-luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the `standard' halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  3. A nonparametric smoothing method for assessing GEE models with longitudinal binary data.

    PubMed

    Lin, Kuo-Chin; Chen, Yi-Ju; Shyr, Yu

    2008-09-30

    Studies involving longitudinal binary responses are widely applied in the health and biomedical sciences research and frequently analyzed by generalized estimating equations (GEE) method. This article proposes an alternative goodness-of-fit test based on the nonparametric smoothing approach for assessing the adequacy of GEE fitted models, which can be regarded as an extension of the goodness-of-fit test of le Cessie and van Houwelingen (Biometrics 1991; 47:1267-1282). The expectation and approximate variance of the proposed test statistic are derived. The asymptotic distribution of the proposed test statistic in terms of a scaled chi-squared distribution and the power performance of the proposed test are discussed by simulation studies. The testing procedure is demonstrated by two real data. Copyright (c) 2008 John Wiley & Sons, Ltd.

  4. Response Surface Methodology Using a Fullest Balanced Model: A Re-Analysis of a Dataset in the Korean Journal for Food Science of Animal Resources.

    PubMed

    Rheem, Sungsue; Rheem, Insoo; Oh, Sejong

    2017-01-01

    Response surface methodology (RSM) is a useful set of statistical techniques for modeling and optimizing responses in research studies of food science. In the analysis of response surface data, a second-order polynomial regression model is usually used. However, sometimes we encounter situations where the fit of the second-order model is poor. If the model fitted to the data has a poor fit including a lack of fit, the modeling and optimization results might not be accurate. In such a case, using a fullest balanced model, which has no lack of fit, can fix such problem, enhancing the accuracy of the response surface modeling and optimization. This article presents how to develop and use such a model for the better modeling and optimizing of the response through an illustrative re-analysis of a dataset in Park et al. (2014) published in the Korean Journal for Food Science of Animal Resources .

  5. Model Performance Evaluation and Scenario Analysis (MPESA) Tutorial

    EPA Pesticide Factsheets

    The model performance evaluation consists of metrics and model diagnostics. These metrics provides modelers with statistical goodness-of-fit measures that capture magnitude only, sequence only, and combined magnitude and sequence errors.

  6. On Fitting Generalized Linear Mixed-effects Models for Binary Responses using Different Statistical Packages

    PubMed Central

    Zhang, Hui; Lu, Naiji; Feng, Changyong; Thurston, Sally W.; Xia, Yinglin; Tu, Xin M.

    2011-01-01

    Summary The generalized linear mixed-effects model (GLMM) is a popular paradigm to extend models for cross-sectional data to a longitudinal setting. When applied to modeling binary responses, different software packages and even different procedures within a package may give quite different results. In this report, we describe the statistical approaches that underlie these different procedures and discuss their strengths and weaknesses when applied to fit correlated binary responses. We then illustrate these considerations by applying these procedures implemented in some popular software packages to simulated and real study data. Our simulation results indicate a lack of reliability for most of the procedures considered, which carries significant implications for applying such popular software packages in practice. PMID:21671252

  7. Potential pitfalls when denoising resting state fMRI data using nuisance regression.

    PubMed

    Bright, Molly G; Tench, Christopher R; Murphy, Kevin

    2017-07-01

    In resting state fMRI, it is necessary to remove signal variance associated with noise sources, leaving cleaned fMRI time-series that more accurately reflect the underlying intrinsic brain fluctuations of interest. This is commonly achieved through nuisance regression, in which the fit is calculated of a noise model of head motion and physiological processes to the fMRI data in a General Linear Model, and the "cleaned" residuals of this fit are used in further analysis. We examine the statistical assumptions and requirements of the General Linear Model, and whether these are met during nuisance regression of resting state fMRI data. Using toy examples and real data we show how pre-whitening, temporal filtering and temporal shifting of regressors impact model fit. Based on our own observations, existing literature, and statistical theory, we make the following recommendations when employing nuisance regression: pre-whitening should be applied to achieve valid statistical inference of the noise model fit parameters; temporal filtering should be incorporated into the noise model to best account for changes in degrees of freedom; temporal shifting of regressors, although merited, should be achieved via optimisation and validation of a single temporal shift. We encourage all readers to make simple, practical changes to their fMRI denoising pipeline, and to regularly assess the appropriateness of the noise model used. By negotiating the potential pitfalls described in this paper, and by clearly reporting the details of nuisance regression in future manuscripts, we hope that the field will achieve more accurate and precise noise models for cleaning the resting state fMRI time-series. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  8. Estimating HIV-1 Fitness Characteristics from Cross-Sectional Genotype Data

    PubMed Central

    Gopalakrishnan, Sathej; Montazeri, Hesam; Menz, Stephan; Beerenwinkel, Niko; Huisinga, Wilhelm

    2014-01-01

    Despite the success of highly active antiretroviral therapy (HAART) in the management of human immunodeficiency virus (HIV)-1 infection, virological failure due to drug resistance development remains a major challenge. Resistant mutants display reduced drug susceptibilities, but in the absence of drug, they generally have a lower fitness than the wild type, owing to a mutation-incurred cost. The interaction between these fitness costs and drug resistance dictates the appearance of mutants and influences viral suppression and therapeutic success. Assessing in vivo viral fitness is a challenging task and yet one that has significant clinical relevance. Here, we present a new computational modelling approach for estimating viral fitness that relies on common sparse cross-sectional clinical data by combining statistical approaches to learn drug-specific mutational pathways and resistance factors with viral dynamics models to represent the host-virus interaction and actions of drug mechanistically. We estimate in vivo fitness characteristics of mutant genotypes for two antiretroviral drugs, the reverse transcriptase inhibitor zidovudine (ZDV) and the protease inhibitor indinavir (IDV). Well-known features of HIV-1 fitness landscapes are recovered, both in the absence and presence of drugs. We quantify the complex interplay between fitness costs and resistance by computing selective advantages for different mutants. Our approach extends naturally to multiple drugs and we illustrate this by simulating a dual therapy with ZDV and IDV to assess therapy failure. The combined statistical and dynamical modelling approach may help in dissecting the effects of fitness costs and resistance with the ultimate aim of assisting the choice of salvage therapies after treatment failure. PMID:25375675

  9. Limited-Information Goodness-of-Fit Testing of Diagnostic Classification Item Response Theory Models. CRESST Report 840

    ERIC Educational Resources Information Center

    Hansen, Mark; Cai, Li; Monroe, Scott; Li, Zhen

    2014-01-01

    It is a well-known problem in testing the fit of models to multinomial data that the full underlying contingency table will inevitably be sparse for tests of reasonable length and for realistic sample sizes. Under such conditions, full-information test statistics such as Pearson's X[superscript 2]?? and the likelihood ratio statistic…

  10. Model Performance Evaluation and Scenario Analysis (MPESA) Tutorial

    EPA Science Inventory

    This tool consists of two parts: model performance evaluation and scenario analysis (MPESA). The model performance evaluation consists of two components: model performance evaluation metrics and model diagnostics. These metrics provides modelers with statistical goodness-of-fit m...

  11. Examining the dimensional structure models of secondary traumatic stress based on DSM-5 symptoms.

    PubMed

    Mordeno, Imelu G; Go, Geraldine P; Yangson-Serondo, April

    2017-02-01

    Latent factor structure of Secondary Traumatic Stress (STS) has been examined using Diagnostic Statistic Manual-IV (DSM-IV)'s Posttraumatic Stress Disorder (PTSD) nomenclature. With the advent of Diagnostic Statistic Manual-5 (DSM-5), there is an impending need to reexamine STS using DSM-5 symptoms in light of the most updated PTSD models in the literature. The study investigated and determined the best fitted PTSD models using DSM-5 PTSD criteria symptoms. Confirmatory factor analysis (CFA) was conducted to examine model fit using the Secondary Traumatic Stress Scale in 241 registered and practicing Filipino nurses (166 females and 75 males) who worked in the Philippines and gave direct nursing services to patients. Based on multiple fit indices, the results showed the 7-factor hybrid model, comprising of intrusion, avoidance, negative affect, anhedonia, externalizing behavior, anxious arousal, and dysphoric arousal factors has excellent fit to STS. This model asserts that: (1) hyperarousal criterion needs to be divided into anxious and dysphoric arousal factors; (2) symptoms characterizing negative and positive affect need to be separated to two separate factors, and; (3) a new factor would categorize externalized, self-initiated impulse and control-deficit behaviors. Comparison of nested and non-nested models showed Hybrid model to have superior fit over other models. The specificity of the symptom structure of STS based on DSM-5 PTSD criteria suggests having more specific interventions addressing the more elaborate symptom-groupings that would alleviate the condition of nurses exposed to STS on a daily basis. Copyright © 2016 Elsevier B.V. All rights reserved.

  12. The Thomas–Fermi quark model: Non-relativistic aspects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Quan, E-mail: quan_liu@baylor.edu; Wilcox, Walter, E-mail: walter_wilcox@baylor.edu

    The first numerical investigation of non-relativistic aspects of the Thomas–Fermi (TF) statistical multi-quark model is given. We begin with a review of the traditional TF model without an explicit spin interaction and find that the spin splittings are too small in this approach. An explicit spin interaction is then introduced which entails the definition of a generalized spin “flavor”. We investigate baryonic states in this approach which can be described with two inequivalent wave functions; such states can however apply to multiple degenerate flavors. We find that the model requires a spatial separation of quark flavors, even if completely degenerate.more » Although the TF model is designed to investigate the possibility of many-quark states, we find surprisingly that it may be used to fit the low energy spectrum of almost all ground state octet and decuplet baryons. The charge radii of such states are determined and compared with lattice calculations and other models. The low energy fit obtained allows us to extrapolate to the six-quark doubly strange H-dibaryon state, flavor symmetric strange states of higher quark content and possible six quark nucleon–nucleon resonances. The emphasis here is on the systematics revealed in this approach. We view our model as a versatile and convenient tool for quickly assessing the characteristics of new, possibly bound, particle states of higher quark number content. -- Highlights: • First application of the statistical Thomas–Fermi quark model to baryonic systems. • Novel aspects: spin as generalized flavor; spatial separation of quark flavor phases. • The model is statistical, but the low energy baryonic spectrum is successfully fit. • Numerical applications include the H-dibaryon, strange states and nucleon resonances. • The statistical point of view does not encourage the idea of bound many-quark baryons.« less

  13. Improving the Validity of Activity of Daily Living Dependency Risk Assessment

    PubMed Central

    Clark, Daniel O.; Stump, Timothy E.; Tu, Wanzhu; Miller, Douglas K.

    2015-01-01

    Objectives Efforts to prevent activity of daily living (ADL) dependency may be improved through models that assess older adults’ dependency risk. We evaluated whether cognition and gait speed measures improve the predictive validity of interview-based models. Method Participants were 8,095 self-respondents in the 2006 Health and Retirement Survey who were aged 65 years or over and independent in five ADLs. Incident ADL dependency was determined from the 2008 interview. Models were developed using random 2/3rd cohorts and validated in the remaining 1/3rd. Results Compared to a c-statistic of 0.79 in the best interview model, the model including cognitive measures had c-statistics of 0.82 and 0.80 while the best fitting gait speed model had c-statistics of 0.83 and 0.79 in the development and validation cohorts, respectively. Conclusion Two relatively brief models, one that requires an in-person assessment and one that does not, had excellent validity for predicting incident ADL dependency but did not significantly improve the predictive validity of the best fitting interview-based models. PMID:24652867

  14. A Generalized Form of Context-Dependent Psychophysiological Interactions (gPPI): A Comparison to Standard Approaches

    PubMed Central

    McLaren, Donald G.; Ries, Michele L.; Xu, Guofan; Johnson, Sterling C.

    2012-01-01

    Functional MRI (fMRI) allows one to study task-related regional responses and task-dependent connectivity analysis using psychophysiological interaction (PPI) methods. The latter affords the additional opportunity to understand how brain regions interact in a task-dependent manner. The current implementation of PPI in Statistical Parametric Mapping (SPM8) is configured primarily to assess connectivity differences between two task conditions, when in practice fMRI tasks frequently employ more than two conditions. Here we evaluate how a generalized form of context-dependent PPI (gPPI; http://www.nitrc.org/projects/gppi), which is configured to automatically accommodate more than two task conditions in the same PPI model by spanning the entire experimental space, compares to the standard implementation in SPM8. These comparisons are made using both simulations and an empirical dataset. In the simulated dataset, we compare the interaction beta estimates to their expected values and model fit using the Akaike Information Criterion (AIC). We found that interaction beta estimates in gPPI were robust to different simulated data models, were not different from the expected beta value, and had better model fits than when using standard PPI (sPPI) methods. In the empirical dataset, we compare the model fit of the gPPI approach to sPPI. We found that the gPPI approach improved model fit compared to sPPI. There were several regions that became non-significant with gPPI. These regions all showed significantly better model fits with gPPI. Also, there were several regions where task-dependent connectivity was only detected using gPPI methods, also with improved model fit. Regions that were detected with all methods had more similar model fits. These results suggest that gPPI may have greater sensitivity and specificity than standard implementation in SPM. This notion is tempered slightly as there is no gold standard; however, data simulations with a known outcome support our conclusions about gPPI. In sum, the generalized form of context-dependent PPI approach has increased flexibility of statistical modeling, and potentially improves model fit, specificity to true negative findings, and sensitivity to true positive findings. PMID:22484411

  15. Hyper-Fit: Fitting Linear Models to Multidimensional Data with Multivariate Gaussian Uncertainties

    NASA Astrophysics Data System (ADS)

    Robotham, A. S. G.; Obreschkow, D.

    2015-09-01

    Astronomical data is often uncertain with errors that are heteroscedastic (different for each data point) and covariant between different dimensions. Assuming that a set of D-dimensional data points can be described by a (D - 1)-dimensional plane with intrinsic scatter, we derive the general likelihood function to be maximised to recover the best fitting model. Alongside the mathematical description, we also release the hyper-fit package for the R statistical language (http://github.com/asgr/hyper.fit) and a user-friendly web interface for online fitting (http://hyperfit.icrar.org). The hyper-fit package offers access to a large number of fitting routines, includes visualisation tools, and is fully documented in an extensive user manual. Most of the hyper-fit functionality is accessible via the web interface. In this paper, we include applications to toy examples and to real astronomical data from the literature: the mass-size, Tully-Fisher, Fundamental Plane, and mass-spin-morphology relations. In most cases, the hyper-fit solutions are in good agreement with published values, but uncover more information regarding the fitted model.

  16. Modelling 1-minute directional observations of the global irradiance.

    NASA Astrophysics Data System (ADS)

    Thejll, Peter; Pagh Nielsen, Kristian; Andersen, Elsa; Furbo, Simon

    2016-04-01

    Direct and diffuse irradiances from the sky has been collected at 1-minute intervals for about a year from the experimental station at the Technical University of Denmark for the IEA project "Solar Resource Assessment and Forecasting". These data were gathered by pyrheliometers tracking the Sun, as well as with apertured pyranometers gathering 1/8th and 1/16th of the light from the sky in 45 degree azimuthal ranges pointed around the compass. The data are gathered in order to develop detailed models of the potentially available solar energy and its variations at high temporal resolution in order to gain a more detailed understanding of the solar resource. This is important for a better understanding of the sub-grid scale cloud variation that cannot be resolved with climate and weather models. It is also important for optimizing the operation of active solar energy systems such as photovoltaic plants and thermal solar collector arrays, and for passive solar energy and lighting to buildings. We present regression-based modelling of the observed data, and focus, here, on the statistical properties of the model fits. Using models based on the one hand on what is found in the literature and on physical expectations, and on the other hand on purely statistical models, we find solutions that can explain up to 90% of the variance in global radiation. The models leaning on physical insights include terms for the direct solar radiation, a term for the circum-solar radiation, a diffuse term and a term for the horizon brightening/darkening. The purely statistical model is found using data- and formula-validation approaches picking model expressions from a general catalogue of possible formulae. The method allows nesting of expressions, and the results found are dependent on and heavily constrained by the cross-validation carried out on statistically independent testing and training data-sets. Slightly better fits -- in terms of variance explained -- is found using the purely statistical fitting/searching approach. We describe the methods applied, results found, and discuss the different potentials of the physics- and statistics-only based model-searches.

  17. Forecasting mortality of road traffic injuries in China using seasonal autoregressive integrated moving average model.

    PubMed

    Zhang, Xujun; Pang, Yuanyuan; Cui, Mengjing; Stallones, Lorann; Xiang, Huiyun

    2015-02-01

    Road traffic injuries have become a major public health problem in China. This study aimed to develop statistical models for predicting road traffic deaths and to analyze seasonality of deaths in China. A seasonal autoregressive integrated moving average (SARIMA) model was used to fit the data from 2000 to 2011. Akaike Information Criterion, Bayesian Information Criterion, and mean absolute percentage error were used to evaluate the constructed models. Autocorrelation function and partial autocorrelation function of residuals and Ljung-Box test were used to compare the goodness-of-fit between the different models. The SARIMA model was used to forecast monthly road traffic deaths in 2012. The seasonal pattern of road traffic mortality data was statistically significant in China. SARIMA (1, 1, 1) (0, 1, 1)12 model was the best fitting model among various candidate models; the Akaike Information Criterion, Bayesian Information Criterion, and mean absolute percentage error were -483.679, -475.053, and 4.937, respectively. Goodness-of-fit testing showed nonautocorrelations in the residuals of the model (Ljung-Box test, Q = 4.86, P = .993). The fitted deaths using the SARIMA (1, 1, 1) (0, 1, 1)12 model for years 2000 to 2011 closely followed the observed number of road traffic deaths for the same years. The predicted and observed deaths were also very close for 2012. This study suggests that accurate forecasting of road traffic death incidence is possible using SARIMA model. The SARIMA model applied to historical road traffic deaths data could provide important evidence of burden of road traffic injuries in China. Copyright © 2015 Elsevier Inc. All rights reserved.

  18. Measuring Microaggression and Organizational Climate Factors in Military Units

    DTIC Science & Technology

    2011-04-01

    i.e., items) to accurately assess what we intend for them to measure. To assess construct and convergent validity, the author assessed the statistical ...sample indicated both convergent and construct validity of the microaggression scale. Table 5 presents these statistics . Measuring Microaggressions...models. As shown in Table 7, the measurement models had acceptable fit indices. That is, the Chi-square statistics were at their minimum; although the

  19. Online Updating of Statistical Inference in the Big Data Setting.

    PubMed

    Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui

    2016-01-01

    We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.

  20. Verification of relationship model between Korean new elderly class's recovery resilience and productive aging.

    PubMed

    Cho, Gun-Sang; Kim, Dae-Sung; Yi, Eun-Surk

    2015-12-01

    The purpose of this study is to verification of relationship model between Korean new elderly class's recovery resilience and productive aging. As of 2013, this study sampled preliminary elderly people in Gyeonggi-do and other provinces nationwide. Data from a total of effective 484 subjects was analyzed. The collected data was processed using the IBM SPSS 20.0 and AMOS 20.0, and underwent descriptive statistical analysis, confirmatory factor analysis, and structure model verification. The path coefficient associated with model fitness was examined. The standardization path coefficient between recovery resilience and productive aging is β=0.975 (t=14.790), revealing a statistically significant positive effect. Thus, it was found that the proposed basic model on the direct path of recovery resilience and productive aging was fit for the model.

  1. Verification of relationship model between Korean new elderly class’s recovery resilience and productive aging

    PubMed Central

    Cho, Gun-Sang; Kim, Dae-Sung; Yi, Eun-Surk

    2015-01-01

    The purpose of this study is to verification of relationship model between Korean new elderly class’s recovery resilience and productive aging. As of 2013, this study sampled preliminary elderly people in Gyeonggi-do and other provinces nationwide. Data from a total of effective 484 subjects was analyzed. The collected data was processed using the IBM SPSS 20.0 and AMOS 20.0, and underwent descriptive statistical analysis, confirmatory factor analysis, and structure model verification. The path coefficient associated with model fitness was examined. The standardization path coefficient between recovery resilience and productive aging is β=0.975 (t=14.790), revealing a statistically significant positive effect. Thus, it was found that the proposed basic model on the direct path of recovery resilience and productive aging was fit for the model. PMID:26730383

  2. NuSTAR AND Swift Observations of the Very High State in GX 339-4: Weighing the Black Hole With X-Rays

    NASA Technical Reports Server (NTRS)

    Parker, M. L.; Tomsick, J. A.; Kennea, J. A.; Miller, J. M.; Harrison, F. A.; Barret, D.; Boggs, S. E.; Christensen, F. E.; Craig, W. W.; Fabian, A. C.; hide

    2016-01-01

    We present results from spectral fitting of the very high state of GX339-4 with Nuclear Spectroscopic Telescope Array (NuSTAR) and Swift. We use relativistic reflection modeling to measure the spin of the black hole and inclination of the inner disk and find a spin of a = 0.95+0.08/-0.02 and inclination of 30deg +/- 1deg (statistical errors). These values agree well with previous results from reflection modeling. With the exceptional sensitivity of NuSTAR at the high-energy side of the disk spectrum, we are able to constrain multiple physical parameters simultaneously using continuum fitting. By using the constraints from reflection as input for the continuum fitting method, we invert the conventional fitting procedure to estimate the mass and distance of GX 339-4 using just the X-ray spectrum, finding a mass of 9.0+1.6/-1.2 Stellar Mass and distance of 8.4 +/- 0.9 kpc (statistical errors).

  3. STATISTICAL STUDY of HARD X-RAY SPECTRAL CHARACTERISTICS OF SOLAR FLARES

    NASA Astrophysics Data System (ADS)

    Alaoui, M.; Krucker, S.; Saint-Hilaire, P.; Lin, R. P.

    2009-12-01

    We investigate the spectral characteristics of 75 solar flares at the hard X-ray peak time observed by RHESSI (Ramaty High Energy Solar Spectroscopic Imager) in the energy range 12-150keV. At energies above 40keV, the Hard X-ray emission is mostly produced by bremsstrahlung of suprathermal electrons as they interact with the ambient plasma in the chromosphere. The observed photon spectra therefore provide diagnostics of electron acceleration processes in Solar flares. We will present statistical results of spectral fitting using two models: a broken power law plus a thermal component which is a direct fit of the photon spectrum and a thick target model plus a thermal component which is a fit of the photon spectra with assumptions on the electrons emitting bremsstrahlung in the thick target approximation.

  4. Potential energy surface fitting by a statistically localized, permutationally invariant, local interpolating moving least squares method for the many-body potential: Method and application to N{sub 4}

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bender, Jason D.; Doraiswamy, Sriram; Candler, Graham V., E-mail: truhlar@umn.edu, E-mail: candler@aem.umn.edu

    2014-02-07

    Fitting potential energy surfaces to analytic forms is an important first step for efficient molecular dynamics simulations. Here, we present an improved version of the local interpolating moving least squares method (L-IMLS) for such fitting. Our method has three key improvements. First, pairwise interactions are modeled separately from many-body interactions. Second, permutational invariance is incorporated in the basis functions, using permutationally invariant polynomials in Morse variables, and in the weight functions. Third, computational cost is reduced by statistical localization, in which we statistically correlate the cutoff radius with data point density. We motivate our discussion in this paper with amore » review of global and local least-squares-based fitting methods in one dimension. Then, we develop our method in six dimensions, and we note that it allows the analytic evaluation of gradients, a feature that is important for molecular dynamics. The approach, which we call statistically localized, permutationally invariant, local interpolating moving least squares fitting of the many-body potential (SL-PI-L-IMLS-MP, or, more simply, L-IMLS-G2), is used to fit a potential energy surface to an electronic structure dataset for N{sub 4}. We discuss its performance on the dataset and give directions for further research, including applications to trajectory calculations.« less

  5. NLINEAR - NONLINEAR CURVE FITTING PROGRAM

    NASA Technical Reports Server (NTRS)

    Everhart, J. L.

    1994-01-01

    A common method for fitting data is a least-squares fit. In the least-squares method, a user-specified fitting function is utilized in such a way as to minimize the sum of the squares of distances between the data points and the fitting curve. The Nonlinear Curve Fitting Program, NLINEAR, is an interactive curve fitting routine based on a description of the quadratic expansion of the chi-squared statistic. NLINEAR utilizes a nonlinear optimization algorithm that calculates the best statistically weighted values of the parameters of the fitting function and the chi-square that is to be minimized. The inputs to the program are the mathematical form of the fitting function and the initial values of the parameters to be estimated. This approach provides the user with statistical information such as goodness of fit and estimated values of parameters that produce the highest degree of correlation between the experimental data and the mathematical model. In the mathematical formulation of the algorithm, the Taylor expansion of chi-square is first introduced, and justification for retaining only the first term are presented. From the expansion, a set of n simultaneous linear equations are derived, which are solved by matrix algebra. To achieve convergence, the algorithm requires meaningful initial estimates for the parameters of the fitting function. NLINEAR is written in Fortran 77 for execution on a CDC Cyber 750 under NOS 2.3. It has a central memory requirement of 5K 60 bit words. Optionally, graphical output of the fitting function can be plotted. Tektronix PLOT-10 routines are required for graphics. NLINEAR was developed in 1987.

  6. A Stochastic Fractional Dynamics Model of Rainfall Statistics

    NASA Astrophysics Data System (ADS)

    Kundu, Prasun; Travis, James

    2013-04-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, that allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is designed to faithfully reflect the scale dependence and is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and times scales. The main restriction is the assumption that the statistics of the precipitation field is spatially homogeneous and isotropic and stationary in time. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of the radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment. Some data sets containing periods of non-stationary behavior that involves occasional anomalously correlated rain events, present a challenge for the model.

  7. A Stochastic Fractional Dynamics Model of Space-time Variability of Rain

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Travis, James E.

    2013-01-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, that allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and times scales. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment.

  8. On fitting generalized linear mixed-effects models for binary responses using different statistical packages.

    PubMed

    Zhang, Hui; Lu, Naiji; Feng, Changyong; Thurston, Sally W; Xia, Yinglin; Zhu, Liang; Tu, Xin M

    2011-09-10

    The generalized linear mixed-effects model (GLMM) is a popular paradigm to extend models for cross-sectional data to a longitudinal setting. When applied to modeling binary responses, different software packages and even different procedures within a package may give quite different results. In this report, we describe the statistical approaches that underlie these different procedures and discuss their strengths and weaknesses when applied to fit correlated binary responses. We then illustrate these considerations by applying these procedures implemented in some popular software packages to simulated and real study data. Our simulation results indicate a lack of reliability for most of the procedures considered, which carries significant implications for applying such popular software packages in practice. Copyright © 2011 John Wiley & Sons, Ltd.

  9. Statistical Modeling for Radiation Hardness Assurance

    NASA Technical Reports Server (NTRS)

    Ladbury, Raymond L.

    2014-01-01

    We cover the models and statistics associated with single event effects (and total ionizing dose), why we need them, and how to use them: What models are used, what errors exist in real test data, and what the model allows us to say about the DUT will be discussed. In addition, how to use other sources of data such as historical, heritage, and similar part and how to apply experience, physics, and expert opinion to the analysis will be covered. Also included will be concepts of Bayesian statistics, data fitting, and bounding rates.

  10. A Note on Recurring Misconceptions When Fitting Nonlinear Mixed Models.

    PubMed

    Harring, Jeffrey R; Blozis, Shelley A

    2016-01-01

    Nonlinear mixed-effects (NLME) models are used when analyzing continuous repeated measures data taken on each of a number of individuals where the focus is on characteristics of complex, nonlinear individual change. Challenges with fitting NLME models and interpreting analytic results have been well documented in the statistical literature. However, parameter estimates as well as fitted functions from NLME analyses in recent articles have been misinterpreted, suggesting the need for clarification of these issues before these misconceptions become fact. These misconceptions arise from the choice of popular estimation algorithms, namely, the first-order linearization method (FO) and Gaussian-Hermite quadrature (GHQ) methods, and how these choices necessarily lead to population-average (PA) or subject-specific (SS) interpretations of model parameters, respectively. These estimation approaches also affect the fitted function for the typical individual, the lack-of-fit of individuals' predicted trajectories, and vice versa.

  11. Quantifying Confidence in Model Predictions for Hypersonic Aircraft Structures

    DTIC Science & Technology

    2015-03-01

    of isolating calibrations of models in the network, segmented and simultaneous calibration are compared using the Kullback - Leibler ...value of θ. While not all test -statistics are as simple as measuring goodness or badness of fit , their directional interpretations tend to remain...data quite well, qualitatively. Quantitative goodness - of - fit tests are problematic because they assume a true empirical CDF is being tested or

  12. Fragment size distribution statistics in dynamic fragmentation of laser shock-loaded tin

    NASA Astrophysics Data System (ADS)

    He, Weihua; Xin, Jianting; Zhao, Yongqiang; Chu, Genbai; Xi, Tao; Shui, Min; Lu, Feng; Gu, Yuqiu

    2017-06-01

    This work investigates the geometric statistics method to characterize the size distribution of tin fragments produced in the laser shock-loaded dynamic fragmentation process. In the shock experiments, the ejection of the tin sample with etched V-shape groove in the free surface are collected by the soft recovery technique. Subsequently, the produced fragments are automatically detected with the fine post-shot analysis techniques including the X-ray micro-tomography and the improved watershed method. To characterize the size distributions of the fragments, a theoretical random geometric statistics model based on Poisson mixtures is derived for dynamic heterogeneous fragmentation problem, which reveals linear combinational exponential distribution. The experimental data related to fragment size distributions of the laser shock-loaded tin sample are examined with the proposed theoretical model, and its fitting performance is compared with that of other state-of-the-art fragment size distribution models. The comparison results prove that our proposed model can provide far more reasonable fitting result for the laser shock-loaded tin.

  13. On use of the multistage dose-response model for assessing laboratory animal carcinogenicity

    PubMed Central

    Nitcheva, Daniella; Piegorsch, Walter W.; West, R. Webster

    2007-01-01

    We explore how well a statistical multistage model describes dose-response patterns in laboratory animal carcinogenicity experiments from a large database of quantal response data. The data are collected from the U.S. EPA’s publicly available IRIS data warehouse and examined statistically to determine how often higher-order values in the multistage predictor yield significant improvements in explanatory power over lower-order values. Our results suggest that the addition of a second-order parameter to the model only improves the fit about 20% of the time, while adding even higher-order terms apparently does not contribute to the fit at all, at least with the study designs we captured in the IRIS database. Also included is an examination of statistical tests for assessing significance of higher-order terms in a multistage dose-response model. It is noted that bootstrap testing methodology appears to offer greater stability for performing the hypothesis tests than a more-common, but possibly unstable, “Wald” test. PMID:17490794

  14. A goodness-of-fit test for occupancy models with correlated within-season revisits

    USGS Publications Warehouse

    Wright, Wilson; Irvine, Kathryn M.; Rodhouse, Thomas J.

    2016-01-01

    Occupancy modeling is important for exploring species distribution patterns and for conservation monitoring. Within this framework, explicit attention is given to species detection probabilities estimated from replicate surveys to sample units. A central assumption is that replicate surveys are independent Bernoulli trials, but this assumption becomes untenable when ecologists serially deploy remote cameras and acoustic recording devices over days and weeks to survey rare and elusive animals. Proposed solutions involve modifying the detection-level component of the model (e.g., first-order Markov covariate). Evaluating whether a model sufficiently accounts for correlation is imperative, but clear guidance for practitioners is lacking. Currently, an omnibus goodnessof- fit test using a chi-square discrepancy measure on unique detection histories is available for occupancy models (MacKenzie and Bailey, Journal of Agricultural, Biological, and Environmental Statistics, 9, 2004, 300; hereafter, MacKenzie– Bailey test). We propose a join count summary measure adapted from spatial statistics to directly assess correlation after fitting a model. We motivate our work with a dataset of multinight bat call recordings from a pilot study for the North American Bat Monitoring Program. We found in simulations that our join count test was more reliable than the MacKenzie–Bailey test for detecting inadequacy of a model that assumed independence, particularly when serial correlation was low to moderate. A model that included a Markov-structured detection-level covariate produced unbiased occupancy estimates except in the presence of strong serial correlation and a revisit design consisting only of temporal replicates. When applied to two common bat species, our approach illustrates that sophisticated models do not guarantee adequate fit to real data, underscoring the importance of model assessment. Our join count test provides a widely applicable goodness-of-fit test and specifically evaluates occupancy model lack of fit related to correlation among detections within a sample unit. Our diagnostic tool is available for practitioners that serially deploy survey equipment as a way to achieve cost savings.

  15. Testing the Performance and Accuracy of the RELXILL Model for the Relativistic X-Ray Reflection from Accretion Disks

    NASA Astrophysics Data System (ADS)

    Choudhury, Kishalay; García, Javier A.; Steiner, James F.; Bambi, Cosimo

    2017-12-01

    The reflection spectroscopic model RELXILL is commonly implemented in studying relativistic X-ray reflection from accretion disks around black holes. We present a systematic study of the model’s capability to constrain the dimensionless spin and ionization parameters from ∼6000 Nuclear Spectroscopic Telescope Array (NuSTAR) simulations of a bright X-ray source employing the lamp-post geometry. We employ high-count spectra to show the limitations in the model without being confused with limitations in signal-to-noise. We find that both parameters are well-recovered at 90% confidence with improving constraints at higher reflection fraction, high spin, and low source height. We test spectra across a broad range—first at 106–107 and then ∼105 total source counts across the effective 3–79 keV band of NuSTAR, and discover a strong dependence of the results on how fits are performed around the starting parameters, owing to the complexity of the model itself. A blind fit chosen over an approach that carries some estimates of the actual parameter values can lead to significantly worse recovery of model parameters. We further stress the importance to span the space of nonlinear-behaving parameters like {log} ξ carefully and thoroughly for the model to avoid misleading results. In light of selecting fitting procedures, we recall the necessity to pay attention to the choice of data binning and fit statistics used to test the goodness of fit by demonstrating the effect on the photon index Γ. We re-emphasize and implore the need to account for the detector resolution while binning X-ray data and using Poisson fit statistics instead while analyzing Poissonian data.

  16. RooStatsCms: A tool for analysis modelling, combination and statistical studies

    NASA Astrophysics Data System (ADS)

    Piparo, D.; Schott, G.; Quast, G.

    2010-04-01

    RooStatsCms is an object oriented statistical framework based on the RooFit technology. Its scope is to allow the modelling, statistical analysis and combination of multiple search channels for new phenomena in High Energy Physics. It provides a variety of methods described in literature implemented as classes, whose design is oriented to the execution of multiple CPU intensive jobs on batch systems or on the Grid.

  17. Understanding the relationship between duration of untreated psychosis and outcomes: A statistical perspective.

    PubMed

    Hannigan, Ailish; Bargary, Norma; Kinsella, Anthony; Clarke, Mary

    2017-06-14

    Although the relationships between duration of untreated psychosis (DUP) and outcomes are often assumed to be linear, few studies have explored the functional form of these relationships. The aim of this study is to demonstrate the potential of recent advances in curve fitting approaches (splines) to explore the form of the relationship between DUP and global assessment of functioning (GAF). Curve fitting approaches were used in models to predict change in GAF at long-term follow-up using DUP for a sample of 83 individuals with schizophrenia. The form of the relationship between DUP and GAF was non-linear. Accounting for non-linearity increased the percentage of variance in GAF explained by the model, resulting in better prediction and understanding of the relationship. The relationship between DUP and outcomes may be complex and model fit may be improved by accounting for the form of the relationship. This should be routinely assessed and new statistical approaches for non-linear relationships exploited, if appropriate. © 2017 John Wiley & Sons Australia, Ltd.

  18. Adaptation in Tunably Rugged Fitness Landscapes: The Rough Mount Fuji Model

    PubMed Central

    Neidhart, Johannes; Szendro, Ivan G.; Krug, Joachim

    2014-01-01

    Much of the current theory of adaptation is based on Gillespie’s mutational landscape model (MLM), which assumes that the fitness values of genotypes linked by single mutational steps are independent random variables. On the other hand, a growing body of empirical evidence shows that real fitness landscapes, while possessing a considerable amount of ruggedness, are smoother than predicted by the MLM. In the present article we propose and analyze a simple fitness landscape model with tunable ruggedness based on the rough Mount Fuji (RMF) model originally introduced by Aita et al. in the context of protein evolution. We provide a comprehensive collection of results pertaining to the topographical structure of RMF landscapes, including explicit formulas for the expected number of local fitness maxima, the location of the global peak, and the fitness correlation function. The statistics of single and multiple adaptive steps on the RMF landscape are explored mainly through simulations, and the results are compared to the known behavior in the MLM model. Finally, we show that the RMF model can explain the large number of second-step mutations observed on a highly fit first-step background in a recent evolution experiment with a microvirid bacteriophage. PMID:25123507

  19. Confirmatory Factor Analysis of the Malay Version of the Confusion, Hubbub and Order Scale (CHAOS-6) among Myocardial Infarction Survivors in a Malaysian Cardiac Healthcare Facility.

    PubMed

    Ganasegeran, Kurubaran; Selvaraj, Kamaraj; Rashid, Abdul

    2017-08-01

    The six item Confusion, Hubbub and Order Scale (CHAOS-6) has been validated as a reliable tool to measure levels of household disorder. We aimed to investigate the goodness of fit and reliability of a new Malay version of the CHAOS-6. The original English version of the CHAOS-6 underwent forward-backward translation into the Malay language. The finalised Malay version was administered to 105 myocardial infarction survivors in a Malaysian cardiac health facility. We performed confirmatory factor analyses (CFAs) using structural equation modelling. A path diagram and fit statistics were yielded to determine the Malay version's validity. Composite reliability was tested to determine the scale's reliability. All 105 myocardial infarction survivors participated in the study. The CFA yielded a six-item, one-factor model with excellent fit statistics. Composite reliability for the single factor CHAOS-6 was 0.65, confirming that the scale is reliable for Malay speakers. The Malay version of the CHAOS-6 was reliable and showed the best fit statistics for our study sample. We thus offer a simple, brief, validated, reliable and novel instrument to measure chaos, the Skala Kecelaruan, Keriuhan & Tertib Terubahsuai (CHAOS-6) , for the Malaysian population.

  20. Confirmatory Factor Analysis of the Malay Version of the Confusion, Hubbub and Order Scale (CHAOS-6) among Myocardial Infarction Survivors in a Malaysian Cardiac Healthcare Facility

    PubMed Central

    Ganasegeran, Kurubaran; Selvaraj, Kamaraj; Rashid, Abdul

    2017-01-01

    Background The six item Confusion, Hubbub and Order Scale (CHAOS-6) has been validated as a reliable tool to measure levels of household disorder. We aimed to investigate the goodness of fit and reliability of a new Malay version of the CHAOS-6. Methods The original English version of the CHAOS-6 underwent forward-backward translation into the Malay language. The finalised Malay version was administered to 105 myocardial infarction survivors in a Malaysian cardiac health facility. We performed confirmatory factor analyses (CFAs) using structural equation modelling. A path diagram and fit statistics were yielded to determine the Malay version’s validity. Composite reliability was tested to determine the scale’s reliability. Results All 105 myocardial infarction survivors participated in the study. The CFA yielded a six-item, one-factor model with excellent fit statistics. Composite reliability for the single factor CHAOS-6 was 0.65, confirming that the scale is reliable for Malay speakers. Conclusion The Malay version of the CHAOS-6 was reliable and showed the best fit statistics for our study sample. We thus offer a simple, brief, validated, reliable and novel instrument to measure chaos, the Skala Kecelaruan, Keriuhan & Tertib Terubahsuai (CHAOS-6), for the Malaysian population. PMID:28951688

  1. Unifying distance-based goodness-of-fit indicators for hydrologic model assessment

    NASA Astrophysics Data System (ADS)

    Cheng, Qinbo; Reinhardt-Imjela, Christian; Chen, Xi; Schulte, Achim

    2014-05-01

    The goodness-of-fit indicator, i.e. efficiency criterion, is very important for model calibration. However, recently the knowledge about the goodness-of-fit indicators is all empirical and lacks a theoretical support. Based on the likelihood theory, a unified distance-based goodness-of-fit indicator termed BC-GED model is proposed, which uses the Box-Cox (BC) transformation to remove the heteroscedasticity of model errors and the generalized error distribution (GED) with zero-mean to fit the distribution of model errors after BC. The BC-GED model can unify all recent distance-based goodness-of-fit indicators, and reveals the mean square error (MSE) and the mean absolute error (MAE) that are widely used goodness-of-fit indicators imply statistic assumptions that the model errors follow the Gaussian distribution and the Laplace distribution with zero-mean, respectively. The empirical knowledge about goodness-of-fit indicators can be also easily interpreted by BC-GED model, e.g. the sensitivity to high flow of the goodness-of-fit indicators with large power of model errors results from the low probability of large model error in the assumed distribution of these indicators. In order to assess the effect of the parameters (i.e. the BC transformation parameter λ and the GED kurtosis coefficient β also termed the power of model errors) of BC-GED model on hydrologic model calibration, six cases of BC-GED model were applied in Baocun watershed (East China) with SWAT-WB-VSA model. Comparison of the inferred model parameters and model simulation results among the six indicators demonstrates these indicators can be clearly separated two classes by the GED kurtosis β: β >1 and β ≤ 1. SWAT-WB-VSA calibrated by the class β >1 of distance-based goodness-of-fit indicators captures high flow very well and mimics the baseflow very badly, but it calibrated by the class β ≤ 1 mimics the baseflow very well, because first the larger value of β, the greater emphasis is put on high flow and second the derivative of GED probability density function at zero is zero as β >1, but discontinuous as β ≤ 1, and even infinity as β < 1 with which the maximum likelihood estimation can guarantee the model errors approach zero as well as possible. The BC-GED that estimates the parameters (i.e. λ and β) of BC-GED model as well as hydrologic model parameters is the best distance-based goodness-of-fit indicator because not only the model validation using groundwater levels is very good, but also the model errors fulfill the statistic assumption best. However, in some cases of model calibration with a few observations e.g. calibration of single-event model, for avoiding estimation of the parameters of BC-GED model the MAE i.e. the boundary indicator (β = 1) of the two classes, can replace the BC-GED, because the model validation of MAE is best.

  2. Data Analysis & Statistical Methods for Command File Errors

    NASA Technical Reports Server (NTRS)

    Meshkat, Leila; Waggoner, Bruce; Bryant, Larry

    2014-01-01

    This paper explains current work on modeling for managing the risk of command file errors. It is focused on analyzing actual data from a JPL spaceflight mission to build models for evaluating and predicting error rates as a function of several key variables. We constructed a rich dataset by considering the number of errors, the number of files radiated, including the number commands and blocks in each file, as well as subjective estimates of workload and operational novelty. We have assessed these data using different curve fitting and distribution fitting techniques, such as multiple regression analysis, and maximum likelihood estimation to see how much of the variability in the error rates can be explained with these. We have also used goodness of fit testing strategies and principal component analysis to further assess our data. Finally, we constructed a model of expected error rates based on the what these statistics bore out as critical drivers to the error rate. This model allows project management to evaluate the error rate against a theoretically expected rate as well as anticipate future error rates.

  3. Improved Forecasting Methods for Naval Manpower Studies

    DTIC Science & Technology

    2015-03-25

    Using monthly data is likely to improve the overall fit of the models and the accuracy of the BP test . A measure of unemployment to control for...measure of the relative goodness of fit of a statistical model. It is grounded in the concept of information entropy, in effect, offering a relative...the Kullback – Leibler divergence, DKL(f,g1); similarly, the information lost from using g2 to

  4. Water quality management using statistical analysis and time-series prediction model

    NASA Astrophysics Data System (ADS)

    Parmar, Kulwinder Singh; Bhardwaj, Rashmi

    2014-12-01

    This paper deals with water quality management using statistical analysis and time-series prediction model. The monthly variation of water quality standards has been used to compare statistical mean, median, mode, standard deviation, kurtosis, skewness, coefficient of variation at Yamuna River. Model validated using R-squared, root mean square error, mean absolute percentage error, maximum absolute percentage error, mean absolute error, maximum absolute error, normalized Bayesian information criterion, Ljung-Box analysis, predicted value and confidence limits. Using auto regressive integrated moving average model, future water quality parameters values have been estimated. It is observed that predictive model is useful at 95 % confidence limits and curve is platykurtic for potential of hydrogen (pH), free ammonia, total Kjeldahl nitrogen, dissolved oxygen, water temperature (WT); leptokurtic for chemical oxygen demand, biochemical oxygen demand. Also, it is observed that predicted series is close to the original series which provides a perfect fit. All parameters except pH and WT cross the prescribed limits of the World Health Organization /United States Environmental Protection Agency, and thus water is not fit for drinking, agriculture and industrial use.

  5. A Performance Comparison on the Probability Plot Correlation Coefficient Test using Several Plotting Positions for GEV Distribution.

    NASA Astrophysics Data System (ADS)

    Ahn, Hyunjun; Jung, Younghun; Om, Ju-Seong; Heo, Jun-Haeng

    2014-05-01

    It is very important to select the probability distribution in Statistical hydrology. Goodness of fit test is a statistical method that selects an appropriate probability model for a given data. The probability plot correlation coefficient (PPCC) test as one of the goodness of fit tests was originally developed for normal distribution. Since then, this test has been widely applied to other probability models. The PPCC test is known as one of the best goodness of fit test because it shows higher rejection powers among them. In this study, we focus on the PPCC tests for the GEV distribution which is widely used in the world. For the GEV model, several plotting position formulas are suggested. However, the PPCC statistics are derived only for the plotting position formulas (Goel and De, In-na and Nguyen, and Kim et al.) in which the skewness coefficient (or shape parameter) are included. And then the regression equations are derived as a function of the shape parameter and sample size for a given significance level. In addition, the rejection powers of these formulas are compared using Monte-Carlo simulation. Keywords: Goodness-of-fit test, Probability plot correlation coefficient test, Plotting position, Monte-Carlo Simulation ACKNOWLEDGEMENTS This research was supported by a grant 'Establishing Active Disaster Management System of Flood Control Structures by using 3D BIM Technique' [NEMA-12-NH-57] from the Natural Hazard Mitigation Research Group, National Emergency Management Agency of Korea.

  6. Statistics of Optical Coherence Tomography Data From Human Retina

    PubMed Central

    de Juan, Joaquín; Ferrone, Claudia; Giannini, Daniela; Huang, David; Koch, Giorgio; Russo, Valentina; Tan, Ou; Bruni, Carlo

    2010-01-01

    Optical coherence tomography (OCT) has recently become one of the primary methods for noninvasive probing of the human retina. The pseudoimage formed by OCT (the so-called B-scan) varies probabilistically across pixels due to complexities in the measurement technique. Hence, sensitive automatic procedures of diagnosis using OCT may exploit statistical analysis of the spatial distribution of reflectance. In this paper, we perform a statistical study of retinal OCT data. We find that the stretched exponential probability density function can model well the distribution of intensities in OCT pseudoimages. Moreover, we show a small, but significant correlation between neighbor pixels when measuring OCT intensities with pixels of about 5 µm. We then develop a simple joint probability model for the OCT data consistent with known retinal features. This model fits well the stretched exponential distribution of intensities and their spatial correlation. In normal retinas, fit parameters of this model are relatively constant along retinal layers, but varies across layers. However, in retinas with diabetic retinopathy, large spikes of parameter modulation interrupt the constancy within layers, exactly where pathologies are visible. We argue that these results give hope for improvement in statistical pathology-detection methods even when the disease is in its early stages. PMID:20304733

  7. IMFIT: A FAST, FLEXIBLE NEW PROGRAM FOR ASTRONOMICAL IMAGE FITTING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Erwin, Peter; Universitäts-Sternwarte München, Scheinerstrasse 1, D-81679 München

    2015-02-01

    I describe a new, open-source astronomical image-fitting program called IMFIT, specialized for galaxies but potentially useful for other sources, which is fast, flexible, and highly extensible. A key characteristic of the program is an object-oriented design that allows new types of image components (two-dimensional surface-brightness functions) to be easily written and added to the program. Image functions provided with IMFIT include the usual suspects for galaxy decompositions (Sérsic, exponential, Gaussian), along with Core-Sérsic and broken-exponential profiles, elliptical rings, and three components that perform line-of-sight integration through three-dimensional luminosity-density models of disks and rings seen at arbitrary inclinations. Available minimization algorithmsmore » include Levenberg-Marquardt, Nelder-Mead simplex, and Differential Evolution, allowing trade-offs between speed and decreased sensitivity to local minima in the fit landscape. Minimization can be done using the standard χ{sup 2} statistic (using either data or model values to estimate per-pixel Gaussian errors, or else user-supplied error images) or Poisson-based maximum-likelihood statistics; the latter approach is particularly appropriate for cases of Poisson data in the low-count regime. I show that fitting low-signal-to-noise ratio galaxy images using χ{sup 2} minimization and individual-pixel Gaussian uncertainties can lead to significant biases in fitted parameter values, which are avoided if a Poisson-based statistic is used; this is true even when Gaussian read noise is present.« less

  8. Can spatial statistical river temperature models be transferred between catchments?

    NASA Astrophysics Data System (ADS)

    Jackson, Faye L.; Fryer, Robert J.; Hannah, David M.; Malcolm, Iain A.

    2017-09-01

    There has been increasing use of spatial statistical models to understand and predict river temperature (Tw) from landscape covariates. However, it is not financially or logistically feasible to monitor all rivers and the transferability of such models has not been explored. This paper uses Tw data from four river catchments collected in August 2015 to assess how well spatial regression models predict the maximum 7-day rolling mean of daily maximum Tw (Twmax) within and between catchments. Models were fitted for each catchment separately using (1) landscape covariates only (LS models) and (2) landscape covariates and an air temperature (Ta) metric (LS_Ta models). All the LS models included upstream catchment area and three included a river network smoother (RNS) that accounted for unexplained spatial structure. The LS models transferred reasonably to other catchments, at least when predicting relative levels of Twmax. However, the predictions were biased when mean Twmax differed between catchments. The RNS was needed to characterise and predict finer-scale spatially correlated variation. Because the RNS was unique to each catchment and thus non-transferable, predictions were better within catchments than between catchments. A single model fitted to all catchments found no interactions between the landscape covariates and catchment, suggesting that the landscape relationships were transferable. The LS_Ta models transferred less well, with particularly poor performance when the relationship with the Ta metric was physically implausible or required extrapolation outside the range of the data. A single model fitted to all catchments found catchment-specific relationships between Twmax and the Ta metric, indicating that the Ta metric was not transferable. These findings improve our understanding of the transferability of spatial statistical river temperature models and provide a foundation for developing new approaches for predicting Tw at unmonitored locations across multiple catchments and larger spatial scales.

  9. Introducing the fit-criteria assessment plot - A visualisation tool to assist class enumeration in group-based trajectory modelling.

    PubMed

    Klijn, Sven L; Weijenberg, Matty P; Lemmens, Paul; van den Brandt, Piet A; Lima Passos, Valéria

    2017-10-01

    Background and objective Group-based trajectory modelling is a model-based clustering technique applied for the identification of latent patterns of temporal changes. Despite its manifold applications in clinical and health sciences, potential problems of the model selection procedure are often overlooked. The choice of the number of latent trajectories (class-enumeration), for instance, is to a large degree based on statistical criteria that are not fail-safe. Moreover, the process as a whole is not transparent. To facilitate class enumeration, we introduce a graphical summary display of several fit and model adequacy criteria, the fit-criteria assessment plot. Methods An R-code that accepts universal data input is presented. The programme condenses relevant group-based trajectory modelling output information of model fit indices in automated graphical displays. Examples based on real and simulated data are provided to illustrate, assess and validate fit-criteria assessment plot's utility. Results Fit-criteria assessment plot provides an overview of fit criteria on a single page, placing users in an informed position to make a decision. Fit-criteria assessment plot does not automatically select the most appropriate model but eases the model assessment procedure. Conclusions Fit-criteria assessment plot is an exploratory, visualisation tool that can be employed to assist decisions in the initial and decisive phase of group-based trajectory modelling analysis. Considering group-based trajectory modelling's widespread resonance in medical and epidemiological sciences, a more comprehensive, easily interpretable and transparent display of the iterative process of class enumeration may foster group-based trajectory modelling's adequate use.

  10. Permutation tests for goodness-of-fit testing of mathematical models to experimental data.

    PubMed

    Fişek, M Hamit; Barlas, Zeynep

    2013-03-01

    This paper presents statistical procedures for improving the goodness-of-fit testing of theoretical models to data obtained from laboratory experiments. We use an experimental study in the expectation states research tradition which has been carried out in the "standardized experimental situation" associated with the program to illustrate the application of our procedures. We briefly review the expectation states research program and the fundamentals of resampling statistics as we develop our procedures in the resampling context. The first procedure we develop is a modification of the chi-square test which has been the primary statistical tool for assessing goodness of fit in the EST research program, but has problems associated with its use. We discuss these problems and suggest a procedure to overcome them. The second procedure we present, the "Average Absolute Deviation" test, is a new test and is proposed as an alternative to the chi square test, as being simpler and more informative. The third and fourth procedures are permutation versions of Jonckheere's test for ordered alternatives, and Kendall's tau(b), a rank order correlation coefficient. The fifth procedure is a new rank order goodness-of-fit test, which we call the "Deviation from Ideal Ranking" index, which we believe may be more useful than other rank order tests for assessing goodness-of-fit of models to experimental data. The application of these procedures to the sample data is illustrated in detail. We then present another laboratory study from an experimental paradigm different from the expectation states paradigm - the "network exchange" paradigm, and describe how our procedures may be applied to this data set. Copyright © 2012 Elsevier Inc. All rights reserved.

  11. Impact of systematic uncertainties for the CP violation measurement in superbeam experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meloni, Davide

    We present a three-flavour fit to the recent ν{sub µ} → ν{sub e} T2K oscillation data with different models for the neutrino-nucleus cross section. We show that, even for a limited statistics, the allowed regions and best fit points in the (θ{sub 13}, δ{sub CP}) plane are affected if, instead of using the Fermi Gas model to describe the quasielastic cross section, we employ a model including the multinucleon emission channel [1].

  12. Pearson-type goodness-of-fit test with bootstrap maximum likelihood estimation.

    PubMed

    Yin, Guosheng; Ma, Yanyuan

    2013-01-01

    The Pearson test statistic is constructed by partitioning the data into bins and computing the difference between the observed and expected counts in these bins. If the maximum likelihood estimator (MLE) of the original data is used, the statistic generally does not follow a chi-squared distribution or any explicit distribution. We propose a bootstrap-based modification of the Pearson test statistic to recover the chi-squared distribution. We compute the observed and expected counts in the partitioned bins by using the MLE obtained from a bootstrap sample. This bootstrap-sample MLE adjusts exactly the right amount of randomness to the test statistic, and recovers the chi-squared distribution. The bootstrap chi-squared test is easy to implement, as it only requires fitting exactly the same model to the bootstrap data to obtain the corresponding MLE, and then constructs the bin counts based on the original data. We examine the test size and power of the new model diagnostic procedure using simulation studies and illustrate it with a real data set.

  13. Performance of DIMTEST-and NOHARM-Based Statistics for Testing Unidimensionality

    ERIC Educational Resources Information Center

    Finch, Holmes; Habing, Brian

    2007-01-01

    This Monte Carlo study compares the ability of the parametric bootstrap version of DIMTEST with three goodness-of-fit tests calculated from a fitted NOHARM model to detect violations of the assumption of unidimensionality in testing data. The effectiveness of the procedures was evaluated for different numbers of items, numbers of examinees,…

  14. Effect of heating rate and kinetic model selection on activation energy of nonisothermal crystallization of amorphous felodipine.

    PubMed

    Chattoraj, Sayantan; Bhugra, Chandan; Li, Zheng Jane; Sun, Changquan Calvin

    2014-12-01

    The nonisothermal crystallization kinetics of amorphous materials is routinely analyzed by statistically fitting the crystallization data to kinetic models. In this work, we systematically evaluate how the model-dependent crystallization kinetics is impacted by variations in the heating rate and the selection of the kinetic model, two key factors that can lead to significant differences in the crystallization activation energy (Ea ) of an amorphous material. Using amorphous felodipine, we show that the Ea decreases with increase in the heating rate, irrespective of the kinetic model evaluated in this work. The model that best describes the crystallization phenomenon cannot be identified readily through the statistical fitting approach because several kinetic models yield comparable R(2) . Here, we propose an alternate paired model-fitting model-free (PMFMF) approach for identifying the most suitable kinetic model, where Ea obtained from model-dependent kinetics is compared with those obtained from model-free kinetics. The most suitable kinetic model is identified as the one that yields Ea values comparable with the model-free kinetics. Through this PMFMF approach, nucleation and growth is identified as the main mechanism that controls the crystallization kinetics of felodipine. Using this PMFMF approach, we further demonstrate that crystallization mechanism from amorphous phase varies with heating rate. © 2014 Wiley Periodicals, Inc. and the American Pharmacists Association.

  15. The landscape of W± and Z bosons produced in pp collisions up to LHC energies

    NASA Astrophysics Data System (ADS)

    Basso, Eduardo; Bourrely, Claude; Pasechnik, Roman; Soffer, Jacques

    2017-10-01

    We consider a selection of recent experimental results on electroweak W± , Z gauge boson production in pp collisions at BNL RHIC and CERN LHC energies in comparison to prediction of perturbative QCD calculations based on different sets of NLO parton distribution functions including the statistical PDF model known from fits to the DIS data. We show that the current statistical PDF parametrization (fitted to the DIS data only) underestimates the LHC data on W± , Z gauge boson production cross sections at the NLO by about 20%. This suggests that there is a need to refit the parameters of the statistical PDF including the latest LHC data.

  16. A methodology to design heuristics for model selection based on the characteristics of data: Application to investigate when the Negative Binomial Lindley (NB-L) is preferred over the Negative Binomial (NB).

    PubMed

    Shirazi, Mohammadali; Dhavala, Soma Sekhar; Lord, Dominique; Geedipally, Srinivas Reddy

    2017-10-01

    Safety analysts usually use post-modeling methods, such as the Goodness-of-Fit statistics or the Likelihood Ratio Test, to decide between two or more competitive distributions or models. Such metrics require all competitive distributions to be fitted to the data before any comparisons can be accomplished. Given the continuous growth in introducing new statistical distributions, choosing the best one using such post-modeling methods is not a trivial task, in addition to all theoretical or numerical issues the analyst may face during the analysis. Furthermore, and most importantly, these measures or tests do not provide any intuitions into why a specific distribution (or model) is preferred over another (Goodness-of-Logic). This paper ponders into these issues by proposing a methodology to design heuristics for Model Selection based on the characteristics of data, in terms of descriptive summary statistics, before fitting the models. The proposed methodology employs two analytic tools: (1) Monte-Carlo Simulations and (2) Machine Learning Classifiers, to design easy heuristics to predict the label of the 'most-likely-true' distribution for analyzing data. The proposed methodology was applied to investigate when the recently introduced Negative Binomial Lindley (NB-L) distribution is preferred over the Negative Binomial (NB) distribution. Heuristics were designed to select the 'most-likely-true' distribution between these two distributions, given a set of prescribed summary statistics of data. The proposed heuristics were successfully compared against classical tests for several real or observed datasets. Not only they are easy to use and do not need any post-modeling inputs, but also, using these heuristics, the analyst can attain useful information about why the NB-L is preferred over the NB - or vice versa- when modeling data. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Searching for hidden unexpected features in the SnIa data

    NASA Astrophysics Data System (ADS)

    Shafieloo, A.; Perivolaropoulos, L.

    2010-06-01

    It is known that κ2 statistic and likelihood analysis may not be sensitive to the all features of the data. Despite of the fact that by using κ2 statistic we can measure the overall goodness of fit for a model confronted to a data set, some specific features of the data can stay undetectable. For instance, it has been pointed out that there is an unexpected brightness of the SnIa data at z > 1 in the Union compilation. We quantify this statement by constructing a new statistic, called Binned Normalized Difference (BND) statistic, which is applicable directly on the Type Ia Supernova (SnIa) distance moduli. This statistic is designed to pick up systematic brightness trends of SnIa data points with respect to a best fit cosmological model at high redshifts. According to this statistic there are 2.2%, 5.3% and 12.6% consistency between the Gold06, Union08 and Constitution09 data and spatially flat ΛCDM model when the real data is compared with many realizations of the simulated monte carlo datasets. The corresponding realization probability in the context of a (w0,w1) = (-1.4,2) model is more than 30% for all mentioned datasets indicating a much better consistency for this model with respect to the BND statistic. The unexpected high z brightness of SnIa can be interpreted either as a trend towards more deceleration at high z than expected in the context of ΛCDM or as a statistical fluctuation or finally as a systematic effect perhaps due to a mild SnIa evolution at high z.

  18. Time series modeling and forecasting using memetic algorithms for regime-switching models.

    PubMed

    Bergmeir, Christoph; Triguero, Isaac; Molina, Daniel; Aznarte, José Luis; Benitez, José Manuel

    2012-11-01

    In this brief, we present a novel model fitting procedure for the neuro-coefficient smooth transition autoregressive model (NCSTAR), as presented by Medeiros and Veiga. The model is endowed with a statistically founded iterative building procedure and can be interpreted in terms of fuzzy rule-based systems. The interpretability of the generated models and a mathematically sound building procedure are two very important properties of forecasting models. The model fitting procedure employed by the original NCSTAR is a combination of initial parameter estimation by a grid search procedure with a traditional local search algorithm. We propose a different fitting procedure, using a memetic algorithm, in order to obtain more accurate models. An empirical evaluation of the method is performed, applying it to various real-world time series originating from three forecasting competitions. The results indicate that we can significantly enhance the accuracy of the models, making them competitive to models commonly used in the field.

  19. The epistemological status of general circulation models

    NASA Astrophysics Data System (ADS)

    Loehle, Craig

    2018-03-01

    Forecasts of both likely anthropogenic effects on climate and consequent effects on nature and society are based on large, complex software tools called general circulation models (GCMs). Forecasts generated by GCMs have been used extensively in policy decisions related to climate change. However, the relation between underlying physical theories and results produced by GCMs is unclear. In the case of GCMs, many discretizations and approximations are made, and simulating Earth system processes is far from simple and currently leads to some results with unknown energy balance implications. Statistical testing of GCM forecasts for degree of agreement with data would facilitate assessment of fitness for use. If model results need to be put on an anomaly basis due to model bias, then both visual and quantitative measures of model fit depend strongly on the reference period used for normalization, making testing problematic. Epistemology is here applied to problems of statistical inference during testing, the relationship between the underlying physics and the models, the epistemic meaning of ensemble statistics, problems of spatial and temporal scale, the existence or not of an unforced null for climate fluctuations, the meaning of existing uncertainty estimates, and other issues. Rigorous reasoning entails carefully quantifying levels of uncertainty.

  20. Clinical risk stratification model for advanced colorectal neoplasia in persons with negative fecal immunochemical test results.

    PubMed

    Jung, Yoon Suk; Park, Chan Hyuk; Kim, Nam Hee; Park, Jung Ho; Park, Dong Il; Sohn, Chong Il

    2018-01-01

    The fecal immunochemical test (FIT) has low sensitivity for detecting advanced colorectal neoplasia (ACRN); thus, a considerable portion of FIT-negative persons may have ACRN. We aimed to develop a risk-scoring model for predicting ACRN in FIT-negative persons. We reviewed the records of participants aged ≥40 years who underwent a colonoscopy and FIT during a health check-up. We developed a risk-scoring model for predicting ACRN in FIT-negative persons. Of 11,873 FIT-negative participants, 255 (2.1%) had ACRN. On the basis of the multivariable logistic regression model, point scores were assigned as follows among FIT-negative persons: age (per year from 40 years old), 1 point; current smoker, 10 points; overweight, 5 points; obese, 7 points; hypertension, 6 points; old cerebrovascular attack (CVA), 15 points. Although the proportion of ACRN in FIT-negative persons increased as risk scores increased (from 0.6% in the group with 0-4 points to 8.1% in the group with 35-39 points), it was significantly lower than that in FIT-positive persons (14.9%). However, there was no statistical difference between the proportion of ACRN in FIT-negative persons with ≥40 points and in FIT-positive persons (10.5% vs. 14.9%, P = 0.321). FIT-negative persons may need to undergo screening colonoscopy if they clinically have a high risk of ACRN. The scoring model based on age, smoking habits, overweight or obesity, hypertension, and old CVA may be useful in selecting and prioritizing FIT-negative persons for screening colonoscopy.

  1. New approach in the quantum statistical parton distribution

    NASA Astrophysics Data System (ADS)

    Sohaily, Sozha; Vaziri (Khamedi), Mohammad

    2017-12-01

    An attempt to find simple parton distribution functions (PDFs) based on quantum statistical approach is presented. The PDFs described by the statistical model have very interesting physical properties which help to understand the structure of partons. The longitudinal portion of distribution functions are given by applying the maximum entropy principle. An interesting and simple approach to determine the statistical variables exactly without fitting and fixing parameters is surveyed. Analytic expressions of the x-dependent PDFs are obtained in the whole x region [0, 1], and the computed distributions are consistent with the experimental observations. The agreement with experimental data, gives a robust confirm of our simple presented statistical model.

  2. Mathematical and Statistical Software Index.

    DTIC Science & Technology

    1986-08-01

    geometric) mean HMEAN - harmonic mean MEDIAN - median MODE - mode QUANT - quantiles OGIVE - distribution curve IQRNG - interpercentile range RANGE ... range mutliphase pivoting algorithm cross-classification multiple discriminant analysis cross-tabul ation mul tipl e-objecti ve model curve fitting...Statistics). .. .. .... ...... ..... ...... ..... .. 21 *RANGEX (Correct Correlations for Curtailment of Range ). .. .. .... ...... ... 21 *RUMMAGE II (Analysis

  3. Exploring a Three-Level Model of Calibration Accuracy

    ERIC Educational Resources Information Center

    Schraw, Gregory; Kuch, Fred; Gutierrez, Antonio P.; Richmond, Aaron S.

    2014-01-01

    We compared 5 different statistics (i.e., G index, gamma, "d'", sensitivity, specificity) used in the social sciences and medical diagnosis literatures to assess calibration accuracy in order to examine the relationship among them and to explore whether one statistic provided a best fitting general measure of accuracy. College…

  4. FIT: statistical modeling tool for transcriptome dynamics under fluctuating field conditions

    PubMed Central

    Iwayama, Koji; Aisaka, Yuri; Kutsuna, Natsumaro

    2017-01-01

    Abstract Motivation: Considerable attention has been given to the quantification of environmental effects on organisms. In natural conditions, environmental factors are continuously changing in a complex manner. To reveal the effects of such environmental variations on organisms, transcriptome data in field environments have been collected and analyzed. Nagano et al. proposed a model that describes the relationship between transcriptomic variation and environmental conditions and demonstrated the capability to predict transcriptome variation in rice plants. However, the computational cost of parameter optimization has prevented its wide application. Results: We propose a new statistical model and efficient parameter optimization based on the previous study. We developed and released FIT, an R package that offers functions for parameter optimization and transcriptome prediction. The proposed method achieves comparable or better prediction performance within a shorter computational time than the previous method. The package will facilitate the study of the environmental effects on transcriptomic variation in field conditions. Availability and Implementation: Freely available from CRAN (https://cran.r-project.org/web/packages/FIT/). Contact: anagano@agr.ryukoku.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online PMID:28158396

  5. Nonlinear Structured Growth Mixture Models in Mplus and OpenMx

    PubMed Central

    Grimm, Kevin J.; Ram, Nilam; Estabrook, Ryne

    2014-01-01

    Growth mixture models (GMMs; Muthén & Muthén, 2000; Muthén & Shedden, 1999) are a combination of latent curve models (LCMs) and finite mixture models to examine the existence of latent classes that follow distinct developmental patterns. GMMs are often fit with linear, latent basis, multiphase, or polynomial change models because of their common use, flexibility in modeling many types of change patterns, the availability of statistical programs to fit such models, and the ease of programming. In this paper, we present additional ways of modeling nonlinear change patterns with GMMs. Specifically, we show how LCMs that follow specific nonlinear functions can be extended to examine the presence of multiple latent classes using the Mplus and OpenMx computer programs. These models are fit to longitudinal reading data from the Early Childhood Longitudinal Study-Kindergarten Cohort to illustrate their use. PMID:25419006

  6. Exact goodness-of-fit tests for Markov chains.

    PubMed

    Besag, J; Mondal, D

    2013-06-01

    Goodness-of-fit tests are useful in assessing whether a statistical model is consistent with available data. However, the usual χ² asymptotics often fail, either because of the paucity of the data or because a nonstandard test statistic is of interest. In this article, we describe exact goodness-of-fit tests for first- and higher order Markov chains, with particular attention given to time-reversible ones. The tests are obtained by conditioning on the sufficient statistics for the transition probabilities and are implemented by simple Monte Carlo sampling or by Markov chain Monte Carlo. They apply both to single and to multiple sequences and allow a free choice of test statistic. Three examples are given. The first concerns multiple sequences of dry and wet January days for the years 1948-1983 at Snoqualmie Falls, Washington State, and suggests that standard analysis may be misleading. The second one is for a four-state DNA sequence and lends support to the original conclusion that a second-order Markov chain provides an adequate fit to the data. The last one is six-state atomistic data arising in molecular conformational dynamics simulation of solvated alanine dipeptide and points to strong evidence against a first-order reversible Markov chain at 6 picosecond time steps. © 2013, The International Biometric Society.

  7. Simplified estimation of age-specific reference intervals for skewed data.

    PubMed

    Wright, E M; Royston, P

    1997-12-30

    Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of a parametric method are that they necessarily produce smooth centile curves, the entire density is estimated and an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.

  8. The effect of noise-induced variance on parameter recovery from reaction times.

    PubMed

    Vadillo, Miguel A; Garaizar, Pablo

    2016-03-31

    Technical noise can compromise the precision and accuracy of the reaction times collected in psychological experiments, especially in the case of Internet-based studies. Although this noise seems to have only a small impact on traditional statistical analyses, its effects on model fit to reaction-time distributions remains unexplored. Across four simulations we study the impact of technical noise on parameter recovery from data generated from an ex-Gaussian distribution and from a Ratcliff Diffusion Model. Our results suggest that the impact of noise-induced variance tends to be limited to specific parameters and conditions. Although we encourage researchers to adopt all measures to reduce the impact of noise on reaction-time experiments, we conclude that the typical amount of noise-induced variance found in these experiments does not pose substantial problems for statistical analyses based on model fitting.

  9. Single photon counting linear mode avalanche photodiode technologies

    NASA Astrophysics Data System (ADS)

    Williams, George M.; Huntington, Andrew S.

    2011-10-01

    The false count rate of a single-photon-sensitive photoreceiver consisting of a high-gain, low-excess-noise linear-mode InGaAs avalanche photodiode (APD) and a high-bandwidth transimpedance amplifier (TIA) is fit to a statistical model. The peak height distribution of the APD's multiplied dark current is approximated by the weighted sum of McIntyre distributions, each characterizing dark current generated at a different location within the APD's junction. The peak height distribution approximated in this way is convolved with a Gaussian distribution representing the input-referred noise of the TIA to generate the statistical distribution of the uncorrelated sum. The cumulative distribution function (CDF) representing count probability as a function of detection threshold is computed, and the CDF model fit to empirical false count data. It is found that only k=0 McIntyre distributions fit the empirically measured CDF at high detection threshold, and that false count rate drops faster than photon count rate as detection threshold is raised. Once fit to empirical false count data, the model predicts the improvement of the false count rate to be expected from reductions in TIA noise and APD dark current. Improvement by at least three orders of magnitude is thought feasible with further manufacturing development and a capacitive-feedback TIA (CTIA).

  10. Statistical alignment: computational properties, homology testing and goodness-of-fit.

    PubMed

    Hein, J; Wiuf, C; Knudsen, B; Møller, M B; Wibling, G

    2000-09-08

    The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model.Firstly, we show how to accelerate the statistical alignment algorithms several orders of magnitude. The main innovations are to confine likelihood calculations to a band close to the similarity based alignment, to get good initial guesses of the evolutionary parameters and to apply an efficient numerical optimisation algorithm for finding the maximum likelihood estimate. In addition, the recursions originally presented by Thorne, Kishino and Felsenstein can be simplified. Two proteins, about 1500 amino acids long, can be analysed with this method in less than five seconds on a fast desktop computer, which makes this method practical for actual data analysis.Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins.Finally, we describe a goodness-of-fit test, that allows testing the proposed insertion-deletion (indel) process inherent to this model and find that real sequences (here globins) probably experience indels longer than one, contrary to what is assumed by the model. Copyright 2000 Academic Press.

  11. [How to fit and interpret multilevel models using SPSS].

    PubMed

    Pardo, Antonio; Ruiz, Miguel A; San Martín, Rafael

    2007-05-01

    Hierarchic or multilevel models are used to analyse data when cases belong to known groups and sample units are selected both from the individual level and from the group level. In this work, the multilevel models most commonly discussed in the statistic literature are described, explaining how to fit these models using the SPSS program (any version as of the 11 th ) and how to interpret the outcomes of the analysis. Five particular models are described, fitted, and interpreted: (1) one-way analysis of variance with random effects, (2) regression analysis with means-as-outcomes, (3) one-way analysis of covariance with random effects, (4) regression analysis with random coefficients, and (5) regression analysis with means- and slopes-as-outcomes. All models are explained, trying to make them understandable to researchers in health and behaviour sciences.

  12. Statistical Techniques Complement UML When Developing Domain Models of Complex Dynamical Biosystems.

    PubMed

    Williams, Richard A; Timmis, Jon; Qwarnstrom, Eva E

    2016-01-01

    Computational modelling and simulation is increasingly being used to complement traditional wet-lab techniques when investigating the mechanistic behaviours of complex biological systems. In order to ensure computational models are fit for purpose, it is essential that the abstracted view of biology captured in the computational model, is clearly and unambiguously defined within a conceptual model of the biological domain (a domain model), that acts to accurately represent the biological system and to document the functional requirements for the resultant computational model. We present a domain model of the IL-1 stimulated NF-κB signalling pathway, which unambiguously defines the spatial, temporal and stochastic requirements for our future computational model. Through the development of this model, we observe that, in isolation, UML is not sufficient for the purpose of creating a domain model, and that a number of descriptive and multivariate statistical techniques provide complementary perspectives, in particular when modelling the heterogeneity of dynamics at the single-cell level. We believe this approach of using UML to define the structure and interactions within a complex system, along with statistics to define the stochastic and dynamic nature of complex systems, is crucial for ensuring that conceptual models of complex dynamical biosystems, which are developed using UML, are fit for purpose, and unambiguously define the functional requirements for the resultant computational model.

  13. Statistical Techniques Complement UML When Developing Domain Models of Complex Dynamical Biosystems

    PubMed Central

    Timmis, Jon; Qwarnstrom, Eva E.

    2016-01-01

    Computational modelling and simulation is increasingly being used to complement traditional wet-lab techniques when investigating the mechanistic behaviours of complex biological systems. In order to ensure computational models are fit for purpose, it is essential that the abstracted view of biology captured in the computational model, is clearly and unambiguously defined within a conceptual model of the biological domain (a domain model), that acts to accurately represent the biological system and to document the functional requirements for the resultant computational model. We present a domain model of the IL-1 stimulated NF-κB signalling pathway, which unambiguously defines the spatial, temporal and stochastic requirements for our future computational model. Through the development of this model, we observe that, in isolation, UML is not sufficient for the purpose of creating a domain model, and that a number of descriptive and multivariate statistical techniques provide complementary perspectives, in particular when modelling the heterogeneity of dynamics at the single-cell level. We believe this approach of using UML to define the structure and interactions within a complex system, along with statistics to define the stochastic and dynamic nature of complex systems, is crucial for ensuring that conceptual models of complex dynamical biosystems, which are developed using UML, are fit for purpose, and unambiguously define the functional requirements for the resultant computational model. PMID:27571414

  14. Best Statistical Distribution of flood variables for Johor River in Malaysia

    NASA Astrophysics Data System (ADS)

    Salarpour Goodarzi, M.; Yusop, Z.; Yusof, F.

    2012-12-01

    A complex flood event is always characterized by a few characteristics such as flood peak, flood volume, and flood duration, which might be mutually correlated. This study explored the statistical distribution of peakflow, flood duration and flood volume at Rantau Panjang gauging station on the Johor River in Malaysia. Hourly data were recorded for 45 years. The data were analysed based on water year (July - June). Five distributions namely, Log Normal, Generalize Pareto, Log Pearson, Normal and Generalize Extreme Value (GEV) were used to model the distribution of all the three variables. Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests were used to evaluate the best fit. Goodness-of-fit tests at 5% level of significance indicate that all the models can be used to model the distribution of peakflow, flood duration and flood volume. However, Generalize Pareto distribution is found to be the most suitable model when tested with the Anderson-Darling test and the, Kolmogorov-Smirnov suggested that GEV is the best for peakflow. The result of this research can be used to improve flood frequency analysis. Comparison between Generalized Extreme Value, Generalized Pareto and Log Pearson distributions in the Cumulative Distribution Function of peakflow

  15. Training models of anatomic shape variability

    PubMed Central

    Merck, Derek; Tracton, Gregg; Saboo, Rohit; Levy, Joshua; Chaney, Edward; Pizer, Stephen; Joshi, Sarang

    2008-01-01

    Learning probability distributions of the shape of anatomic structures requires fitting shape representations to human expert segmentations from training sets of medical images. The quality of statistical segmentation and registration methods is directly related to the quality of this initial shape fitting, yet the subject is largely overlooked or described in an ad hoc way. This article presents a set of general principles to guide such training. Our novel method is to jointly estimate both the best geometric model for any given image and the shape distribution for the entire population of training images by iteratively relaxing purely geometric constraints in favor of the converging shape probabilities as the fitted objects converge to their target segmentations. The geometric constraints are carefully crafted both to obtain legal, nonself-interpenetrating shapes and to impose the model-to-model correspondences required for useful statistical analysis. The paper closes with example applications of the method to synthetic and real patient CT image sets, including same patient male pelvis and head and neck images, and cross patient kidney and brain images. Finally, we outline how this shape training serves as the basis for our approach to IGRT∕ART. PMID:18777919

  16. Fisher's geometrical model emerges as a property of complex integrated phenotypic networks.

    PubMed

    Martin, Guillaume

    2014-05-01

    Models relating phenotype space to fitness (phenotype-fitness landscapes) have seen important developments recently. They can roughly be divided into mechanistic models (e.g., metabolic networks) and more heuristic models like Fisher's geometrical model. Each has its own drawbacks, but both yield testable predictions on how the context (genomic background or environment) affects the distribution of mutation effects on fitness and thus adaptation. Both have received some empirical validation. This article aims at bridging the gap between these approaches. A derivation of the Fisher model "from first principles" is proposed, where the basic assumptions emerge from a more general model, inspired by mechanistic networks. I start from a general phenotypic network relating unspecified phenotypic traits and fitness. A limited set of qualitative assumptions is then imposed, mostly corresponding to known features of phenotypic networks: a large set of traits is pleiotropically affected by mutations and determines a much smaller set of traits under optimizing selection. Otherwise, the model remains fairly general regarding the phenotypic processes involved or the distribution of mutation effects affecting the network. A statistical treatment and a local approximation close to a fitness optimum yield a landscape that is effectively the isotropic Fisher model or its extension with a single dominant phenotypic direction. The fit of the resulting alternative distributions is illustrated in an empirical data set. These results bear implications on the validity of Fisher's model's assumptions and on which features of mutation fitness effects may vary (or not) across genomic or environmental contexts.

  17. Kinetic model for microbial growth and desulphurisation with Enterobacter sp.

    PubMed

    Liu, Long; Guo, Zhiguo; Lu, Jianjiang; Xu, Xiaolin

    2015-02-01

    Biodesulphurisation was investigated by using Enterobacter sp. D4, which can selectively desulphurise and convert dibenzothiophene into 2-hydroxybiphenyl (2-HBP). The experimental values of growth, substrate consumption and product generation were obtained at 95 % confidence level of the fitted values using three models: Hinshelwood equation, Luedeking-Piret and Luedeking-Piret-like equations. The average error values between experimental values and fitted values were less than 10 %. These kinetic models describe all the experimental data with good statistical parameters. The production of 2-HBP in Enterobacter sp. was by "coupled growth".

  18. A Novel Statistical Analysis and Interpretation of Flow Cytometry Data

    DTIC Science & Technology

    2013-07-05

    ing parameters of the mathematical model are determined by using least squares to fit the data in Figures 1 and 2 (see Section 4). The second method is...variance (for all k and j). Then the AIC, which is the expected value of the relative Kullback – Leibler distance for a given model [16], is Kn = n log ( J...emphasized that the fit of the model is quite good for both donors and cell types. As such, we proceed to analyse the dynamic responsiveness of the

  19. Investigating the cross-cultural validity of DSM-5 autism spectrum disorder: evidence from Finnish and UK samples.

    PubMed

    Mandy, William; Charman, Tony; Puura, Kaija; Skuse, David

    2014-01-01

    The recent Diagnostic and Statistical Manual of Mental Disorders-Fifth Edition (DSM-5) reformulation of autism spectrum disorder has received empirical support from North American and UK samples. Autism spectrum disorder is an increasingly global diagnosis, and research is needed to discover how well it generalises beyond North America and the United Kingdom. We tested the applicability of the DSM-5 model to a sample of Finnish young people with autism spectrum disorder (n = 130) or the broader autism phenotype (n = 110). Confirmatory factor analysis tested the DSM-5 model in Finland and compared the fit of this model between Finnish and UK participants (autism spectrum disorder, n = 488; broader autism phenotype, n = 220). In both countries, autistic symptoms were measured using the Developmental, Diagnostic and Dimensional Interview. Replicating findings from English-speaking samples, the DSM-5 model fitted well in Finnish autism spectrum disorder participants, outperforming a Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition (DSM-IV) model. The DSM-5 model fitted equally well in Finnish and UK autism spectrum disorder samples. Among broader autism phenotype participants, this model fitted well in the United Kingdom but poorly in Finland, suggesting that cross-cultural variability may be greatest for milder autistic characteristics. We encourage researchers with data from other cultures to emulate our methodological approach, to map any cultural variability in the manifestation of autism spectrum disorder and the broader autism phenotype. This would be especially valuable given the ongoing revision of the International Classification of Diseases-11th Edition, the most global of the diagnostic manuals.

  20. Modeling the structure of the attitudes and belief scale 2 using CFA and bifactor approaches: Toward the development of an abbreviated version.

    PubMed

    Hyland, Philip; Shevlin, Mark; Adamson, Gary; Boduszek, Daniel

    2014-01-01

    The Attitudes and Belief Scale-2 (ABS-2: DiGiuseppe, Leaf, Exner, & Robin, 1988. The development of a measure of rational/irrational thinking. Paper presented at the World Congress of Behavior Therapy, Edinburg, Scotland.) is a 72-item self-report measure of evaluative rational and irrational beliefs widely used in Rational Emotive Behavior Therapy research contexts. However, little psychometric evidence exists regarding the measure's underlying factor structure. Furthermore, given the length of the ABS-2 there is a need for an abbreviated version that can be administered when there are time demands on the researcher, such as in clinical settings. This study sought to examine a series of theoretical models hypothesized to represent the latent structure of the ABS-2 within an alternative models framework using traditional confirmatory factor analysis as well as utilizing a bifactor modeling approach. Furthermore, this study also sought to develop a psychometrically sound abbreviated version of the ABS-2. Three hundred and thirteen (N = 313) active emergency service personnel completed the ABS-2. Results indicated that for each model, the application of bifactor modeling procedures improved model fit statistics, and a novel eight-factor intercorrelated solution was identified as the best fitting model of the ABS-2. However, the observed fit indices failed to satisfy commonly accepted standards. A 24-item abbreviated version was thus constructed and an intercorrelated eight-factor solution yielded satisfactory model fit statistics. Current results support the use of a bifactor modeling approach to determining the factor structure of the ABS-2. Furthermore, results provide empirical support for the psychometric properties of the newly developed abbreviated version.

  1. Fitting statistical distributions to sea duck count data: implications for survey design and abundance estimation

    USGS Publications Warehouse

    Zipkin, Elise F.; Leirness, Jeffery B.; Kinlan, Brian P.; O'Connell, Allan F.; Silverman, Emily D.

    2014-01-01

    Determining appropriate statistical distributions for modeling animal count data is important for accurate estimation of abundance, distribution, and trends. In the case of sea ducks along the U.S. Atlantic coast, managers want to estimate local and regional abundance to detect and track population declines, to define areas of high and low use, and to predict the impact of future habitat change on populations. In this paper, we used a modified marked point process to model survey data that recorded flock sizes of Common eiders, Long-tailed ducks, and Black, Surf, and White-winged scoters. The data come from an experimental aerial survey, conducted by the United States Fish & Wildlife Service (USFWS) Division of Migratory Bird Management, during which east-west transects were flown along the Atlantic Coast from Maine to Florida during the winters of 2009–2011. To model the number of flocks per transect (the points), we compared the fit of four statistical distributions (zero-inflated Poisson, zero-inflated geometric, zero-inflated negative binomial and negative binomial) to data on the number of species-specific sea duck flocks that were recorded for each transect flown. To model the flock sizes (the marks), we compared the fit of flock size data for each species to seven statistical distributions: positive Poisson, positive negative binomial, positive geometric, logarithmic, discretized lognormal, zeta and Yule–Simon. Akaike’s Information Criterion and Vuong’s closeness tests indicated that the negative binomial and discretized lognormal were the best distributions for all species for the points and marks, respectively. These findings have important implications for estimating sea duck abundances as the discretized lognormal is a more skewed distribution than the Poisson and negative binomial, which are frequently used to model avian counts; the lognormal is also less heavy-tailed than the power law distributions (e.g., zeta and Yule–Simon), which are becoming increasingly popular for group size modeling. Choosing appropriate statistical distributions for modeling flock size data is fundamental to accurately estimating population summaries, determining required survey effort, and assessing and propagating uncertainty through decision-making processes.

  2. Football Statistics

    ERIC Educational Resources Information Center

    Jackson, Paul R.

    1972-01-01

    The probabilities of certain English football teams winning different playoffs are determined. In each case, a mathematical model is fitted to the observed data, assumptions are verified, and the calculations performed. (LS)

  3. Linear fitting of multi-threshold counting data with a pixel-array detector for spectral X-ray imaging

    PubMed Central

    Muir, Ryan D.; Pogranichney, Nicholas R.; Muir, J. Lewis; Sullivan, Shane Z.; Battaile, Kevin P.; Mulichak, Anne M.; Toth, Scott J.; Keefe, Lisa J.; Simpson, Garth J.

    2014-01-01

    Experiments and modeling are described to perform spectral fitting of multi-threshold counting measurements on a pixel-array detector. An analytical model was developed for describing the probability density function of detected voltage in X-ray photon-counting arrays, utilizing fractional photon counting to account for edge/corner effects from voltage plumes that spread across multiple pixels. Each pixel was mathematically calibrated by fitting the detected voltage distributions to the model at both 13.5 keV and 15.0 keV X-ray energies. The model and established pixel responses were then exploited to statistically recover images of X-ray intensity as a function of X-ray energy in a simulated multi-wavelength and multi-counting threshold experiment. PMID:25178010

  4. Linear fitting of multi-threshold counting data with a pixel-array detector for spectral X-ray imaging.

    PubMed

    Muir, Ryan D; Pogranichney, Nicholas R; Muir, J Lewis; Sullivan, Shane Z; Battaile, Kevin P; Mulichak, Anne M; Toth, Scott J; Keefe, Lisa J; Simpson, Garth J

    2014-09-01

    Experiments and modeling are described to perform spectral fitting of multi-threshold counting measurements on a pixel-array detector. An analytical model was developed for describing the probability density function of detected voltage in X-ray photon-counting arrays, utilizing fractional photon counting to account for edge/corner effects from voltage plumes that spread across multiple pixels. Each pixel was mathematically calibrated by fitting the detected voltage distributions to the model at both 13.5 keV and 15.0 keV X-ray energies. The model and established pixel responses were then exploited to statistically recover images of X-ray intensity as a function of X-ray energy in a simulated multi-wavelength and multi-counting threshold experiment.

  5. Pecan nutshell as biosorbent to remove Cu(II), Mn(II) and Pb(II) from aqueous solutions.

    PubMed

    Vaghetti, Julio C P; Lima, Eder C; Royer, Betina; da Cunha, Bruna M; Cardoso, Natali F; Brasil, Jorge L; Dias, Silvio L P

    2009-02-15

    In the present study we reported for the first time the feasibility of pecan nutshell (PNS, Carya illinoensis) as an alternative biosorbent to remove Cu(II), Mn(II) and Pb(II) metallic ions from aqueous solutions. The ability of PNS to remove the metallic ions was investigated by using batch biosorption procedure. The effects such as, pH, biosorbent dosage on the adsorption capacities of PNS were studied. Four kinetic models were tested, being the adsorption kinetics better fitted to fractionary-order kinetic model. Besides that, the kinetic data were also fitted to intra-particle diffusion model, presenting three linear regions, indicating that the kinetics of adsorption should follow multiple sorption rates. The equilibrium data were fitted to Langmuir, Freundlich, Sips and Redlich-Peterson isotherm models. Taking into account a statistical error function, the data were best fitted to Sips isotherm model. The maximum biosorption capacities of PNS were 1.35, 1.78 and 0.946mmolg(-1) for Cu(II), Mn(II) and Pb(II), respectively.

  6. The Birth-Death-Mutation Process: A New Paradigm for Fat Tailed Distributions

    PubMed Central

    Maruvka, Yosef E.; Kessler, David A.; Shnerb, Nadav M.

    2011-01-01

    Fat tailed statistics and power-laws are ubiquitous in many complex systems. Usually the appearance of of a few anomalously successful individuals (bio-species, investors, websites) is interpreted as reflecting some inherent “quality” (fitness, talent, giftedness) as in Darwin's theory of natural selection. Here we adopt the opposite, “neutral”, outlook, suggesting that the main factor explaining success is merely luck. The statistics emerging from the neutral birth-death-mutation (BDM) process is shown to fit marvelously many empirical distributions. While previous neutral theories have focused on the power-law tail, our theory economically and accurately explains the entire distribution. We thus suggest the BDM distribution as a standard neutral model: effects of fitness and selection are to be identified by substantial deviations from it. PMID:22069453

  7. Methods of comparing associative models and an application to retrospective revaluation.

    PubMed

    Witnauer, James E; Hutchings, Ryan; Miller, Ralph R

    2017-11-01

    Contemporary theories of associative learning are increasingly complex, which necessitates the use of computational methods to reveal predictions of these models. We argue that comparisons across multiple models in terms of goodness of fit to empirical data from experiments often reveal more about the actual mechanisms of learning and behavior than do simulations of only a single model. Such comparisons are best made when the values of free parameters are discovered through some optimization procedure based on the specific data being fit (e.g., hill climbing), so that the comparisons hinge on the psychological mechanisms assumed by each model rather than being biased by using parameters that differ in quality across models with respect to the data being fit. Statistics like the Bayesian information criterion facilitate comparisons among models that have different numbers of free parameters. These issues are examined using retrospective revaluation data. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. The Population Tracking Model: A Simple, Scalable Statistical Model for Neural Population Data

    PubMed Central

    O'Donnell, Cian; alves, J. Tiago Gonç; Whiteley, Nick; Portera-Cailliau, Carlos; Sejnowski, Terrence J.

    2017-01-01

    Our understanding of neural population coding has been limited by a lack of analysis methods to characterize spiking data from large populations. The biggest challenge comes from the fact that the number of possible network activity patterns scales exponentially with the number of neurons recorded (∼2Neurons). Here we introduce a new statistical method for characterizing neural population activity that requires semi-independent fitting of only as many parameters as the square of the number of neurons, requiring drastically smaller data sets and minimal computation time. The model works by matching the population rate (the number of neurons synchronously active) and the probability that each individual neuron fires given the population rate. We found that this model can accurately fit synthetic data from up to 1000 neurons. We also found that the model could rapidly decode visual stimuli from neural population data from macaque primary visual cortex about 65 ms after stimulus onset. Finally, we used the model to estimate the entropy of neural population activity in developing mouse somatosensory cortex and, surprisingly, found that it first increases, and then decreases during development. This statistical model opens new options for interrogating neural population data and can bolster the use of modern large-scale in vivo Ca2+ and voltage imaging tools. PMID:27870612

  9. The Balance-Scale Task Revisited: A Comparison of Statistical Models for Rule-Based and Information-Integration Theories of Proportional Reasoning

    PubMed Central

    Hofman, Abe D.; Visser, Ingmar; Jansen, Brenda R. J.; van der Maas, Han L. J.

    2015-01-01

    We propose and test three statistical models for the analysis of children’s responses to the balance scale task, a seminal task to study proportional reasoning. We use a latent class modelling approach to formulate a rule-based latent class model (RB LCM) following from a rule-based perspective on proportional reasoning and a new statistical model, the Weighted Sum Model, following from an information-integration approach. Moreover, a hybrid LCM using item covariates is proposed, combining aspects of both a rule-based and information-integration perspective. These models are applied to two different datasets, a standard paper-and-pencil test dataset (N = 779), and a dataset collected within an online learning environment that included direct feedback, time-pressure, and a reward system (N = 808). For the paper-and-pencil dataset the RB LCM resulted in the best fit, whereas for the online dataset the hybrid LCM provided the best fit. The standard paper-and-pencil dataset yielded more evidence for distinct solution rules than the online data set in which quantitative item characteristics are more prominent in determining responses. These results shed new light on the discussion on sequential rule-based and information-integration perspectives of cognitive development. PMID:26505905

  10. Comparison of dark energy models: A perspective from the latest observational data

    NASA Astrophysics Data System (ADS)

    Li, Miao; Li, Xiaodong; Zhang, Xin

    2010-09-01

    We compare some popular dark energy models under the assumption of a flat universe by using the latest observational data including the type Ia supernovae Constitution compilation, the baryon acoustic oscillation measurement from the Sloan Digital Sky Survey, the cosmic microwave background measurement given by the seven-year Wilkinson Microwave Anisotropy Probe observations and the determination of H 0 from the Hubble Space Telescope. Model comparison statistics such as the Bayesian and Akaike information criteria are applied to assess the worth of the models. These statistics favor models that give a good fit with fewer parameters. Based on this analysis, we find that the simplest cosmological constant model that has only one free parameter is still preferred by the current data. For other dynamical dark energy models, we find that some of them, such as the α dark energy, constant w, generalized Chaplygin gas, Chevalliear-Polarski-Linder parametrization, and holographic dark energy models, can provide good fits to the current data, and three of them, namely, the Ricci dark energy, agegraphic dark energy, and Dvali-Gabadadze-Porrati models, are clearly disfavored by the data.

  11. Keep it simple - A case study of model development in the context of the Dynamic Stocks and Flows (DSF) task

    NASA Astrophysics Data System (ADS)

    Halbrügge, Marc

    2010-12-01

    This paper describes the creation of a cognitive model submitted to the ‘Dynamic Stocks and Flows’ (DSF) modeling challenge. This challenge aims at comparing computational cognitive models for human behavior during an open ended control task. Participants in the modeling competition were provided with a simulation environment and training data for benchmarking their models while the actual specification of the competition task was withheld. To meet this challenge, the cognitive model described here was designed and optimized for generalizability. Only two simple assumptions about human problem solving were used to explain the empirical findings of the training data. In-depth analysis of the data set prior to the development of the model led to the dismissal of correlations or other parametric statistics as goodness-of-fit indicators. A new statistical measurement based on rank orders and sequence matching techniques is being proposed instead. This measurement, when being applied to the human sample, also identifies clusters of subjects that use different strategies for the task. The acceptability of the fits achieved by the model is verified using permutation tests.

  12. A Stochastic Model of Space-Time Variability of Mesoscale Rainfall: Statistics of Spatial Averages

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Bell, Thomas L.

    2003-01-01

    A characteristic feature of rainfall statistics is that they depend on the space and time scales over which rain data are averaged. A previously developed spectral model of rain statistics that is designed to capture this property, predicts power law scaling behavior for the second moment statistics of area-averaged rain rate on the averaging length scale L as L right arrow 0. In the present work a more efficient method of estimating the model parameters is presented, and used to fit the model to the statistics of area-averaged rain rate derived from gridded radar precipitation data from TOGA COARE. Statistical properties of the data and the model predictions are compared over a wide range of averaging scales. An extension of the spectral model scaling relations to describe the dependence of the average fraction of grid boxes within an area containing nonzero rain (the "rainy area fraction") on the grid scale L is also explored.

  13. Statistical ecology comes of age.

    PubMed

    Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-12-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.

  14. Statistical ecology comes of age

    PubMed Central

    Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-01-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151

  15. Impact of cleaning and other interventions on the reduction of hospital-acquired Clostridium difficile infections in two hospitals in England assessed using a breakpoint model.

    PubMed

    Hughes, G J; Nickerson, E; Enoch, D A; Ahluwalia, J; Wilkinson, C; Ayers, R; Brown, N M

    2013-07-01

    Clostridium difficile infection remains a major challenge for hospitals. Although targeted infection control initiatives have been shown to be effective in reducing the incidence of hospital-acquired C. difficile infection, there is little evidence available to assess the effectiveness of specific interventions. To use statistical modelling to detect substantial reductions in the incidence of C. difficile from time series data from two hospitals in England, and relate these time points to infection control interventions. A statistical breakpoints model was fitted to likely hospital-acquired C. difficile infection incidence data from a teaching hospital (2002-2009) and a district general hospital (2005-2009) in England. Models with increasing complexity (i.e. increasing the number of breakpoints) were tested for an improved fit to the data. Partitions estimated from breakpoint models were tested for individual stability using statistical process control charts. Major infection control interventions from both hospitals during this time were grouped according to their primary target (antibiotics, cleaning, isolation, other) and mapped to the model-suggested breakpoints. For both hospitals, breakpoints coincided with enhancements to cleaning protocols. Statistical models enabled formal assessment of the impact of different interventions, and showed that enhancements to deep cleaning programmes are the interventions that have most likely led to substantial reductions in hospital-acquired C. difficile infections at the two hospitals studied. Copyright © 2013 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.

  16. Quantifying and Reducing Curve-Fitting Uncertainty in Isc

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campanelli, Mark; Duck, Benjamin; Emery, Keith

    2015-06-14

    Current-voltage (I-V) curve measurements of photovoltaic (PV) devices are used to determine performance parameters and to establish traceable calibration chains. Measurement standards specify localized curve fitting methods, e.g., straight-line interpolation/extrapolation of the I-V curve points near short-circuit current, Isc. By considering such fits as statistical linear regressions, uncertainties in the performance parameters are readily quantified. However, the legitimacy of such a computed uncertainty requires that the model be a valid (local) representation of the I-V curve and that the noise be sufficiently well characterized. Using more data points often has the advantage of lowering the uncertainty. However, more data pointsmore » can make the uncertainty in the fit arbitrarily small, and this fit uncertainty misses the dominant residual uncertainty due to so-called model discrepancy. Using objective Bayesian linear regression for straight-line fits for Isc, we investigate an evidence-based method to automatically choose data windows of I-V points with reduced model discrepancy. We also investigate noise effects. Uncertainties, aligned with the Guide to the Expression of Uncertainty in Measurement (GUM), are quantified throughout.« less

  17. Quantifying and Reducing Curve-Fitting Uncertainty in Isc: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campanelli, Mark; Duck, Benjamin; Emery, Keith

    Current-voltage (I-V) curve measurements of photovoltaic (PV) devices are used to determine performance parameters and to establish traceable calibration chains. Measurement standards specify localized curve fitting methods, e.g., straight-line interpolation/extrapolation of the I-V curve points near short-circuit current, Isc. By considering such fits as statistical linear regressions, uncertainties in the performance parameters are readily quantified. However, the legitimacy of such a computed uncertainty requires that the model be a valid (local) representation of the I-V curve and that the noise be sufficiently well characterized. Using more data points often has the advantage of lowering the uncertainty. However, more data pointsmore » can make the uncertainty in the fit arbitrarily small, and this fit uncertainty misses the dominant residual uncertainty due to so-called model discrepancy. Using objective Bayesian linear regression for straight-line fits for Isc, we investigate an evidence-based method to automatically choose data windows of I-V points with reduced model discrepancy. We also investigate noise effects. Uncertainties, aligned with the Guide to the Expression of Uncertainty in Measurement (GUM), are quantified throughout.« less

  18. Statistical Models for the Analysis of Zero-Inflated Pain Intensity Numeric Rating Scale Data.

    PubMed

    Goulet, Joseph L; Buta, Eugenia; Bathulapalli, Harini; Gueorguieva, Ralitza; Brandt, Cynthia A

    2017-03-01

    Pain intensity is often measured in clinical and research settings using the 0 to 10 numeric rating scale (NRS). NRS scores are recorded as discrete values, and in some samples they may display a high proportion of zeroes and a right-skewed distribution. Despite this, statistical methods for normally distributed data are frequently used in the analysis of NRS data. We present results from an observational cross-sectional study examining the association of NRS scores with patient characteristics using data collected from a large cohort of 18,935 veterans in Department of Veterans Affairs care diagnosed with a potentially painful musculoskeletal disorder. The mean (variance) NRS pain was 3.0 (7.5), and 34% of patients reported no pain (NRS = 0). We compared the following statistical models for analyzing NRS scores: linear regression, generalized linear models (Poisson and negative binomial), zero-inflated and hurdle models for data with an excess of zeroes, and a cumulative logit model for ordinal data. We examined model fit, interpretability of results, and whether conclusions about the predictor effects changed across models. In this study, models that accommodate zero inflation provided a better fit than the other models. These models should be considered for the analysis of NRS data with a large proportion of zeroes. We examined and analyzed pain data from a large cohort of veterans with musculoskeletal disorders. We found that many reported no current pain on the NRS on the diagnosis date. We present several alternative statistical methods for the analysis of pain intensity data with a large proportion of zeroes. Published by Elsevier Inc.

  19. Statistical Models for Tornado Climatology: Long and Short-Term Views.

    PubMed

    Elsner, James B; Jagger, Thomas H; Fricker, Tyler

    2016-01-01

    This paper estimates regional tornado risk from records of past events using statistical models. First, a spatial model is fit to the tornado counts aggregated in counties with terms that control for changes in observational practices over time. Results provide a long-term view of risk that delineates the main tornado corridors in the United States where the expected annual rate exceeds two tornadoes per 10,000 square km. A few counties in the Texas Panhandle and central Kansas have annual rates that exceed four tornadoes per 10,000 square km. Refitting the model after removing the least damaging tornadoes from the data (EF0) produces a similar map but with the greatest tornado risk shifted south and eastward. Second, a space-time model is fit to the counts aggregated in raster cells with terms that control for changes in climate factors. Results provide a short-term view of risk. The short-term view identifies a shift of tornado activity away from the Ohio Valley under El Niño conditions and away from the Southeast under positive North Atlantic oscillation conditions. The combined predictor effects on the local rates is quantified by fitting the model after leaving out the year to be predicted from the data. The models provide state-of-the-art views of tornado risk that can be used by government agencies, the insurance industry, and the general public.

  20. Statistical Models for Tornado Climatology: Long and Short-Term Views

    PubMed Central

    Jagger, Thomas H.; Fricker, Tyler

    2016-01-01

    This paper estimates regional tornado risk from records of past events using statistical models. First, a spatial model is fit to the tornado counts aggregated in counties with terms that control for changes in observational practices over time. Results provide a long-term view of risk that delineates the main tornado corridors in the United States where the expected annual rate exceeds two tornadoes per 10,000 square km. A few counties in the Texas Panhandle and central Kansas have annual rates that exceed four tornadoes per 10,000 square km. Refitting the model after removing the least damaging tornadoes from the data (EF0) produces a similar map but with the greatest tornado risk shifted south and eastward. Second, a space-time model is fit to the counts aggregated in raster cells with terms that control for changes in climate factors. Results provide a short-term view of risk. The short-term view identifies a shift of tornado activity away from the Ohio Valley under El Niño conditions and away from the Southeast under positive North Atlantic oscillation conditions. The combined predictor effects on the local rates is quantified by fitting the model after leaving out the year to be predicted from the data. The models provide state-of-the-art views of tornado risk that can be used by government agencies, the insurance industry, and the general public. PMID:27875581

  1. Statistical Inference in Hidden Markov Models Using k-Segment Constraints

    PubMed Central

    Titsias, Michalis K.; Holmes, Christopher C.; Yau, Christopher

    2016-01-01

    Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward–backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths. We collectively call these recursions k-segment algorithms and illustrate their utility using simulated and real examples. We also highlight the prospective and retrospective use of k-segment constraints for fitting HMMs or exploring existing model fits. Supplementary materials for this article are available online. PMID:27226674

  2. On Using Surrogates with Genetic Programming.

    PubMed

    Hildebrandt, Torsten; Branke, Jürgen

    2015-01-01

    One way to accelerate evolutionary algorithms with expensive fitness evaluations is to combine them with surrogate models. Surrogate models are efficiently computable approximations of the fitness function, derived by means of statistical or machine learning techniques from samples of fully evaluated solutions. But these models usually require a numerical representation, and therefore cannot be used with the tree representation of genetic programming (GP). In this paper, we present a new way to use surrogate models with GP. Rather than using the genotype directly as input to the surrogate model, we propose using a phenotypic characterization. This phenotypic characterization can be computed efficiently and allows us to define approximate measures of equivalence and similarity. Using a stochastic, dynamic job shop scenario as an example of simulation-based GP with an expensive fitness evaluation, we show how these ideas can be used to construct surrogate models and improve the convergence speed and solution quality of GP.

  3. Measuring fit of sequence data to phylogenetic model: gain of power using marginal tests.

    PubMed

    Waddell, Peter J; Ota, Rissa; Penny, David

    2009-10-01

    Testing fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Such analyses discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (Unended quest: an intellectual autobiography. Fontana, London, 1976) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (Nature 297:197-200, 1982) to the present. We compare the general log-likelihood ratio (the G or G (2) statistic) statistic between the evolutionary tree model and the multinomial model with that of marginalized tests applied to an alignment (using placental mammal coding sequence data). It is seen that the most general test does not reject the fit of data to model (P approximately 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices, strongly (P < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (P < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa; not necessarily those expected from examining other regions of the genome. By marginalizing the 4( t ) patterns of the i.i.d. model to observed and expected parsimony counts, that is, from constant sites, to singletons, to parsimony informative characters of a minimum possible length, then the likelihood ratio test regains power, and it too rejects the evolutionary model with P < 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published trees may really be far larger than the analytical methods (e.g., bootstrap) report.

  4. Person Fit Based on Statistical Process Control in an Adaptive Testing Environment. Research Report 98-13.

    ERIC Educational Resources Information Center

    van Krimpen-Stoop, Edith M. L. A.; Meijer, Rob R.

    Person-fit research in the context of paper-and-pencil tests is reviewed, and some specific problems regarding person fit in the context of computerized adaptive testing (CAT) are discussed. Some new methods are proposed to investigate person fit in a CAT environment. These statistics are based on Statistical Process Control (SPC) theory. A…

  5. Baseline Estimation and Outlier Identification for Halocarbons

    NASA Astrophysics Data System (ADS)

    Wang, D.; Schuck, T.; Engel, A.; Gallman, F.

    2017-12-01

    The aim of this paper is to build a baseline model for halocarbons and to statistically identify the outliers under specific conditions. In this paper, time series of regional CFC-11 and Chloromethane measurements was discussed, which taken over the last 4 years at two locations, including a monitoring station at northwest of Frankfurt am Main (Germany) and Mace Head station (Ireland). In addition to analyzing time series of CFC-11 and Chloromethane, more importantly, a statistical approach of outlier identification is also introduced in this paper in order to make a better estimation of baseline. A second-order polynomial plus harmonics are fitted to CFC-11 and chloromethane mixing ratios data. Measurements with large distance to the fitting curve are regard as outliers and flagged. Under specific requirement, the routine is iteratively adopted without the flagged measurements until no additional outliers are found. Both model fitting and the proposed outlier identification method are realized with the help of a programming language, Python. During the period, CFC-11 shows a gradual downward trend. And there is a slightly upward trend in the mixing ratios of Chloromethane. The concentration of chloromethane also has a strong seasonal variation, mostly due to the seasonal cycle of OH. The usage of this statistical method has a considerable effect on the results. This method efficiently identifies a series of outliers according to the standard deviation requirements. After removing the outliers, the fitting curves and trend estimates are more reliable.

  6. IDENTIFICATION OF REGIME SHIFTS IN TIME SERIES USING NEIGHBORHOOD STATISTICS

    EPA Science Inventory

    The identification of alternative dynamic regimes in ecological systems requires several lines of evidence. Previous work on time series analysis of dynamic regimes includes mainly model-fitting methods. We introduce two methods that do not use models. These approaches use state-...

  7. Estimating procedure times for surgeries by determining location parameters for the lognormal model.

    PubMed

    Spangler, William E; Strum, David P; Vargas, Luis G; May, Jerrold H

    2004-05-01

    We present an empirical study of methods for estimating the location parameter of the lognormal distribution. Our results identify the best order statistic to use, and indicate that using the best order statistic instead of the median may lead to less frequent incorrect rejection of the lognormal model, more accurate critical value estimates, and higher goodness-of-fit. Using simulation data, we constructed and compared two models for identifying the best order statistic, one based on conventional nonlinear regression and the other using a data mining/machine learning technique. Better surgical procedure time estimates may lead to improved surgical operations.

  8. Validation of the Brazilian version of the 'Spanish Burnout Inventory' in teachers.

    PubMed

    Gil-Monte, Pedro R; Carlotto, Mary Sandra; Câmara, Sheila Gonçalves

    2010-02-01

    To assess factorial validity and internal consistency of the Brazilian version of the 'Spanish Burnout Inventory' (SBI). The translation process of the SBI into Brazilian Portuguese included translation, back translation, and semantic equivalence. A confirmatory factor analysis was carried out using a four-factor model, which was similar to the original SBI. The sample consisted of 714 teachers working in schools in the metropolitan area of the city of Porto Alegre, Southern Brazil, in 2008. The instrument comprises 20 items and four subscales: Enthusiasm towards job (5 items), Psychological exhaustion (4 items), Indolence (6 items), and Guilt (5 items). The model was analyzed using LISREL 8. Goodness-of-Fit statistics showed that the hypothesized model had adequate fit: chi2(164) = 605.86 (p<0.000); Goodness-of-Fit Index = 0.92; Adjusted Goodness-of-Fit Index = 0.90; Root Mean Square Error of Approximation = 0.062; Nonnormed Fit Index = 0.91; Comparative Fit Index = 0.92; and Parsimony Normed Fit Index = 0.77. Cronbach's alpha measures for all subscales were higher than 0.70. The study showed that the SBI has adequate factorial validity and internal consistency to assess burnout in Brazilian teachers.

  9. Detecting Mixtures from Structural Model Differences Using Latent Variable Mixture Modeling: A Comparison of Relative Model Fit Statistics

    ERIC Educational Resources Information Center

    Henson, James M.; Reise, Steven P.; Kim, Kevin H.

    2007-01-01

    The accuracy of structural model parameter estimates in latent variable mixture modeling was explored with a 3 (sample size) [times] 3 (exogenous latent mean difference) [times] 3 (endogenous latent mean difference) [times] 3 (correlation between factors) [times] 3 (mixture proportions) factorial design. In addition, the efficacy of several…

  10. Hierarchical animal movement models for population-level inference

    USGS Publications Warehouse

    Hooten, Mevin B.; Buderman, Frances E.; Brost, Brian M.; Hanks, Ephraim M.; Ivans, Jacob S.

    2016-01-01

    New methods for modeling animal movement based on telemetry data are developed regularly. With advances in telemetry capabilities, animal movement models are becoming increasingly sophisticated. Despite a need for population-level inference, animal movement models are still predominantly developed for individual-level inference. Most efforts to upscale the inference to the population level are either post hoc or complicated enough that only the developer can implement the model. Hierarchical Bayesian models provide an ideal platform for the development of population-level animal movement models but can be challenging to fit due to computational limitations or extensive tuning required. We propose a two-stage procedure for fitting hierarchical animal movement models to telemetry data. The two-stage approach is statistically rigorous and allows one to fit individual-level movement models separately, then resample them using a secondary MCMC algorithm. The primary advantages of the two-stage approach are that the first stage is easily parallelizable and the second stage is completely unsupervised, allowing for an automated fitting procedure in many cases. We demonstrate the two-stage procedure with two applications of animal movement models. The first application involves a spatial point process approach to modeling telemetry data, and the second involves a more complicated continuous-time discrete-space animal movement model. We fit these models to simulated data and real telemetry data arising from a population of monitored Canada lynx in Colorado, USA.

  11. Improving the Statistical Modeling of the TRMM Extreme Precipitation Monitoring System

    NASA Astrophysics Data System (ADS)

    Demirdjian, L.; Zhou, Y.; Huffman, G. J.

    2016-12-01

    This project improves upon an existing extreme precipitation monitoring system based on the Tropical Rainfall Measuring Mission (TRMM) daily product (3B42) using new statistical models. The proposed system utilizes a regional modeling approach, where data from similar grid locations are pooled to increase the quality and stability of the resulting model parameter estimates to compensate for the short data record. The regional frequency analysis is divided into two stages. In the first stage, the region defined by the TRMM measurements is partitioned into approximately 27,000 non-overlapping clusters using a recursive k-means clustering scheme. In the second stage, a statistical model is used to characterize the extreme precipitation events occurring in each cluster. Instead of utilizing the block-maxima approach used in the existing system, where annual maxima are fit to the Generalized Extreme Value (GEV) probability distribution at each cluster separately, the present work adopts the peak-over-threshold (POT) method of classifying points as extreme if they exceed a pre-specified threshold. Theoretical considerations motivate the use of the Generalized-Pareto (GP) distribution for fitting threshold exceedances. The fitted parameters can be used to construct simple and intuitive average recurrence interval (ARI) maps which reveal how rare a particular precipitation event is given its spatial location. The new methodology eliminates much of the random noise that was produced by the existing models due to a short data record, producing more reasonable ARI maps when compared with NOAA's long-term Climate Prediction Center (CPC) ground based observations. The resulting ARI maps can be useful for disaster preparation, warning, and management, as well as increased public awareness of the severity of precipitation events. Furthermore, the proposed methodology can be applied to various other extreme climate records.

  12. Quantitative differentiation of breast lesions at 3T diffusion-weighted imaging (DWI) using the ratio of distributed diffusion coefficient (DDC).

    PubMed

    Ertas, Gokhan; Onaygil, Can; Akin, Yasin; Kaya, Handan; Aribal, Erkin

    2016-12-01

    To investigate the accuracy of diffusion coefficients and diffusion coefficient ratios of breast lesions and of glandular breast tissue from mono- and stretched-exponential models for quantitative diagnosis in diffusion-weighted magnetic resonance imaging (MRI). We analyzed pathologically confirmed 170 lesions (85 benign and 85 malignant) imaged using a 3.0T MR scanner. Small regions of interest (ROIs) focusing on the highest signal intensity for lesions and also for glandular tissue of contralateral breast were obtained. Apparent diffusion coefficient (ADC) and distributed diffusion coefficient (DDC) were estimated by performing nonlinear fittings using mono- and stretched-exponential models, respectively. Coefficient ratios were calculated by dividing the lesion coefficient by the glandular tissue coefficient. A stretched exponential model provides significantly better fits then the monoexponential model (P < 0.001): 65% of the better fits for glandular tissue and 71% of the better fits for lesion. High correlation was found in diffusion coefficients (0.99-0.81 and coefficient ratios (0.94) between the models. The highest diagnostic accuracy was found by the DDC ratio (area under the curve [AUC] = 0.93) when compared with lesion DDC, ADC ratio, and lesion ADC (AUC = 0.91, 0.90, 0.90) but with no statistically significant difference (P > 0.05). At optimal thresholds, the DDC ratio achieves 93% sensitivity, 80% specificity, and 87% overall diagnostic accuracy, while ADC ratio leads to 89% sensitivity, 78% specificity, and 83% overall diagnostic accuracy. The stretched exponential model fits better with signal intensity measurements from both lesion and glandular tissue ROIs. Although the DDC ratio estimated by using the model shows a higher diagnostic accuracy than the ADC ratio, lesion DDC, and ADC, it is not statistically significant. J. Magn. Reson. Imaging 2016;44:1633-1641. © 2016 International Society for Magnetic Resonance in Medicine.

  13. Hyperparameterization of soil moisture statistical models for North America with Ensemble Learning Models (Elm)

    NASA Astrophysics Data System (ADS)

    Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.

    2017-12-01

    Hyperparameterization, of statistical models, i.e. automated model scoring and selection, such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There Elm is using the NSGA-2 multiobjective optimization algorithm for optimizing statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used for automate selection of soil moisture forecast statistical models for North America.

  14. Impact of Rating Scale Categories on Reliability and Fit Statistics of the Malay Spiritual Well-Being Scale using Rasch Analysis.

    PubMed

    Daher, Aqil Mohammad; Ahmad, Syed Hassan; Winn, Than; Selamat, Mohd Ikhsan

    2015-01-01

    Few studies have employed the item response theory in examining reliability. We conducted this study to examine the effect of Rating Scale Categories (RSCs) on the reliability and fit statistics of the Malay Spiritual Well-Being Scale, employing the Rasch model. The Malay Spiritual Well-Being Scale (SWBS) with the original six; three and four newly structured RSCs was distributed randomly among three different samples of 50 participants each. The mean age of respondents in the three samples ranged between 36 and 39 years old. The majority was female in all samples, and Islam was the most prevalent religion among the respondents. The predominating race was Malay, followed by Chinese and Indian. The original six RSCs indicated better targeting of 0.99 and smallest model error of 0.24. The Infit Mnsq (mean square) and Zstd (Z standard) of the six RSCs were "1.1"and "-0.1"respectively. The six RSCs achieved the highest person and item reliabilities of 0.86 and 0.85 respectively. These reliabilities yielded the highest person (2.46) and item (2.38) separation indices compared to other the RSCs. The person and item reliability and, to a lesser extent, the fit statistics, were better with the six RSCs compared to the four and three RSCs.

  15. Maximum likelihood estimation of finite mixture model for economic data

    NASA Astrophysics Data System (ADS)

    Phoong, Seuk-Yen; Ismail, Mohd Tahir

    2014-06-01

    Finite mixture model is a mixture model with finite-dimension. This models are provides a natural representation of heterogeneity in a finite number of latent classes. In addition, finite mixture models also known as latent class models or unsupervised learning models. Recently, maximum likelihood estimation fitted finite mixture models has greatly drawn statistician's attention. The main reason is because maximum likelihood estimation is a powerful statistical method which provides consistent findings as the sample sizes increases to infinity. Thus, the application of maximum likelihood estimation is used to fit finite mixture model in the present paper in order to explore the relationship between nonlinear economic data. In this paper, a two-component normal mixture model is fitted by maximum likelihood estimation in order to investigate the relationship among stock market price and rubber price for sampled countries. Results described that there is a negative effect among rubber price and stock market price for Malaysia, Thailand, Philippines and Indonesia.

  16. An Improved Statistical Solution for Global Seismicity by the HIST-ETAS Approach

    NASA Astrophysics Data System (ADS)

    Chu, A.; Ogata, Y.; Katsura, K.

    2010-12-01

    For long-term global seismic model fitting, recent work by Chu et al. (2010) applied the spatial-temporal ETAS model (Ogata 1998) and analyzed global data partitioned into tectonic zones based on geophysical characteristics (Bird 2003), and it has shown tremendous improvements of model fitting compared with one overall global model. While the ordinary ETAS model assumes constant parameter values across the complete region analyzed, the hierarchical space-time ETAS model (HIST-ETAS, Ogata 2004) is a newly introduced approach by proposing regional distinctions of the parameters for more accurate seismic prediction. As the HIST-ETAS model has been fit to regional data of Japan (Ogata 2010), our work applies the model to describe global seismicity. Employing the Akaike's Bayesian Information Criterion (ABIC) as an assessment method, we compare the MLE results with zone divisions considered to results obtained by an overall global model. Location dependent parameters of the model and Gutenberg-Richter b-values are optimized, and seismological interpretations are discussed.

  17. Development of an Advanced Respirator Fit-Test Headform

    PubMed Central

    Bergman, Michael S.; Zhuang, Ziqing; Hanson, David; Heimbuch, Brian K.; McDonald, Michael J.; Palmiero, Andrew J.; Shaffer, Ronald E.; Harnish, Delbert; Husband, Michael; Wander, Joseph D.

    2015-01-01

    Improved respirator test headforms are needed to measure the fit of N95 filtering facepiece respirators (FFRs) for protection studies against viable airborne particles. A Static (i.e., non-moving, non-speaking) Advanced Headform (StAH) was developed for evaluating the fit of N95 FFRs. The StAH was developed based on the anthropometric dimensions of a digital headform reported by the National Institute for Occupational Safety and Health (NIOSH) and has a silicone polymer skin with defined local tissue thicknesses. Quantitative fit factor evaluations were performed on seven N95 FFR models of various sizes and designs. Donnings were performed with and without a pre-test leak checking method. For each method, four replicate FFR samples of each of the seven models were tested with two donnings per replicate, resulting in a total of 56 tests per donning method. Each fit factor evaluation was comprised of three 86-sec exercises: “Normal Breathing” (NB, 11.2 liters per min (lpm)), “Deep Breathing” (DB, 20.4 lpm), then NB again. A fit factor for each exercise and an overall test fit factor were obtained. Analysis of variance methods were used to identify statistical differences among fit factors (analyzed as logarithms) for different FFR models, exercises, and testing methods. For each FFR model and for each testing method, the NB and DB fit factor data were not significantly different (P > 0.05). Significant differences were seen in the overall exercise fit factor data for the two donning methods among all FFR models (pooled data) and in the overall exercise fit factor data for the two testing methods within certain models. Utilization of the leak checking method improved the rate of obtaining overall exercise fit factors ≥100. The FFR models, which are expected to achieve overall fit factors ≥ 100 on human subjects, achieved overall exercise fit factors ≥ 100 on the StAH. Further research is needed to evaluate the correlation of FFRs fitted on the StAH to FFRs fitted on people. PMID:24369934

  18. Model Error Estimation for the CPTEC Eta Model

    NASA Technical Reports Server (NTRS)

    Tippett, Michael K.; daSilva, Arlindo

    1999-01-01

    Statistical data assimilation systems require the specification of forecast and observation error statistics. Forecast error is due to model imperfections and differences between the initial condition and the actual state of the atmosphere. Practical four-dimensional variational (4D-Var) methods try to fit the forecast state to the observations and assume that the model error is negligible. Here with a number of simplifying assumption, a framework is developed for isolating the model error given the forecast error at two lead-times. Two definitions are proposed for the Talagrand ratio tau, the fraction of the forecast error due to model error rather than initial condition error. Data from the CPTEC Eta Model running operationally over South America are used to calculate forecast error statistics and lower bounds for tau.

  19. Real time forecasting of near-future evolution.

    PubMed

    Gerrish, Philip J; Sniegowski, Paul D

    2012-09-07

    A metaphor for adaptation that informs much evolutionary thinking today is that of mountain climbing, where horizontal displacement represents change in genotype, and vertical displacement represents change in fitness. If it were known a priori what the 'fitness landscape' looked like, that is, how the myriad possible genotypes mapped onto fitness, then the possible paths up the fitness mountain could each be assigned a probability, thus providing a dynamical theory with long-term predictive power. Such detailed genotype-fitness data, however, are rarely available and are subject to change with each change in the organism or in the environment. Here, we take a very different approach that depends only on fitness or phenotype-fitness data obtained in real time and requires no a priori information about the fitness landscape. Our general statistical model of adaptive evolution builds on classical theory and gives reasonable predictions of fitness and phenotype evolution many generations into the future.

  20. SU-E-J-85: Leave-One-Out Perturbation (LOOP) Fitting Algorithm for Absolute Dose Film Calibration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chu, A; Ahmad, M; Chen, Z

    2014-06-01

    Purpose: To introduce an outliers-recognition fitting routine for film dosimetry. It cannot only be flexible with any linear and non-linear regression but also can provide information for the minimal number of sampling points, critical sampling distributions and evaluating analytical functions for absolute film-dose calibration. Methods: The technique, leave-one-out (LOO) cross validation, is often used for statistical analyses on model performance. We used LOO analyses with perturbed bootstrap fitting called leave-one-out perturbation (LOOP) for film-dose calibration . Given a threshold, the LOO process detects unfit points (“outliers”) compared to other cohorts, and a bootstrap fitting process follows to seek any possibilitiesmore » of using perturbations for further improvement. After that outliers were reconfirmed by a traditional t-test statistics and eliminated, then another LOOP feedback resulted in the final. An over-sampled film-dose- calibration dataset was collected as a reference (dose range: 0-800cGy), and various simulated conditions for outliers and sampling distributions were derived from the reference. Comparisons over the various conditions were made, and the performance of fitting functions, polynomial and rational functions, were evaluated. Results: (1) LOOP can prove its sensitive outlier-recognition by its statistical correlation to an exceptional better goodness-of-fit as outliers being left-out. (2) With sufficient statistical information, the LOOP can correct outliers under some low-sampling conditions that other “robust fits”, e.g. Least Absolute Residuals, cannot. (3) Complete cross-validated analyses of LOOP indicate that the function of rational type demonstrates a much superior performance compared to the polynomial. Even with 5 data points including one outlier, using LOOP with rational function can restore more than a 95% value back to its reference values, while the polynomial fitting completely failed under the same conditions. Conclusion: LOOP can cooperate with any fitting routine functioning as a “robust fit”. In addition, it can be set as a benchmark for film-dose calibration fitting performance.« less

  1. An adaptive approach to the dynamic allocation of buffer storage. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Crooke, S. C.

    1970-01-01

    Several strategies for the dynamic allocation of buffer storage are simulated and compared. The basic algorithms investigated, using actual statistics observed in the Univac 1108 EXEC 8 System, include the buddy method and the first-fit method. Modifications are made to the basic methods in an effort to improve and to measure allocation performance. A simulation model of an adaptive strategy is developed which permits interchanging the two different methods, the buddy and the first-fit methods with some modifications. Using an adaptive strategy, each method may be employed in the statistical environment in which its performance is superior to the other method.

  2. Universality Classes of Interaction Structures for NK Fitness Landscapes

    NASA Astrophysics Data System (ADS)

    Hwang, Sungmin; Schmiegelt, Benjamin; Ferretti, Luca; Krug, Joachim

    2018-07-01

    Kauffman's NK-model is a paradigmatic example of a class of stochastic models of genotypic fitness landscapes that aim to capture generic features of epistatic interactions in multilocus systems. Genotypes are represented as sequences of L binary loci. The fitness assigned to a genotype is a sum of contributions, each of which is a random function defined on a subset of k ≤ L loci. These subsets or neighborhoods determine the genetic interactions of the model. Whereas earlier work on the NK model suggested that most of its properties are robust with regard to the choice of neighborhoods, recent work has revealed an important and sometimes counter-intuitive influence of the interaction structure on the properties of NK fitness landscapes. Here we review these developments and present new results concerning the number of local fitness maxima and the statistics of selectively accessible (that is, fitness-monotonic) mutational pathways. In particular, we develop a unified framework for computing the exponential growth rate of the expected number of local fitness maxima as a function of L, and identify two different universality classes of interaction structures that display different asymptotics of this quantity for large k. Moreover, we show that the probability that the fitness landscape can be traversed along an accessible path decreases exponentially in L for a large class of interaction structures that we characterize as locally bounded. Finally, we discuss the impact of the NK interaction structures on the dynamics of evolution using adaptive walk models.

  3. Universality Classes of Interaction Structures for NK Fitness Landscapes

    NASA Astrophysics Data System (ADS)

    Hwang, Sungmin; Schmiegelt, Benjamin; Ferretti, Luca; Krug, Joachim

    2018-02-01

    Kauffman's NK-model is a paradigmatic example of a class of stochastic models of genotypic fitness landscapes that aim to capture generic features of epistatic interactions in multilocus systems. Genotypes are represented as sequences of L binary loci. The fitness assigned to a genotype is a sum of contributions, each of which is a random function defined on a subset of k ≤ L loci. These subsets or neighborhoods determine the genetic interactions of the model. Whereas earlier work on the NK model suggested that most of its properties are robust with regard to the choice of neighborhoods, recent work has revealed an important and sometimes counter-intuitive influence of the interaction structure on the properties of NK fitness landscapes. Here we review these developments and present new results concerning the number of local fitness maxima and the statistics of selectively accessible (that is, fitness-monotonic) mutational pathways. In particular, we develop a unified framework for computing the exponential growth rate of the expected number of local fitness maxima as a function of L, and identify two different universality classes of interaction structures that display different asymptotics of this quantity for large k. Moreover, we show that the probability that the fitness landscape can be traversed along an accessible path decreases exponentially in L for a large class of interaction structures that we characterize as locally bounded. Finally, we discuss the impact of the NK interaction structures on the dynamics of evolution using adaptive walk models.

  4. pyblocxs: Bayesian Low-Counts X-ray Spectral Analysis in Sherpa

    NASA Astrophysics Data System (ADS)

    Siemiginowska, A.; Kashyap, V.; Refsdal, B.; van Dyk, D.; Connors, A.; Park, T.

    2011-07-01

    Typical X-ray spectra have low counts and should be modeled using the Poisson distribution. However, χ2 statistic is often applied as an alternative and the data are assumed to follow the Gaussian distribution. A variety of weights to the statistic or a binning of the data is performed to overcome the low counts issues. However, such modifications introduce biases or/and a loss of information. Standard modeling packages such as XSPEC and Sherpa provide the Poisson likelihood and allow computation of rudimentary MCMC chains, but so far do not allow for setting a full Bayesian model. We have implemented a sophisticated Bayesian MCMC-based algorithm to carry out spectral fitting of low counts sources in the Sherpa environment. The code is a Python extension to Sherpa and allows to fit a predefined Sherpa model to high-energy X-ray spectral data and other generic data. We present the algorithm and discuss several issues related to the implementation, including flexible definition of priors and allowing for variations in the calibration information.

  5. Assessing Continuous Operator Workload With a Hybrid Scaffolded Neuroergonomic Modeling Approach.

    PubMed

    Borghetti, Brett J; Giametta, Joseph J; Rusnock, Christina F

    2017-02-01

    We aimed to predict operator workload from neurological data using statistical learning methods to fit neurological-to-state-assessment models. Adaptive systems require real-time mental workload assessment to perform dynamic task allocations or operator augmentation as workload issues arise. Neuroergonomic measures have great potential for informing adaptive systems, and we combine these measures with models of task demand as well as information about critical events and performance to clarify the inherent ambiguity of interpretation. We use machine learning algorithms on electroencephalogram (EEG) input to infer operator workload based upon Improved Performance Research Integration Tool workload model estimates. Cross-participant models predict workload of other participants, statistically distinguishing between 62% of the workload changes. Machine learning models trained from Monte Carlo resampled workload profiles can be used in place of deterministic workload profiles for cross-participant modeling without incurring a significant decrease in machine learning model performance, suggesting that stochastic models can be used when limited training data are available. We employed a novel temporary scaffold of simulation-generated workload profile truth data during the model-fitting process. A continuous workload profile serves as the target to train our statistical machine learning models. Once trained, the workload profile scaffolding is removed and the trained model is used directly on neurophysiological data in future operator state assessments. These modeling techniques demonstrate how to use neuroergonomic methods to develop operator state assessments, which can be employed in adaptive systems.

  6. qFeature

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2015-09-14

    This package contains statistical routines for extracting features from multivariate time-series data which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic-but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.

  7. [An experimental study on the effect of different optical impression methods on marginal and internal fit of all-ceramic crowns].

    PubMed

    Tan, Fa-Bing; Wang, Lu; Fu, Gang; Wu, Shu-Hong; Jin, Ping

    2010-02-01

    To study the effect of different optical impression methods in Cerec 3D/Inlab MC XL system on marginal and internal fit of all-ceramic crowns. A right mandibular first molar in the standard model was used to prepare full crown and replicated into thirty-two plaster casts. Sixteen of them were selected randomly for bonding crown and the others were used for taking optical impression, in half of which the direct optical impression taking method were used and the others were used for the indirect method, and then eight Cerec Blocs all-ceramic crowns were manufactured respectively. The fit of all-ceramic crowns were evaluated by modified United States Public Health Service (USPHS) criteria and scanning electron microscope (SEM) imaging, and the data were statistically analyzed with SAS 9.1 software. The clinically acceptable rate for all marginal measurement sites was 87.5% according to USPHS criteria. There was no statistically significant difference in marginal fit between direct and indirect method group (P > 0.05). With SEM imaging, all marginal measurement sites were less than 120 microm and no statistically significant difference was found between direct and indirect method group in terms of marginal or internal fit (P > 0.05). But the direct method group showed better fit than indirect method group in terms of mesial surface, lingual surface, buccal surface and occlusal surface (P < 0.05). The distal surface's fit was worse and the obvious difference was observed between mesial surface and distal surface in direct method group (P < 0.01). Under the conditions of this study, the optical impression method had no significant effect on marginal fit of Cerec Blocs crowns, but it had certain effect on internal fit. Overall all-ceramic crowns appeared to have clinically acceptable marginal fit.

  8. Anatomy of the Higgs fits: A first guide to statistical treatments of the theoretical uncertainties

    NASA Astrophysics Data System (ADS)

    Fichet, Sylvain; Moreau, Grégory

    2016-04-01

    The studies of the Higgs boson couplings based on the recent and upcoming LHC data open up a new window on physics beyond the Standard Model. In this paper, we propose a statistical guide to the consistent treatment of the theoretical uncertainties entering the Higgs rate fits. Both the Bayesian and frequentist approaches are systematically analysed in a unified formalism. We present analytical expressions for the marginal likelihoods, useful to implement simultaneously the experimental and theoretical uncertainties. We review the various origins of the theoretical errors (QCD, EFT, PDF, production mode contamination…). All these individual uncertainties are thoroughly combined with the help of moment-based considerations. The theoretical correlations among Higgs detection channels appear to affect the location and size of the best-fit regions in the space of Higgs couplings. We discuss the recurrent question of the shape of the prior distributions for the individual theoretical errors and find that a nearly Gaussian prior arises from the error combinations. We also develop the bias approach, which is an alternative to marginalisation providing more conservative results. The statistical framework to apply the bias principle is introduced and two realisations of the bias are proposed. Finally, depending on the statistical treatment, the Standard Model prediction for the Higgs signal strengths is found to lie within either the 68% or 95% confidence level region obtained from the latest analyses of the 7 and 8 TeV LHC datasets.

  9. Using Generalized Equivalent Uniform Dose Atlases to Combine and Analyze Prospective Dosimetric and Radiation Pneumonitis Data From 2 Non-Small Cell Lung Cancer Dose Escalation Protocols

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu Fan; Yorke, Ellen D.; Belderbos, Jose S.A.

    2013-01-01

    Purpose: To demonstrate the use of generalized equivalent uniform dose (gEUD) atlas for data pooling in radiation pneumonitis (RP) modeling, to determine the dependence of RP on gEUD, to study the consistency between data sets, and to verify the increased statistical power of the combination. Methods and Materials: Patients enrolled in prospective phase I/II dose escalation studies of radiation therapy of non-small cell lung cancer at Memorial Sloan-Kettering Cancer Center (MSKCC) (78 pts) and the Netherlands Cancer Institute (NKI) (86 pts) were included; 10 (13%) and 14 (17%) experienced RP requiring steroids (RPS) within 6 months after treatment. gEUD wasmore » calculated from dose-volume histograms. Atlases for each data set were created using 1-Gy steps from exact gEUDs and RPS data. The Lyman-Kutcher-Burman model was fit to the atlas and exact gEUD data. Heterogeneity and inconsistency statistics for the fitted parameters were computed. gEUD maps of the probability of RPS rate {>=}20% were plotted. Results: The 2 data sets were homogeneous and consistent. The best fit values of the volume effect parameter a were small, with upper 95% confidence limit around 1.0 in the joint data. The likelihood profiles around the best fit a values were flat in all cases, making determination of the best fit a weak. All confidence intervals (CIs) were narrower in the joint than in the individual data sets. The minimum P value for correlations of gEUD with RPS in the joint data was .002, compared with P=.01 and .05 for MSKCC and NKI data sets, respectively. gEUD maps showed that at small a, RPS risk increases with gEUD. Conclusions: The atlas can be used to combine gEUD and RPS information from different institutions and model gEUD dependence of RPS. RPS has a large volume effect with the mean dose model barely included in the 95% CI. Data pooling increased statistical power.« less

  10. A Simple Model for Estimating Total and Merchantable Tree Heights

    Treesearch

    Alan R. Ek; Earl T. Birdsall; Rebecca J. Spears

    1984-01-01

    A model is described for estimating total and merchantable tree heights for Lake States tree species. It is intended to be used for compiling forest survey data and in conjunction with growth models for developing projections of tree product yield. Model coefficients are given for 25 species along with fit statistics. Supporting data sets are also described.

  11. On the streaming model for redshift-space distortions

    NASA Astrophysics Data System (ADS)

    Kuruvilla, Joseph; Porciani, Cristiano

    2018-06-01

    The streaming model describes the mapping between real and redshift space for 2-point clustering statistics. Its key element is the probability density function (PDF) of line-of-sight pairwise peculiar velocities. Following a kinetic-theory approach, we derive the fundamental equations of the streaming model for ordered and unordered pairs. In the first case, we recover the classic equation while we demonstrate that modifications are necessary for unordered pairs. We then discuss several statistical properties of the pairwise velocities for DM particles and haloes by using a suite of high-resolution N-body simulations. We test the often used Gaussian ansatz for the PDF of pairwise velocities and discuss its limitations. Finally, we introduce a mixture of Gaussians which is known in statistics as the generalised hyperbolic distribution and show that it provides an accurate fit to the PDF. Once inserted in the streaming equation, the fit yields an excellent description of redshift-space correlations at all scales that vastly outperforms the Gaussian and exponential approximations. Using a principal-component analysis, we reduce the complexity of our model for large redshift-space separations. Our results increase the robustness of studies of anisotropic galaxy clustering and are useful for extending them towards smaller scales in order to test theories of gravity and interacting dark-energy models.

  12. Zero-state Markov switching count-data models: an empirical assessment.

    PubMed

    Malyshkina, Nataliya V; Mannering, Fred L

    2010-01-01

    In this study, a two-state Markov switching count-data model is proposed as an alternative to zero-inflated models to account for the preponderance of zeros sometimes observed in transportation count data, such as the number of accidents occurring on a roadway segment over some period of time. For this accident-frequency case, zero-inflated models assume the existence of two states: one of the states is a zero-accident count state, which has accident probabilities that are so low that they cannot be statistically distinguished from zero, and the other state is a normal-count state, in which counts can be non-negative integers that are generated by some counting process, for example, a Poisson or negative binomial. While zero-inflated models have come under some criticism with regard to accident-frequency applications - one fact is undeniable - in many applications they provide a statistically superior fit to the data. The Markov switching approach we propose seeks to overcome some of the criticism associated with the zero-accident state of the zero-inflated model by allowing individual roadway segments to switch between zero and normal-count states over time. An important advantage of this Markov switching approach is that it allows for the direct statistical estimation of the specific roadway-segment state (i.e., zero-accident or normal-count state) whereas traditional zero-inflated models do not. To demonstrate the applicability of this approach, a two-state Markov switching negative binomial model (estimated with Bayesian inference) and standard zero-inflated negative binomial models are estimated using five-year accident frequencies on Indiana interstate highway segments. It is shown that the Markov switching model is a viable alternative and results in a superior statistical fit relative to the zero-inflated models.

  13. Bootstrap Estimation of Sample Statistic Bias in Structural Equation Modeling.

    ERIC Educational Resources Information Center

    Thompson, Bruce; Fan, Xitao

    This study empirically investigated bootstrap bias estimation in the area of structural equation modeling (SEM). Three correctly specified SEM models were used under four different sample size conditions. Monte Carlo experiments were carried out to generate the criteria against which bootstrap bias estimation should be judged. For SEM fit indices,…

  14. Statistical modelling as an aid to the design of retail sampling plans for mycotoxins in food.

    PubMed

    MacArthur, Roy; MacDonald, Susan; Brereton, Paul; Murray, Alistair

    2006-01-01

    A study has been carried out to assess appropriate statistical models for use in evaluating retail sampling plans for the determination of mycotoxins in food. A compound gamma model was found to be a suitable fit. A simulation model based on the compound gamma model was used to produce operating characteristic curves for a range of parameters relevant to retail sampling. The model was also used to estimate the minimum number of increments necessary to minimize the overall measurement uncertainty. Simulation results showed that measurements based on retail samples (for which the maximum number of increments is constrained by cost) may produce fit-for-purpose results for the measurement of ochratoxin A in dried fruit, but are unlikely to do so for the measurement of aflatoxin B1 in pistachio nuts. In order to produce a more accurate simulation, further work is required to determine the degree of heterogeneity associated with batches of food products. With appropriate parameterization in terms of physical and biological characteristics, the systems developed in this study could be applied to other analyte/matrix combinations.

  15. A Parametric Model of Shoulder Articulation for Virtual Assessment of Space Suit Fit

    NASA Technical Reports Server (NTRS)

    Kim, K. Han; Young, Karen S.; Bernal, Yaritza; Boppana, Abhishektha; Vu, Linh Q.; Benson, Elizabeth A.; Jarvis, Sarah; Rajulu, Sudhakar L.

    2016-01-01

    Suboptimal suit fit is a known risk factor for crewmember shoulder injury. Suit fit assessment is however prohibitively time consuming and cannot be generalized across wide variations of body shapes and poses. In this work, we have developed a new design tool based on the statistical analysis of body shape scans. This tool is aimed at predicting the skin deformation and shape variations for any body size and shoulder pose for a target population. This new process, when incorporated with CAD software, will enable virtual suit fit assessments, predictively quantifying the contact volume, and clearance between the suit and body surface at reduced time and cost.

  16. A Practical Guide to Check the Consistency of Item Response Patterns in Clinical Research Through Person-Fit Statistics: Examples and a Computer Program.

    PubMed

    Meijer, Rob R; Niessen, A Susan M; Tendeiro, Jorge N

    2016-02-01

    Although there are many studies devoted to person-fit statistics to detect inconsistent item score patterns, most studies are difficult to understand for nonspecialists. The aim of this tutorial is to explain the principles of these statistics for researchers and clinicians who are interested in applying these statistics. In particular, we first explain how invalid test scores can be detected using person-fit statistics; second, we provide the reader practical examples of existing studies that used person-fit statistics to detect and to interpret inconsistent item score patterns; and third, we discuss a new R-package that can be used to identify and interpret inconsistent score patterns. © The Author(s) 2015.

  17. Predictors of Latina/o Community College Student Vocational Choice of STEM Fields: Testing of the STEM-Vocational Choice Model

    ERIC Educational Resources Information Center

    Johnson, Joel D.

    2013-01-01

    This study confirmed appropriate measurement model fit for a theoretical model, the STEM vocational choice (STEM-VC) model. This model identifies exogenous factors that successfully predicted, at a statistically significant level, a student's vocational choice decision to pursue a STEM degree at transfer. The student population examined for this…

  18. Analysis of the statistical thermodynamic model for nonlinear binary protein adsorption equilibria.

    PubMed

    Zhou, Xiao-Peng; Su, Xue-Li; Sun, Yan

    2007-01-01

    The statistical thermodynamic (ST) model was used to study nonlinear binary protein adsorption equilibria on an anion exchanger. Single-component and binary protein adsorption isotherms of bovine hemoglobin (Hb) and bovine serum albumin (BSA) on DEAE Spherodex M were determined by batch adsorption experiments in 10 mM Tris-HCl buffer containing a specific NaCl concentration (0.05, 0.10, and 0.15 M) at pH 7.40. The ST model was found to depict the effect of ionic strength on the single-component equilibria well, with model parameters depending on ionic strength. Moreover, the ST model gave acceptable fitting to the binary adsorption data with the fitted single-component model parameters, leading to the estimation of the binary ST model parameter. The effects of ionic strength on the model parameters are reasonably interpreted by the electrostatic and thermodynamic theories. The effective charge of protein in adsorption phase can be separately calculated from the two categories of the model parameters, and the values obtained from the two methods are consistent. The results demonstrate the utility of the ST model for describing nonlinear binary protein adsorption equilibria.

  19. Imfit: A Fast, Flexible Program for Astronomical Image Fitting

    NASA Astrophysics Data System (ADS)

    Erwin, Peter

    2014-08-01

    Imift is an open-source astronomical image-fitting program specialized for galaxies but potentially useful for other sources, which is fast, flexible, and highly extensible. Its object-oriented design allows new types of image components (2D surface-brightness functions) to be easily written and added to the program. Image functions provided with Imfit include Sersic, exponential, and Gaussian galaxy decompositions along with Core-Sersic and broken-exponential profiles, elliptical rings, and three components that perform line-of-sight integration through 3D luminosity-density models of disks and rings seen at arbitrary inclinations. Available minimization algorithms include Levenberg-Marquardt, Nelder-Mead simplex, and Differential Evolution, allowing trade-offs between speed and decreased sensitivity to local minima in the fit landscape. Minimization can be done using the standard chi^2 statistic (using either data or model values to estimate per-pixel Gaussian errors, or else user-supplied error images) or the Cash statistic; the latter is particularly appropriate for cases of Poisson data in the low-count regime. The C++ source code for Imfit is available under the GNU Public License.

  20. Depression, stress, and intimate partner violence among Latino migrant and seasonal farmworkers in rural Southeastern North Carolina.

    PubMed

    Kim-Godwin, Yeoun Soo; Maume, Michael O; Fox, Jane A

    2014-12-01

    The purpose of the study is to identify the predictors of depression and intimate partner violence (IPV) among Latinos in rural Southeastern North Carolina. A sample of 291 migrant and seasonal farmworkers was interviewed to complete the demographic questionnaire, HITS (intimate violence tendency), Migrant Farmworker Stress Inventory, Center for Epidemiologic Studies Depression Scale (depression), and CAGE/4M (alcohol abuse). OLS regression and structural equation modeling were used to test the hypothesized relations between predictors of IPV and depression. The findings indicated that respondents reporting higher levels of stress also reported higher levels of IPV and depression. The goodness-of-fit statistics for the overall model again indicated a moderate fit of the model to the data (χ2 = 5,612, p < .001; root mean square error for approximation = 0.09; adjusted goodness-of-fit index = 0.44; comparative fit index = 0.52). Although the findings were not robust to estimation in the structural equation models, the OLS regression models indicated direct associations between IPV and depression.

  1. Stochastic modeling of sunshine number data

    NASA Astrophysics Data System (ADS)

    Brabec, Marek; Paulescu, Marius; Badescu, Viorel

    2013-11-01

    In this paper, we will present a unified statistical modeling framework for estimation and forecasting sunshine number (SSN) data. Sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it was shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We will show how its transition probabilities can be efficiently estimated within logistic regression framework. In fact, our logistic Markovian model can be relatively easily fitted via maximum likelihood approach. This is optimal in many respects and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test of various hypotheses of practical interest, etc. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, logistic Markov model class allows us to test hypotheses about how SSN dependents on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using generalized additive model approach (GAM), we can fit and compare models of various complexity which insist on keeping physical interpretation of the statistical model and its parts. After introducing the Markovian model and general approach for identification of its parameters, we will illustrate its use and performance on high resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.

  2. Stochastic modeling of sunshine number data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brabec, Marek, E-mail: mbrabec@cs.cas.cz; Paulescu, Marius; Badescu, Viorel

    2013-11-13

    In this paper, we will present a unified statistical modeling framework for estimation and forecasting sunshine number (SSN) data. Sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it was shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation ofmore » Markov chain type. We will show how its transition probabilities can be efficiently estimated within logistic regression framework. In fact, our logistic Markovian model can be relatively easily fitted via maximum likelihood approach. This is optimal in many respects and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test of various hypotheses of practical interest, etc. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, logistic Markov model class allows us to test hypotheses about how SSN dependents on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using generalized additive model approach (GAM), we can fit and compare models of various complexity which insist on keeping physical interpretation of the statistical model and its parts. After introducing the Markovian model and general approach for identification of its parameters, we will illustrate its use and performance on high resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.« less

  3. Reproducing tailing in breakthrough curves: Are statistical models equally representative and predictive?

    NASA Astrophysics Data System (ADS)

    Pedretti, Daniele; Bianchi, Marco

    2018-03-01

    Breakthrough curves (BTCs) observed during tracer tests in highly heterogeneous aquifers display strong tailing. Power laws are popular models for both the empirical fitting of these curves, and the prediction of transport using upscaling models based on best-fitted estimated parameters (e.g. the power law slope or exponent). The predictive capacity of power law based upscaling models can be however questioned due to the difficulties to link model parameters with the aquifers' physical properties. This work analyzes two aspects that can limit the use of power laws as effective predictive tools: (a) the implication of statistical subsampling, which often renders power laws undistinguishable from other heavily tailed distributions, such as the logarithmic (LOG); (b) the difficulties to reconcile fitting parameters obtained from models with different formulations, such as the presence of a late-time cutoff in the power law model. Two rigorous and systematic stochastic analyses, one based on benchmark distributions and the other on BTCs obtained from transport simulations, are considered. It is found that a power law model without cutoff (PL) results in best-fitted exponents (αPL) falling in the range of typical experimental values reported in the literature (1.5 < αPL < 4). The PL exponent tends to lower values as the tailing becomes heavier. Strong fluctuations occur when the number of samples is limited, due to the effects of subsampling. On the other hand, when the power law model embeds a cutoff (PLCO), the best-fitted exponent (αCO) is insensitive to the degree of tailing and to the effects of subsampling and tends to a constant αCO ≈ 1. In the PLCO model, the cutoff rate (λ) is the parameter that fully reproduces the persistence of the tailing and is shown to be inversely correlated to the LOG scale parameter (i.e. with the skewness of the distribution). The theoretical results are consistent with the fitting analysis of a tracer test performed during the MADE-5 experiment. It is shown that a simple mechanistic upscaling model based on the PLCO formulation is able to predict the ensemble of BTCs from the stochastic transport simulations without the need of any fitted parameters. The model embeds the constant αCO = 1 and relies on a stratified description of the transport mechanisms to estimate λ. The PL fails to reproduce the ensemble of BTCs at late time, while the LOG model provides consistent results as the PLCO model, however without a clear mechanistic link between physical properties and model parameters. It is concluded that, while all parametric models may work equally well (or equally wrong) for the empirical fitting of the experimental BTCs tails due to the effects of subsampling, for predictive purposes this is not true. A careful selection of the proper heavily tailed models and corresponding parameters is required to ensure physically-based transport predictions.

  4. Mathematical model of zinc absorption: effects of dietary calcium, protein and iron on zinc absorption

    PubMed Central

    Miller, Leland V.; Krebs, Nancy F.; Hambidge, K. Michael

    2013-01-01

    A previously described mathematical model of Zn absorption as a function of total daily dietary Zn and phytate was fitted to data from studies in which dietary Ca, Fe and protein were also measured. An analysis of regression residuals indicated statistically significant positive relationships between the residuals and Ca, Fe and protein, suggesting that the presence of any of these dietary components enhances Zn absorption. Based on the hypotheses that (1) Ca and Fe both promote Zn absorption by binding with phytate and thereby making it unavailable for binding Zn and (2) protein enhances the availability of Zn for transporter binding, the model was modified to incorporate these effects. The new model of Zn absorption as a function of dietary Zn, phytate, Ca, Fe and protein was then fitted to the data. The proportion of variation in absorbed Zn explained by the new model was 0·88, an increase from 0·82 with the original model. A reduced version of the model without Fe produced an equally good fit to the data and an improved value for the model selection criterion, demonstrating that when dietary Ca and protein are controlled for, there is no evidence that dietary Fe influences Zn absorption. Regression residuals and testing with additional data supported the validity of the new model. It was concluded that dietary Ca and protein modestly enhanced Zn absorption and Fe had no statistically discernable effect. Furthermore, the model provides a meaningful foundation for efforts to model nutrient interactions in mineral absorption. PMID:22617116

  5. Mathematical model of zinc absorption: effects of dietary calcium, protein and iron on zinc absorption.

    PubMed

    Miller, Leland V; Krebs, Nancy F; Hambidge, K Michael

    2013-02-28

    A previously described mathematical model of Zn absorption as a function of total daily dietary Zn and phytate was fitted to data from studies in which dietary Ca, Fe and protein were also measured. An analysis of regression residuals indicated statistically significant positive relationships between the residuals and Ca, Fe and protein, suggesting that the presence of any of these dietary components enhances Zn absorption. Based on the hypotheses that (1) Ca and Fe both promote Zn absorption by binding with phytate and thereby making it unavailable for binding Zn and (2) protein enhances the availability of Zn for transporter binding, the model was modified to incorporate these effects. The new model of Zn absorption as a function of dietary Zn, phytate, Ca, Fe and protein was then fitted to the data. The proportion of variation in absorbed Zn explained by the new model was 0·88, an increase from 0·82 with the original model. A reduced version of the model without Fe produced an equally good fit to the data and an improved value for the model selection criterion, demonstrating that when dietary Ca and protein are controlled for, there is no evidence that dietary Fe influences Zn absorption. Regression residuals and testing with additional data supported the validity of the new model. It was concluded that dietary Ca and protein modestly enhanced Zn absorption and Fe had no statistically discernable effect. Furthermore, the model provides a meaningful foundation for efforts to model nutrient interactions in mineral absorption.

  6. Broadband distortion modeling in Lyman-α forest BAO fitting

    DOE PAGES

    Blomqvist, Michael; Kirkby, David; Bautista, Julian E.; ...

    2015-11-23

    Recently, the Lyman-α absorption observed in the spectra of high-redshift quasars has been used as a tracer of large-scale structure by means of the three-dimensional Lyman-α forest auto-correlation function at redshift z≃ 2.3, but the need to fit the quasar continuum in every absorption spectrum introduces a broadband distortion that is difficult to correct and causes a systematic error for measuring any broadband properties. Here, we describe a k-space model for this broadband distortion based on a multiplicative correction to the power spectrum of the transmitted flux fraction that suppresses power on scales corresponding to the typical length of amore » Lyman-α forest spectrum. In implementing the distortion model in fits for the baryon acoustic oscillation (BAO) peak position in the Lyman-α forest auto-correlation, we find that the fitting method recovers the input values of the linear bias parameter b F and the redshift-space distortion parameter β F for mock data sets with a systematic error of less than 0.5%. Applied to the auto-correlation measured for BOSS Data Release 11, our method improves on the previous treatment of broadband distortions in BAO fitting by providing a better fit to the data using fewer parameters and reducing the statistical errors on βF and the combination b F(1+β F) by more than a factor of seven. The measured values at redshift z=2.3 are βF=1.39 +0.11 +0.24 +0.38 -0.10 -0.19 -0.28 and bF(1+βF)=-0.374 +0.007 +0.013 +0.020 -0.007 -0.014 -0.022 (1σ, 2σ and 3σ statistical errors). Our fitting software and the input files needed to reproduce our main results are publicly available.« less

  7. Broadband distortion modeling in Lyman-α forest BAO fitting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blomqvist, Michael; Kirkby, David; Margala, Daniel, E-mail: cblomqvi@uci.edu, E-mail: dkirkby@uci.edu, E-mail: dmargala@uci.edu

    2015-11-01

    In recent years, the Lyman-α absorption observed in the spectra of high-redshift quasars has been used as a tracer of large-scale structure by means of the three-dimensional Lyman-α forest auto-correlation function at redshift z≅ 2.3, but the need to fit the quasar continuum in every absorption spectrum introduces a broadband distortion that is difficult to correct and causes a systematic error for measuring any broadband properties. We describe a k-space model for this broadband distortion based on a multiplicative correction to the power spectrum of the transmitted flux fraction that suppresses power on scales corresponding to the typical length of amore » Lyman-α forest spectrum. Implementing the distortion model in fits for the baryon acoustic oscillation (BAO) peak position in the Lyman-α forest auto-correlation, we find that the fitting method recovers the input values of the linear bias parameter b{sub F} and the redshift-space distortion parameter β{sub F} for mock data sets with a systematic error of less than 0.5%. Applied to the auto-correlation measured for BOSS Data Release 11, our method improves on the previous treatment of broadband distortions in BAO fitting by providing a better fit to the data using fewer parameters and reducing the statistical errors on β{sub F} and the combination b{sub F}(1+β{sub F}) by more than a factor of seven. The measured values at redshift z=2.3 are β{sub F}=1.39{sup +0.11 +0.24 +0.38}{sub −0.10 −0.19 −0.28} and b{sub F}(1+β{sub F})=−0.374{sup +0.007 +0.013 +0.020}{sub −0.007 −0.014 −0.022} (1σ, 2σ and 3σ statistical errors). Our fitting software and the input files needed to reproduce our main results are publicly available.« less

  8. The crossing statistic: dealing with unknown errors in the dispersion of Type Ia supernovae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shafieloo, Arman; Clifton, Timothy; Ferreira, Pedro, E-mail: arman@ewha.ac.kr, E-mail: tclifton@astro.ox.ac.uk, E-mail: p.ferreira1@physics.ox.ac.uk

    2011-08-01

    We propose a new statistic that has been designed to be used in situations where the intrinsic dispersion of a data set is not well known: The Crossing Statistic. This statistic is in general less sensitive than χ{sup 2} to the intrinsic dispersion of the data, and hence allows us to make progress in distinguishing between different models using goodness of fit to the data even when the errors involved are poorly understood. The proposed statistic makes use of the shape and trends of a model's predictions in a quantifiable manner. It is applicable to a variety of circumstances, althoughmore » we consider it to be especially well suited to the task of distinguishing between different cosmological models using type Ia supernovae. We show that this statistic can easily distinguish between different models in cases where the χ{sup 2} statistic fails. We also show that the last mode of the Crossing Statistic is identical to χ{sup 2}, so that it can be considered as a generalization of χ{sup 2}.« less

  9. The Probability of Exceedance as a Nonparametric Person-Fit Statistic for Tests of Moderate Length

    ERIC Educational Resources Information Center

    Tendeiro, Jorge N.; Meijer, Rob R.

    2013-01-01

    To classify an item score pattern as not fitting a nonparametric item response theory (NIRT) model, the probability of exceedance (PE) of an observed response vector x can be determined as the sum of the probabilities of all response vectors that are, at most, as likely as x, conditional on the test's total score. Vector x is to be considered…

  10. Modeling two strains of disease via aggregate-level infectivity curves.

    PubMed

    Romanescu, Razvan; Deardon, Rob

    2016-04-01

    Well formulated models of disease spread, and efficient methods to fit them to observed data, are powerful tools for aiding the surveillance and control of infectious diseases. Our project considers the problem of the simultaneous spread of two related strains of disease in a context where spatial location is the key driver of disease spread. We start our modeling work with the individual level models (ILMs) of disease transmission, and extend these models to accommodate the competing spread of the pathogens in a two-tier hierarchical population (whose levels we refer to as 'farm' and 'animal'). The postulated interference mechanism between the two strains is a period of cross-immunity following infection. We also present a framework for speeding up the computationally intensive process of fitting the ILM to data, typically done using Markov chain Monte Carlo (MCMC) in a Bayesian framework, by turning the inference into a two-stage process. First, we approximate the number of animals infected on a farm over time by infectivity curves. These curves are fit to data sampled from farms, using maximum likelihood estimation, then, conditional on the fitted curves, Bayesian MCMC inference proceeds for the remaining parameters. Finally, we use posterior predictive distributions of salient epidemic summary statistics, in order to assess the model fitted.

  11. Micro-CT evaluation of the marginal fit of CAD/CAM all ceramic crowns

    NASA Astrophysics Data System (ADS)

    Brenes, Christian

    Objectives: Evaluate the marginal fit of CAD/CAM all ceramic crowns made from lithium disilicate and zirconia using two different fabrication protocols (model and model-less). METHODS: Forty anterior all ceramic restorations (20 lithium disilicate, 20 zirconia) were fabricated using a CEREC Bluecam scanner. Two different fabrication methods were used: a full digital approach and a printed model. Completed crowns were cemented and marginal gap was evaluated using Micro-CT. Each specimen was analyzed in sagittal and trans-axial orientations, allowing a 360° evaluation of the vertical and horizontal fit. RESULTS: Vertical measurements in the lingual, distal and mesial views had and estimated marginal gap from 101.9 to 133.9 microns for E-max crowns and 126.4 to 165.4 microns for zirconia. No significant differences were found between model and model-less techniques. CONCLUSION: Lithium disilicate restorations exhibited a more accurate and consistent marginal adaptation when compared to zirconia crowns. No statistically significant differences were observed when comparing model or model-less approaches.

  12. Human X-chromosome inactivation pattern distributions fit a model of genetically influenced choice better than models of completely random choice

    PubMed Central

    Renault, Nisa K E; Pritchett, Sonja M; Howell, Robin E; Greer, Wenda L; Sapienza, Carmen; Ørstavik, Karen Helene; Hamilton, David C

    2013-01-01

    In eutherian mammals, one X-chromosome in every XX somatic cell is transcriptionally silenced through the process of X-chromosome inactivation (XCI). Females are thus functional mosaics, where some cells express genes from the paternal X, and the others from the maternal X. The relative abundance of the two cell populations (X-inactivation pattern, XIP) can have significant medical implications for some females. In mice, the ‘choice' of which X to inactivate, maternal or paternal, in each cell of the early embryo is genetically influenced. In humans, the timing of XCI choice and whether choice occurs completely randomly or under a genetic influence is debated. Here, we explore these questions by analysing the distribution of XIPs in large populations of normal females. Models were generated to predict XIP distributions resulting from completely random or genetically influenced choice. Each model describes the discrete primary distribution at the onset of XCI, and the continuous secondary distribution accounting for changes to the XIP as a result of development and ageing. Statistical methods are used to compare models with empirical data from Danish and Utah populations. A rigorous data treatment strategy maximises information content and allows for unbiased use of unphased XIP data. The Anderson–Darling goodness-of-fit statistics and likelihood ratio tests indicate that a model of genetically influenced XCI choice better fits the empirical data than models of completely random choice. PMID:23652377

  13. The consistency of standard cosmology and the BATSE number versus brightness relation

    NASA Technical Reports Server (NTRS)

    Wickramasinghe, W. A. D. T.; Nemiroff, R. J.; Norris, J. P.; Kouveliotou, C.; Fishman, G. J.; Meegan, C. A.; Wilson, R. B.; Paciesas, W. S.

    1993-01-01

    The integrated number-peak-flux relation measured by the Burst and Transient Source Experiment (BATSE) on board the Compton Gamma Ray Observatory is compared with several standard cosmological distributions for gamma-ray bursts (GRBs). Friedmann-Robertson-Walker models were used along with the assumption that the bursts are standard candles and have no number or luminosity evolution. For a given Omega spectral shape, we used a free parameter, essentially the comoving number density of bursts, to generate a best fit between the cosmology and the measured relation. Our results are shown for a subsample of the first 260 GRBs recorded by BATSE. We find acceptable fits between simple cosmological models and the brightness distribution data, as determined by the Kolmogorov-Smirnov one-distribution statistical test. One cannot distinguish a single best cosmological model from the goodness of the fits. The best fit implies that BATSE GRBs are complete out to a redshift of about unity. However, significantly higher and lower redshifts, by as much as a factor of 2, are possible for other marginally acceptable fits.

  14. Confidence Intervals for the Between-Study Variance in Random Effects Meta-Analysis Using Generalised Cochran Heterogeneity Statistics

    ERIC Educational Resources Information Center

    Jackson, Dan

    2013-01-01

    Statistical inference is problematic in the common situation in meta-analysis where the random effects model is fitted to just a handful of studies. In particular, the asymptotic theory of maximum likelihood provides a poor approximation, and Bayesian methods are sensitive to the prior specification. Hence, less efficient, but easily computed and…

  15. A methodological approach to short-term tracking of youth physical fitness: the Oporto Growth, Health and Performance Study.

    PubMed

    Souza, Michele; Eisenmann, Joey; Chaves, Raquel; Santos, Daniel; Pereira, Sara; Forjaz, Cláudia; Maia, José

    2016-10-01

    In this paper, three different statistical approaches were used to investigate short-term tracking of cardiorespiratory and performance-related physical fitness among adolescents. Data were obtained from the Oporto Growth, Health and Performance Study and comprised 1203 adolescents (549 girls) divided into two age cohorts (10-12 and 12-14 years) followed for three consecutive years, with annual assessment. Cardiorespiratory fitness was assessed with 1-mile run/walk test; 50-yard dash, standing long jump, handgrip, and shuttle run test were used to rate performance-related physical fitness. Tracking was expressed in three different ways: auto-correlations, multilevel modelling with crude and adjusted model (for biological maturation, body mass index, and physical activity), and Cohen's Kappa (κ) computed in IBM SPSS 20.0, HLM 7.01 and Longitudinal Data Analysis software, respectively. Tracking of physical fitness components was (1) moderate-to-high when described by auto-correlations; (2) low-to-moderate when crude and adjusted models were used; and (3) low according to Cohen's Kappa (κ). These results demonstrate that when describing tracking, different methods should be considered since they provide distinct and more comprehensive views about physical fitness stability patterns.

  16. A metabolism-based whole lake eutrophication model to estimate the magnitude and time scales of the effects of restoration in Upper Klamath Lake, south-central Oregon

    USGS Publications Warehouse

    Wherry, Susan A.; Wood, Tamara M.

    2018-04-27

    A whole lake eutrophication (WLE) model approach for phosphorus and cyanobacterial biomass in Upper Klamath Lake, south-central Oregon, is presented here. The model is a successor to a previous model developed to inform a Total Maximum Daily Load (TMDL) for phosphorus in the lake, but is based on net primary production (NPP), which can be calculated from dissolved oxygen, rather than scaling up a small-scale description of cyanobacterial growth and respiration rates. This phase 3 WLE model is a refinement of the proof-of-concept developed in phase 2, which was the first attempt to use NPP to simulate cyanobacteria in the TMDL model. The calibration of the calculated NPP WLE model was successful, with performance metrics indicating a good fit to calibration data, and the calculated NPP WLE model was able to simulate mid-season bloom decreases, a feature that previous models could not reproduce.In order to use the model to simulate future scenarios based on phosphorus load reduction, a multivariate regression model was created to simulate NPP as a function of the model state variables (phosphorus and chlorophyll a) and measured meteorological and temperature model inputs. The NPP time series was split into a low- and high-frequency component using wavelet analysis, and regression models were fit to the components separately, with moderate success.The regression models for NPP were incorporated in the WLE model, referred to as the “scenario” WLE (SWLE), and the fit statistics for phosphorus during the calibration period were mostly unchanged. The fit statistics for chlorophyll a, however, were degraded. These statistics are still an improvement over prior models, and indicate that the SWLE is appropriate for long-term predictions even though it misses some of the seasonal variations in chlorophyll a.The complete whole lake SWLE model, with multivariate regression to predict NPP, was used to make long-term simulations of the response to 10-, 20-, and 40-percent reductions in tributary nutrient loads. The long-term mean water column concentration of total phosphorus was reduced by 9, 18, and 36 percent, respectively, in response to these load reductions. The long-term water column chlorophyll a concentration was reduced by 4, 13, and 44 percent, respectively. The adjustment to a new equilibrium between the water column and sediments occurred over about 30 years.

  17. Flexible statistical modelling detects clinical functional magnetic resonance imaging activation in partially compliant subjects.

    PubMed

    Waites, Anthony B; Mannfolk, Peter; Shaw, Marnie E; Olsrud, Johan; Jackson, Graeme D

    2007-02-01

    Clinical functional magnetic resonance imaging (fMRI) occasionally fails to detect significant activation, often due to variability in task performance. The present study seeks to test whether a more flexible statistical analysis can better detect activation, by accounting for variance associated with variable compliance to the task over time. Experimental results and simulated data both confirm that even at 80% compliance to the task, such a flexible model outperforms standard statistical analysis when assessed using the extent of activation (experimental data), goodness of fit (experimental data), and area under the operator characteristic curve (simulated data). Furthermore, retrospective examination of 14 clinical fMRI examinations reveals that in patients where the standard statistical approach yields activation, there is a measurable gain in model performance in adopting the flexible statistical model, with little or no penalty in lost sensitivity. This indicates that a flexible model should be considered, particularly for clinical patients who may have difficulty complying fully with the study task.

  18. Goodness of fit of probability distributions for sightings as species approach extinction.

    PubMed

    Vogel, Richard M; Hosking, Jonathan R M; Elphick, Chris S; Roberts, David L; Reed, J Michael

    2009-04-01

    Estimating the probability that a species is extinct and the timing of extinctions is useful in biological fields ranging from paleoecology to conservation biology. Various statistical methods have been introduced to infer the time of extinction and extinction probability from a series of individual sightings. There is little evidence, however, as to which of these models provide adequate fit to actual sighting records. We use L-moment diagrams and probability plot correlation coefficient (PPCC) hypothesis tests to evaluate the goodness of fit of various probabilistic models to sighting data collected for a set of North American and Hawaiian bird populations that have either gone extinct, or are suspected of having gone extinct, during the past 150 years. For our data, the uniform, truncated exponential, and generalized Pareto models performed moderately well, but the Weibull model performed poorly. Of the acceptable models, the uniform distribution performed best based on PPCC goodness of fit comparisons and sequential Bonferroni-type tests. Further analyses using field significance tests suggest that although the uniform distribution is the best of those considered, additional work remains to evaluate the truncated exponential model more fully. The methods we present here provide a framework for evaluating subsequent models.

  19. Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert

    2013-01-01

    Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.

  20. Biocultural approach of the association between maturity and physical activity in youth.

    PubMed

    Werneck, André O; Silva, Danilo R; Collings, Paul J; Fernandes, Rômulo A; Ronque, Enio R V; Coelho-E-Silva, Manuel J; Sardinha, Luís B; Cyrino, Edilson S

    2017-11-13

    To test the biocultural model through direct and indirect associations between biological maturation, adiposity, cardiorespiratory fitness, feelings of sadness, social relationships, and physical activity in adolescents. This was a cross-sectional study conducted with 1,152 Brazilian adolescents aged between 10 and 17 years. Somatic maturation was estimated through Mirwald's method (peak height velocity). Physical activity was assessed through Baecke questionnaire (occupational, leisure, and sport contexts). Body mass index, body fat (sum of skinfolds), cardiorespiratory fitness (20-m shuttle run test), self-perceptions of social relationship, and frequency of sadness feelings were obtained for statistical modeling. Somatic maturation is directly related to sport practice and leisure time physical activity only among girls (β=0.12, p<0.05 and β=0.09, respectively, p<0.05). Moreover, biological (adiposity and cardiorespiratory fitness), psychological (sadness), and social (satisfaction with social relationships) variables mediated the association between maturity and physical activity in boys and for occupational physical activity in girls. In general, models presented good fit coefficients. Biocultural model presents good fit and emotional/biological factors mediate part of the relationship between somatic maturation and physical activity. Copyright © 2017 Sociedade Brasileira de Pediatria. Published by Elsevier Editora Ltda. All rights reserved.

  1. Nested Structural Equation Models: Noncentrality and Power of Restriction Test.

    ERIC Educational Resources Information Center

    Raykov, Tenko; Penev, Spiridon

    1998-01-01

    Discusses the difference in noncentrality parameters of nested structural equation models and their utility in evaluating statistical power associated with the pertinent restriction test. Asymptotic confidence intervals for that difference are presented. These intervals represent a useful adjunct to goodness-of-fit indexes in assessing constraints…

  2. A systematic literature review of PTSD's latent structure in the Diagnostic and Statistical Manual of Mental Disorders: DSM-IV to DSM-5.

    PubMed

    Armour, Cherie; Műllerová, Jana; Elhai, Jon D

    2016-03-01

    The factor structure of posttraumatic stress disorder (PTSD) has been widely researched, but consensus regarding the exact number and nature of factors is yet to be reached. The aim of the current study was to systematically review the extant literature on PTSD's latent structure in the Diagnostic and Statistical Manual of Mental Disorders (DSM) in order to identify the best-fitting model. One hundred and twelve research papers published after 1994 using confirmatory factor analysis and DSM-based measures of PTSD were included in the review. In the DSM-IV literature, four-factor models received substantial support, but the five-factor Dysphoric arousal model demonstrated the best fit, regardless of gender, measurement instrument or trauma type. The recently proposed DSM-5 PTSD model was found to be a good representation of PTSD's latent structure, but studies analysing the six- and seven-factor models suggest that the DSM-5 PTSD factor structure may need further alterations. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Quantum statistics in complex networks

    NASA Astrophysics Data System (ADS)

    Bianconi, Ginestra

    The Barabasi-Albert (BA) model for a complex network shows a characteristic power law connectivity distribution typical of scale free systems. The Ising model on the BA network shows that the ferromagnetic phase transition temperature depends logarithmically on its size. We have introduced a fitness parameter for the BA network which describes the different abilities of nodes to compete for links. This model predicts the formation of a scale free network where each node increases its connectivity in time as a power-law with an exponent depending on its fitness. This model includes the fact that the node connectivity and growth rate do not depend on the node age alone and it reproduces non trivial correlation properties of the Internet. We have proposed a model of bosonic networks by a generalization of the BA model where the properties of quantum statistics can be applied. We have introduced a fitness eta i = e-bei where the temperature T = 1/ b is determined by the noise in the system and the energy ei accounts for qualitative differences of each node for acquiring links. The results of this work show that a power law network with exponent gamma = 2 can give a Bose condensation where a single node grabs a finite fraction of all the links. In order to address the connection with self-organized processes we have introduced a model for a growing Cayley tree that generalizes the dynamics of invasion percolation. At each node we associate a parameter ei (called energy) such that the probability to grow for each node is given by pii ∝ ebei where T = 1/ b is a statistical parameter of the system determined by the noise called the temperature. This model has been solved analytically with a similar mathematical technique as the bosonic scale-free networks and it shows the self organization of the low energy nodes at the interface. In the thermodynamic limit the Fermi distribution describes the probability of the energy distribution at the interface.

  4. Evaluation Of Statistical Models For Forecast Errors From The HBV-Model

    NASA Astrophysics Data System (ADS)

    Engeland, K.; Kolberg, S.; Renard, B.; Stensland, I.

    2009-04-01

    Three statistical models for the forecast errors for inflow to the Langvatn reservoir in Northern Norway have been constructed and tested according to how well the distribution and median values of the forecasts errors fit to the observations. For the first model observed and forecasted inflows were transformed by the Box-Cox transformation before a first order autoregressive model was constructed for the forecast errors. The parameters were conditioned on climatic conditions. In the second model the Normal Quantile Transformation (NQT) was applied on observed and forecasted inflows before a similar first order autoregressive model was constructed for the forecast errors. For the last model positive and negative errors were modeled separately. The errors were first NQT-transformed before a model where the mean values were conditioned on climate, forecasted inflow and yesterday's error. To test the three models we applied three criterions: We wanted a) the median values to be close to the observed values; b) the forecast intervals to be narrow; c) the distribution to be correct. The results showed that it is difficult to obtain a correct model for the forecast errors, and that the main challenge is to account for the auto-correlation in the errors. Model 1 and 2 gave similar results, and the main drawback is that the distributions are not correct. The 95% forecast intervals were well identified, but smaller forecast intervals were over-estimated, and larger intervals were under-estimated. Model 3 gave a distribution that fits better, but the median values do not fit well since the auto-correlation is not properly accounted for. If the 95% forecast interval is of interest, Model 2 is recommended. If the whole distribution is of interest, Model 3 is recommended.

  5. A generalized right truncated bivariate Poisson regression model with applications to health data.

    PubMed

    Islam, M Ataharul; Chowdhury, Rafiqul I

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over or under dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.

  6. A generalized right truncated bivariate Poisson regression model with applications to health data

    PubMed Central

    Islam, M. Ataharul; Chowdhury, Rafiqul I.

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over or under dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model. PMID:28586344

  7. Statistics of Data Fitting: Flaws and Fixes of Polynomial Analysis of Channeled Spectra

    NASA Astrophysics Data System (ADS)

    Karstens, William; Smith, David

    2013-03-01

    Starting from general statistical principles, we have critically examined Baumeister's procedure* for determining the refractive index of thin films from channeled spectra. Briefly, the method assumes that the index and interference fringe order may be approximated by polynomials quadratic and cubic in photon energy, respectively. The coefficients of the polynomials are related by differentiation, which is equivalent to comparing energy differences between fringes. However, we find that when the fringe order is calculated from the published IR index for silicon* and then analyzed with Baumeister's procedure, the results do not reproduce the original index. This problem has been traced to 1. Use of unphysical powers in the polynomials (e.g., time-reversal invariance requires that the index is an even function of photon energy), and 2. Use of insufficient terms of the correct parity. Exclusion of unphysical terms and addition of quartic and quintic terms to the index and order polynomials yields significantly better fits with fewer parameters. This represents a specific example of using statistics to determine if the assumed fitting model adequately captures the physics contained in experimental data. The use of analysis of variance (ANOVA) and the Durbin-Watson statistic to test criteria for the validity of least-squares fitting will be discussed. *D.F. Edwards and E. Ochoa, Appl. Opt. 19, 4130 (1980). Supported in part by the US Department of Energy, Office of Nuclear Physics under contract DE-AC02-06CH11357.

  8. Identifying Measurement Disturbance Effects Using Rasch Item Fit Statistics and the Logit Residual Index.

    ERIC Educational Resources Information Center

    Mount, Robert E.; Schumacker, Randall E.

    1998-01-01

    A Monte Carlo study was conducted using simulated dichotomous data to determine the effects of guessing on Rasch item fit statistics and the Logit Residual Index. Results indicate that no significant differences were found between the mean Rasch item fit statistics for each distribution type as the probability of guessing the correct answer…

  9. How to Compare Parametric and Nonparametric Person-Fit Statistics Using Real Data

    ERIC Educational Resources Information Center

    Sinharay, Sandip

    2017-01-01

    Person-fit assessment (PFA) is concerned with uncovering atypical test performance as reflected in the pattern of scores on individual items on a test. Existing person-fit statistics (PFSs) include both parametric and nonparametric statistics. Comparison of PFSs has been a popular research topic in PFA, but almost all comparisons have employed…

  10. Uncertainty in eddy covariance measurements and its application to physiological models

    Treesearch

    D.Y. Hollinger; A.D. Richardson; A.D. Richardson

    2005-01-01

    Flux data are noisy, and this uncertainty is largely due to random measurement error. Knowledge of uncertainty is essential for the statistical evaluation of modeled andmeasured fluxes, for comparison of parameters derived by fitting models to measured fluxes and in formal data-assimilation efforts. We used the difference between simultaneous measurements from two...

  11. Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory

    ERIC Educational Resources Information Center

    Wells, Craig S.; Bolt, Daniel M.

    2008-01-01

    Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…

  12. Goodness-of-fit tests for open capture-recapture models

    USGS Publications Warehouse

    Pollock, K.H.; Hines, J.E.; Nichols, J.D.

    1985-01-01

    General goodness-of-fit tests for the Jolly-Seber model are proposed. These tests are based on conditional arguments using minimal sufficient statistics. The tests are shown to be of simple hypergeometric form so that a series of independent contingency table chi-square tests can be performed. The relationship of these tests to other proposed tests is discussed. This is followed by a simulation study of the power of the tests to detect departures from the assumptions of the Jolly-Seber model. Some meadow vole capture-recapture data are used to illustrate the testing procedure which has been implemented in a computer program available from the authors.

  13. The Apollo 16 regolith - A petrographically-constrained chemical mixing model

    NASA Technical Reports Server (NTRS)

    Kempa, M. J.; Papike, J. J.; White, C.

    1980-01-01

    A mixing model for Apollo 16 regolith samples has been developed, which differs from other A-16 mixing models in that it is both petrographically constrained and statistically sound. The model was developed using three components representative of rock types present at the A-16 site, plus a representative mare basalt. A linear least-squares fitting program employing the chi-squared test and sum of components was used to determine goodness of fit. Results for surface soils indicate that either there are no significant differences between Cayley and Descartes material at the A-16 site or, if differences do exist, they have been obscured by meteoritic reworking and mixing of the lithologies.

  14. A multiplicative process for generating a beta-like survival function with application to the UK 2016 EU referendum results

    NASA Astrophysics Data System (ADS)

    Fenner, Trevor; Kaufmann, Eric; Levene, Mark; Loizou, George

    Human dynamics and sociophysics suggest statistical models that may explain and provide us with better insight into social phenomena. Contextual and selection effects tend to produce extreme values in the tails of rank-ordered distributions of both census data and district-level election outcomes. Models that account for this nonlinearity generally outperform linear models. Fitting nonlinear functions based on rank-ordering census and election data therefore improves the fit of aggregate voting models. This may help improve ecological inference, as well as election forecasting in majoritarian systems. We propose a generative multiplicative decrease model that gives rise to a rank-order distribution and facilitates the analysis of the recent UK EU referendum results. We supply empirical evidence that the beta-like survival function, which can be generated directly from our model, is a close fit to the referendum results, and also may have predictive value when covariate data are available.

  15. A Kp-based model of auroral boundaries

    NASA Astrophysics Data System (ADS)

    Carbary, James F.

    2005-10-01

    The auroral oval can serve as both a representation and a prediction of space weather on a global scale, so a competent model of the oval as a function of a geomagnetic index could conveniently appraise space weather itself. A simple model of the auroral boundaries is constructed by binning several months of images from the Polar Ultraviolet Imager by Kp index. The pixel intensities are first averaged into magnetic latitude-magnetic local time (MLT-MLAT) and local time bins, and intensity profiles are then derived for each Kp level at 1 hour intervals of MLT. After background correction, the boundary latitudes of each profile are determined at a threshold of 4 photons cm-2 s1. The peak locations and peak intensities are also found. The boundary and peak locations vary linearly with Kp index, and the coefficients of the linear fits are tabulated for each MLT. As a general rule of thumb, the UV intensity peak shifts 1° in magnetic latitude for each increment in Kp. The fits are surprisingly good for Kp < 6 but begin to deteriorate at high Kp because of auroral boundary irregularities and poor statistics. The statistical model allows calculation of the auroral boundaries at most MLTs as a function of Kp and can serve as an approximation to the shape and extent of the statistical oval.

  16. Regression Analysis of Long-term Profile Ozone Data Set from BUV Instruments

    NASA Technical Reports Server (NTRS)

    Frith, Stacey; Taylor, Steve; DeLand, Matt; Ahn, Chang-Woo; Stolarski, Richard S.

    2005-01-01

    We have produced a profile merged ozone data set (MOD) based on the SBUV/SBUV2 series of nadir-viewing satellite backscatter instruments, covering the period from November 1978 - December 2003. In 2004, data from the Nimbus 7 SBUV and NOAA 9,11, and 16 SBUV/2 instruments were reprocessed using the Version 8 (V8) algorithm and most recent calibrations. More recently, data from the Nimbus 4 BUV instrument, which operated from 1970 - 1977, were also reprocessed using the V8 algorithm. As part of the V8 profile calibration, the Nimbus 7 and NOAA 9 (1993-1997 only) instrument calibrations have been adjusted to match the NOAA 11 calibration, which was established from comparisons with SSBUV shuttle flight data. Given the level of agreement between the data sets, we simply average the ozone values during periods of instrument overlap to produce the MOD profile data set. We use statistical time-series analysis of the MOD profile data set (1978-2003) to estimate the change in profile ozone due to changing stratospheric chlorine levels. The Nimbus 4 BUV data offer an opportunity to test the physical properties of our statistical model. We extrapolate our statistical model fit backwards in time and compare to the Nimbus 4 data. We compare the statistics of the residuals from the fit for the Nimbus 4 period to those obtained from the 1978-2003 period over which the statistical model coefficients were estimated.

  17. Frequentist Model Averaging in Structural Equation Modelling.

    PubMed

    Jin, Shaobo; Ankargren, Sebastian

    2018-06-04

    Model selection from a set of candidate models plays an important role in many structural equation modelling applications. However, traditional model selection methods introduce extra randomness that is not accounted for by post-model selection inference. In the current study, we propose a model averaging technique within the frequentist statistical framework. Instead of selecting an optimal model, the contributions of all candidate models are acknowledged. Valid confidence intervals and a [Formula: see text] test statistic are proposed. A simulation study shows that the proposed method is able to produce a robust mean-squared error, a better coverage probability, and a better goodness-of-fit test compared to model selection. It is an interesting compromise between model selection and the full model.

  18. A Simplified Algorithm for Statistical Investigation of Damage Spreading

    NASA Astrophysics Data System (ADS)

    Gecow, Andrzej

    2009-04-01

    On the way to simulating adaptive evolution of complex system describing a living object or human developed project, a fitness should be defined on node states or network external outputs. Feedbacks lead to circular attractors of these states or outputs which make it difficult to define a fitness. The main statistical effects of adaptive condition are the result of small change tendency and to appear, they only need a statistically correct size of damage initiated by evolutionary change of system. This observation allows to cut loops of feedbacks and in effect to obtain a particular statistically correct state instead of a long circular attractor which in the quenched model is expected for chaotic network with feedback. Defining fitness on such states is simple. We calculate only damaged nodes and only once. Such an algorithm is optimal for investigation of damage spreading i.e. statistical connections of structural parameters of initial change with the size of effected damage. It is a reversed-annealed method—function and states (signals) may be randomly substituted but connections are important and are preserved. The small damages important for adaptive evolution are correctly depicted in comparison to Derrida annealed approximation which expects equilibrium levels for large networks. The algorithm indicates these levels correctly. The relevant program in Pascal, which executes the algorithm for a wide range of parameters, can be obtained from the author.

  19. GPUs for statistical data analysis in HEP: a performance study of GooFit on GPUs vs. RooFit on CPUs

    NASA Astrophysics Data System (ADS)

    Pompili, Alexis; Di Florio, Adriano; CMS Collaboration

    2016-10-01

    In order to test the computing capabilities of GPUs with respect to traditional CPU cores a high-statistics toy Monte Carlo technique has been implemented both in ROOT/RooFit and GooFit frameworks with the purpose to estimate the statistical significance of the structure observed by CMS close to the kinematical boundary of the Jψϕ invariant mass in the three-body decay B +→JψϕK +. GooFit is a data analysis open tool under development that interfaces ROOT/RooFit to CUDA platform on nVidia GPU. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performances with respect to the RooFit application parallelised on multiple CPUs by means of PROOF-Lite tool. The considerably resulting speed-up, while comparing concurrent GooFit processes allowed by CUDA Multi Process Service and a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which the Wilks Theorem may apply or does not apply because its regularity conditions are not satisfied.

  20. Performance studies of GooFit on GPUs vs RooFit on CPUs while estimating the statistical significance of a new physical signal

    NASA Astrophysics Data System (ADS)

    Di Florio, Adriano

    2017-10-01

    In order to test the computing capabilities of GPUs with respect to traditional CPU cores a high-statistics toy Monte Carlo technique has been implemented both in ROOT/RooFit and GooFit frameworks with the purpose to estimate the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B + → J/ψϕK +. GooFit is a data analysis open tool under development that interfaces ROOT/RooFit to CUDA platform on nVidia GPU. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performances with respect to the RooFit application parallelised on multiple CPUs by means of PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by CUDA Multi Process Service and a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which the Wilks Theorem may or may not apply because its regularity conditions are not satisfied.

  1. WAIS-IV subtest covariance structure: conceptual and statistical considerations.

    PubMed

    Ward, L Charles; Bergman, Maria A; Hebert, Katina R

    2012-06-01

    D. Wechsler (2008b) reported confirmatory factor analyses (CFAs) with standardization data (ages 16-69 years) for 10 core and 5 supplemental subtests from the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV). Analyses of the 15 subtests supported 4 hypothesized oblique factors (Verbal Comprehension, Working Memory, Perceptual Reasoning, and Processing Speed) but also revealed unexplained covariance between Block Design and Visual Puzzles (Perceptual Reasoning subtests). That covariance was not included in the final models. Instead, a path was added from Working Memory to Figure Weights (Perceptual Reasoning subtest) to improve fit and achieve a desired factor pattern. The present research with the same data (N = 1,800) showed that the path from Working Memory to Figure Weights increases the association between Working Memory and Matrix Reasoning. Specifying both paths improves model fit and largely eliminates unexplained covariance between Block Design and Visual Puzzles but with the undesirable consequence that Figure Weights and Matrix Reasoning are equally determined by Perceptual Reasoning and Working Memory. An alternative 4-factor model was proposed that explained theory-implied covariance between Block Design and Visual Puzzles and between Arithmetic and Figure Weights while maintaining compatibility with WAIS-IV Index structure. The proposed model compared favorably with a 5-factor model based on Cattell-Horn-Carroll theory. The present findings emphasize that covariance model comparisons should involve considerations of conceptual coherence and theoretical adherence in addition to statistical fit. (c) 2012 APA, all rights reserved

  2. The Matching Relation and Situation-Specific Bias Modulation in Professional Football Play Selection

    PubMed Central

    Stilling, Stephanie T; Critchfield, Thomas S

    2010-01-01

    The utility of a quantitative model depends on the extent to which its fitted parameters vary systematically with environmental events of interest. Professional football statistics were analyzed to determine whether play selection (passing versus rushing plays) could be accounted for with the generalized matching equation, and in particular whether variations in play selection across game situations would manifest as changes in the equation's fitted parameters. Statistically significant changes in bias were found for each of five types of game situations; no systematic changes in sensitivity were observed. Further analyses suggested relationships between play selection bias and both turnover probability (which can be described in terms of punishment) and yards-gained variance (which can be described in terms of variable-magnitude reinforcement schedules). The present investigation provides a useful demonstration of association between face-valid, situation-specific effects in a domain of everyday interest, and a theoretically important term of a quantitative model of behavior. Such associations, we argue, are an essential focus in translational extensions of quantitative models. PMID:21119855

  3. Fitting direct covariance structures by the MSTRUCT modeling language of the CALIS procedure.

    PubMed

    Yung, Yiu-Fai; Browne, Michael W; Zhang, Wei

    2015-02-01

    This paper demonstrates the usefulness and flexibility of the general structural equation modelling (SEM) approach to fitting direct covariance patterns or structures (as opposed to fitting implied covariance structures from functional relationships among variables). In particular, the MSTRUCT modelling language (or syntax) of the CALIS procedure (SAS/STAT version 9.22 or later: SAS Institute, 2010) is used to illustrate the SEM approach. The MSTRUCT modelling language supports a direct covariance pattern specification of each covariance element. It also supports the input of additional independent and dependent parameters. Model tests, fit statistics, estimates, and their standard errors are then produced under the general SEM framework. By using numerical and computational examples, the following tests of basic covariance patterns are illustrated: sphericity, compound symmetry, and multiple-group covariance patterns. Specification and testing of two complex correlation structures, the circumplex pattern and the composite direct product models with or without composite errors and scales, are also illustrated by the MSTRUCT syntax. It is concluded that the SEM approach offers a general and flexible modelling of direct covariance and correlation patterns. In conjunction with the use of SAS macros, the MSTRUCT syntax provides an easy-to-use interface for specifying and fitting complex covariance and correlation structures, even when the number of variables or parameters becomes large. © 2014 The British Psychological Society.

  4. Estimating Dynamical Systems: Derivative Estimation Hints From Sir Ronald A. Fisher.

    PubMed

    Deboeck, Pascal R

    2010-08-06

    The fitting of dynamical systems to psychological data offers the promise of addressing new and innovative questions about how people change over time. One method of fitting dynamical systems is to estimate the derivatives of a time series and then examine the relationships between derivatives using a differential equation model. One common approach for estimating derivatives, Local Linear Approximation (LLA), produces estimates with correlated errors. Depending on the specific differential equation model used, such correlated errors can lead to severely biased estimates of differential equation model parameters. This article shows that the fitting of dynamical systems can be improved by estimating derivatives in a manner similar to that used to fit orthogonal polynomials. Two applications using simulated data compare the proposed method and a generalized form of LLA when used to estimate derivatives and when used to estimate differential equation model parameters. A third application estimates the frequency of oscillation in observations of the monthly deaths from bronchitis, emphysema, and asthma in the United Kingdom. These data are publicly available in the statistical program R, and functions in R for the method presented are provided.

  5. Statistical validity of using ratio variables in human kinetics research.

    PubMed

    Liu, Yuanlong; Schutz, Robert W

    2003-09-01

    The purposes of this study were to investigate the validity of the simple ratio and three alternative deflation models and examine how the variation of the numerator and denominator variables affects the reliability of a ratio variable. A simple ratio and three alternative deflation models were fitted to four empirical data sets, and common criteria were applied to determine the best model for deflation. Intraclass correlation was used to examine the component effect on the reliability of a ratio variable. The results indicate that the validity, of a deflation model depends on the statistical characteristics of the particular component variables used, and an optimal deflation model for all ratio variables may not exist. Therefore, it is recommended that different models be fitted to each empirical data set to determine the best deflation model. It was found that the reliability of a simple ratio is affected by the coefficients of variation and the within- and between-trial correlations between the numerator and denominator variables. It was recommended that researchers should compute the reliability of the derived ratio scores and not assume that strong reliabilities in the numerator and denominator measures automatically lead to high reliability in the ratio measures.

  6. Efficient detection of wound-bed and peripheral skin with statistical colour models.

    PubMed

    Veredas, Francisco J; Mesa, Héctor; Morente, Laura

    2015-04-01

    A pressure ulcer is a clinical pathology of localised damage to the skin and underlying tissue caused by pressure, shear or friction. Reliable diagnosis supported by precise wound evaluation is crucial in order to success on treatment decisions. This paper presents a computer-vision approach to wound-area detection based on statistical colour models. Starting with a training set consisting of 113 real wound images, colour histogram models are created for four different tissue types. Back-projections of colour pixels on those histogram models are used, from a Bayesian perspective, to get an estimate of the posterior probability of a pixel to belong to any of those tissue classes. Performance measures obtained from contingency tables based on a gold standard of segmented images supplied by experts have been used for model selection. The resulting fitted model has been validated on a training set consisting of 322 wound images manually segmented and labelled by expert clinicians. The final fitted segmentation model shows robustness and gives high mean performance rates [(AUC: .9426 (SD .0563); accuracy: .8777 (SD .0799); F-score: 0.7389 (SD .1550); Cohen's kappa: .6585 (SD .1787)] when segmenting significant wound areas that include healing tissues.

  7. Assessment of Person Fit for Mixed-Format Tests

    ERIC Educational Resources Information Center

    Sinharay, Sandip

    2015-01-01

    Person-fit assessment may help the researcher to obtain additional information regarding the answering behavior of persons. Although several researchers examined person fit, there is a lack of research on person-fit assessment for mixed-format tests. In this article, the lz statistic and the ?2 statistic, both of which have been used for tests…

  8. Risk Estimation for Lung Cancer in Libya: Analysis Based on Standardized Morbidity Ratio, Poisson-Gamma Model, BYM Model and Mixture Model

    PubMed

    Alhdiri, Maryam Ahmed; Samat, Nor Azah; Mohamed, Zulkifley

    2017-03-01

    Cancer is the most rapidly spreading disease in the world, especially in developing countries, including Libya. Cancer represents a significant burden on patients, families, and their societies. This disease can be controlled if detected early. Therefore, disease mapping has recently become an important method in the fields of public health research and disease epidemiology. The correct choice of statistical model is a very important step to producing a good map of a disease. Libya was selected to perform this work and to examine its geographical variation in the incidence of lung cancer. The objective of this paper is to estimate the relative risk for lung cancer. Four statistical models to estimate the relative risk for lung cancer and population censuses of the study area for the time period 2006 to 2011 were used in this work. They are initially known as Standardized Morbidity Ratio, which is the most popular statistic, which used in the field of disease mapping, Poisson-gamma model, which is one of the earliest applications of Bayesian methodology, Besag, York and Mollie (BYM) model and Mixture model. As an initial step, this study begins by providing a review of all proposed models, which we then apply to lung cancer data in Libya. Maps, tables and graph, goodness-of-fit (GOF) were used to compare and present the preliminary results. This GOF is common in statistical modelling to compare fitted models. The main general results presented in this study show that the Poisson-gamma model, BYM model, and Mixture model can overcome the problem of the first model (SMR) when there is no observed lung cancer case in certain districts. Results show that the Mixture model is most robust and provides better relative risk estimates across a range of models. Creative Commons Attribution License

  9. Risk Estimation for Lung Cancer in Libya: Analysis Based on Standardized Morbidity Ratio, Poisson-Gamma Model, BYM Model and Mixture Model

    PubMed Central

    Alhdiri, Maryam Ahmed; Samat, Nor Azah; Mohamed, Zulkifley

    2017-01-01

    Cancer is the most rapidly spreading disease in the world, especially in developing countries, including Libya. Cancer represents a significant burden on patients, families, and their societies. This disease can be controlled if detected early. Therefore, disease mapping has recently become an important method in the fields of public health research and disease epidemiology. The correct choice of statistical model is a very important step to producing a good map of a disease. Libya was selected to perform this work and to examine its geographical variation in the incidence of lung cancer. The objective of this paper is to estimate the relative risk for lung cancer. Four statistical models to estimate the relative risk for lung cancer and population censuses of the study area for the time period 2006 to 2011 were used in this work. They are initially known as Standardized Morbidity Ratio, which is the most popular statistic, which used in the field of disease mapping, Poisson-gamma model, which is one of the earliest applications of Bayesian methodology, Besag, York and Mollie (BYM) model and Mixture model. As an initial step, this study begins by providing a review of all proposed models, which we then apply to lung cancer data in Libya. Maps, tables and graph, goodness-of-fit (GOF) were used to compare and present the preliminary results. This GOF is common in statistical modelling to compare fitted models. The main general results presented in this study show that the Poisson-gamma model, BYM model, and Mixture model can overcome the problem of the first model (SMR) when there is no observed lung cancer case in certain districts. Results show that the Mixture model is most robust and provides better relative risk estimates across a range of models. PMID:28440974

  10. Sunyaev-Zel'dovich Effect Derived Distance to the High Redshift Clusters MS 0451.6-0305 and CL 0016+16

    NASA Technical Reports Server (NTRS)

    Reese, E. D.; Mohr, J. J.; Carlstrom, J. E.; Grego, L.; Holder, G. P.; Holzapfel, W. L.; Hughes, J. P.; Patel, S. K.

    2000-01-01

    We determine the distances to the z approximately equal to 0.55 galaxy clusters MS 0451.6-0305 and CL 0016+16 from a maximum likelihood joint fit to interferometric Sunyaev-Zel'dovich effect (SZE) and X-ray observations. We model the intracluster medium (ICM) using a spherical isothermal beta-model. We quantify the statistical and systematic uncertainties inherent to these direct distance measurements, and we determine constraints on the Hubble parameter for three different cosmologies. For an OmegaM = 0.3, OmegaL = 0.7 cosmology, these distances imply a Hubble constant of 63(exp 12)(sub -9)(exp +21)(sub -21) km/s/Mpc, where the uncertainties correspond to statistical followed by systematic at 68% confidence. The best fit H(sub o) is 57 km/sec/Mpc for an open OmegaM = 0.3 universe and 52 km/s/Mpc for a flat Omega = 1 universe.

  11. Sunyaev-Zeldovich Effect-Derived Distances to the High-Redshift Clusters

    NASA Technical Reports Server (NTRS)

    Reese, Erik D.; Mohr, Joseph J.; Carlstrom, John E.; Joy, Marshall; Grego, Laura; Holder, Gilbert P.; Holzapfel, William L.; Hughes, John P.; Patel, Sandeep K.; Donahue, Megan

    2000-01-01

    We determine the distances to the z approximately equals 0.55 galaxy clusters MS 0451.6 - 0305 and Cl 0016 + 16 from a maximum-likelihood joint fit to interferometric Sunyaev-Zeldovich effect (SZE) and X-ray observations. We model the intracluster medium (ICM) using a spherical isothermal beta model. We quantify the statistical and systematic uncertainties inherent to these direct distance measurements, and we determine constraints on the Hubble parameter for three different cosmologies. For an Omega(sub M) = 0.3, Omega(sub lambda) = 0.7 cosmology, these distances imply a Hubble constant of 63(sup +12) (sub -9) (sup + 21) (sub -21) km/s Mp/c, where the uncertainties correspond to statistical followed by systematic at 68% confidence. The best-fit H(sub 0) is 57 km/s Mp/c for an open (Omega(sub M) = 0.3) universe and 52 km/s Mp/c for a flat (Omega(sub M) = 1) universe.

  12. Representing Micro-Macro Linkages by Actor-Based Dynamic Network Models

    PubMed Central

    Snijders, Tom A.B.; Steglich, Christian E.G.

    2014-01-01

    Stochastic actor-based models for network dynamics have the primary aim of statistical inference about processes of network change, but may be regarded as a kind of agent-based models. Similar to many other agent-based models, they are based on local rules for actor behavior. Different from many other agent-based models, by including elements of generalized linear statistical models they aim to be realistic detailed representations of network dynamics in empirical data sets. Statistical parallels to micro-macro considerations can be found in the estimation of parameters determining local actor behavior from empirical data, and the assessment of goodness of fit from the correspondence with network-level descriptives. This article studies several network-level consequences of dynamic actor-based models applied to represent cross-sectional network data. Two examples illustrate how network-level characteristics can be obtained as emergent features implied by micro-specifications of actor-based models. PMID:25960578

  13. Modeling envelope statistics of blood and myocardium for segmentation of echocardiographic images.

    PubMed

    Nillesen, Maartje M; Lopata, Richard G P; Gerrits, Inge H; Kapusta, Livia; Thijssen, Johan M; de Korte, Chris L

    2008-04-01

    The objective of this study was to investigate the use of speckle statistics as a preprocessing step for segmentation of the myocardium in echocardiographic images. Three-dimensional (3D) and biplane image sequences of the left ventricle of two healthy children and one dog (beagle) were acquired. Pixel-based speckle statistics of manually segmented blood and myocardial regions were investigated by fitting various probability density functions (pdf). The statistics of heart muscle and blood could both be optimally modeled by a K-pdf or Gamma-pdf (Kolmogorov-Smirnov goodness-of-fit test). Scale and shape parameters of both distributions could differentiate between blood and myocardium. Local estimation of these parameters was used to obtain parametric images, where window size was related to speckle size (5 x 2 speckles). Moment-based and maximum-likelihood estimators were used. Scale parameters were still able to differentiate blood from myocardium; however, smoothing of edges of anatomical structures occurred. Estimation of the shape parameter required a larger window size, leading to unacceptable blurring. Using these parameters as an input for segmentation resulted in unreliable segmentation. Adaptive mean squares filtering was then introduced using the moment-based scale parameter (sigma(2)/mu) of the Gamma-pdf to automatically steer the two-dimensional (2D) local filtering process. This method adequately preserved sharpness of the edges. In conclusion, a trade-off between preservation of sharpness of edges and goodness-of-fit when estimating local shape and scale parameters is evident for parametric images. For this reason, adaptive filtering outperforms parametric imaging for the segmentation of echocardiographic images.

  14. A Statistical Atrioventricular Node Model Accounting for Pathway Switching During Atrial Fibrillation.

    PubMed

    Henriksson, Mikael; Corino, Valentina D A; Sornmo, Leif; Sandberg, Frida

    2016-09-01

    The atrioventricular (AV) node plays a central role in atrial fibrillation (AF), as it influences the conduction of impulses from the atria into the ventricles. In this paper, the statistical dual pathway AV node model, previously introduced by us, is modified so that it accounts for atrial impulse pathway switching even if the preceding impulse did not cause a ventricular activation. The proposed change in model structure implies that the number of model parameters subjected to maximum likelihood estimation is reduced from five to four. The model is evaluated using the data acquired in the RATe control in atrial fibrillation (RATAF) study, involving 24-h ECG recordings from 60 patients with permanent AF. When fitting the models to the RATAF database, similar results were obtained for both the present and the previous model, with a median fit of 86%. The results show that the parameter estimates characterizing refractory period prolongation exhibit considerably lower variation when using the present model, a finding that may be ascribed to fewer model parameters. The new model maintains the capability to model RR intervals, while providing more reliable parameters estimates. The model parameters are expected to convey novel clinical information, and may be useful for predicting the effect of rate control drugs.

  15. Spatial uncertainty of a geoid undulation model in Guayaquil, Ecuador

    NASA Astrophysics Data System (ADS)

    Chicaiza, E. G.; Leiva, C. A.; Arranz, J. J.; Buenańo, X. E.

    2017-06-01

    Geostatistics is a discipline that deals with the statistical analysis of regionalized variables. In this case study, geostatistics is used to estimate geoid undulation in the rural area of Guayaquil town in Ecuador. The geostatistical approach was chosen because the estimation error of prediction map is getting. Open source statistical software R and mainly geoR, gstat and RGeostats libraries were used. Exploratory data analysis (EDA), trend and structural analysis were carried out. An automatic model fitting by Iterative Least Squares and other fitting procedures were employed to fit the variogram. Finally, Kriging using gravity anomaly of Bouguer as external drift and Universal Kriging were used to get a detailed map of geoid undulation. The estimation uncertainty was reached in the interval [-0.5; +0.5] m for errors and a maximum estimation standard deviation of 2 mm in relation with the method of interpolation applied. The error distribution of the geoid undulation map obtained in this study provides a better result than Earth gravitational models publicly available for the study area according the comparison with independent validation points. The main goal of this paper is to confirm the feasibility to use geoid undulations from Global Navigation Satellite Systems and leveling field measurements and geostatistical techniques methods in order to use them in high-accuracy engineering projects.

  16. Cliffs versus plains: Can ROSINA/COPS and OSIRIS data of comet 67P/Churyumov-Gerasimenko in autumn 2014 constrain inhomogeneous outgassing?

    NASA Astrophysics Data System (ADS)

    Marschall, R.; Mottola, S.; Su, C. C.; Liao, Y.; Rubin, M.; Wu, J. S.; Thomas, N.; Altwegg, K.; Sierks, H.; Ip, W.-H.; Keller, H. U.; Knollenberg, J.; Kührt, E.; Lai, I. L.; Skorov, Y.; Jorda, L.; Preusker, F.; Scholten, F.; Vincent, J.-B.; Osiris Team; Rosina Team

    2017-09-01

    Context. This paper describes the modelling of gas and dust data acquired in the period August to October 2014 from the European Space Agency's Rosetta spacecraft when it was in close proximity to the nucleus of comet 67P/Churyumov-Gerasimenko. Aims: With our 3D gas and dust comae models this work attempts to test the hypothesis that cliff activity on comet 67P/Churyumov-Gerasimenko can solely account for the local gas density data observed by the Rosetta Orbiter Spectrometer for Ion and Neutral Analysis (ROSINA) and the dust brightnesses seen by the Optical, Spectroscopic, and Infrared Remote Imaging System (OSIRIS) in the considered time span. Methods: The model uses a previously developed shape model of the nucleus. From this, the water sublimation rates and gas temperatures at the surface are computed. The gas expansion is modelled with a 3D Direct Simulation Monte Carlo algorithm. A dust drag algorithm is then used to compute dust volume number densities in the coma, which are then converted to brightnesses using Mie theory and a line-of-sight integration. Furthermore we have studied the impact of topographic re-radiation on the models. Results: We show that gas activity from only cliff areas produces a fit to the ROSINA/COPS data that is as statistically good as a purely insolation-driven model. In contrast, pure cliff activity does not reproduce the dust brightness observed by OSIRIS and can thus be ruled out. On the other hand, gas activity from the Hapi region in addition to cliff activity produces a statistically better fit to the ROSINA/COPS data than purely insolation-driven outgassing and also fits the OSIRIS observations rather well. We found that topographic re-radiation does not contribute significantly to the sublimation behaviour of H2O but plays an important role in how the gas flux interacts with the irregular shape of the nucleus. Conclusions: We demonstrate that fits to the observations are non-unique. We can conclude however that gas and dust activity from cliffs and the Hapi region are consistent with the ROSINA/COPS and OSIRIS data sets for the considered time span and are thus a plausible solution. Models with activity from low gravitational slopes alone provide a statistically inferior solution.

  17. Acute Diarrheal Syndromic Surveillance

    PubMed Central

    Kam, H.J.; Choi, S.; Cho, J.P.; Min, Y.G.; Park, R.W.

    2010-01-01

    Objective In an effort to identify and characterize the environmental factors that affect the number of patients with acute diarrheal (AD) syndrome, we developed and tested two regional surveillance models including holiday and weather information in addition to visitor records, at emergency medical facilities in the Seoul metropolitan area of Korea. Methods With 1,328,686 emergency department visitor records from the National Emergency Department Information system (NEDIS) and the holiday and weather information, two seasonal ARIMA models were constructed: (1) The simple model (only with total patient number), (2) the environmental factor-added model. The stationary R-squared was utilized as an in-sample model goodness-of-fit statistic for the constructed models, and the cumulative mean of the Mean Absolute Percentage Error (MAPE) was used to measure post-sample forecast accuracy over the next 1 month. Results The (1,0,1)(0,1,1)7 ARIMA model resulted in an adequate model fit for the daily number of AD patient visits over 12 months for both cases. Among various features, the total number of patient visits was selected as a commonly influential independent variable. Additionally, for the environmental factor-added model, holidays and daily precipitation were selected as features that statistically significantly affected model fitting. Stationary R-squared values were changed in a range of 0.651-0.828 (simple), and 0.805-0.844 (environmental factor-added) with p<0.05. In terms of prediction, the MAPE values changed within 0.090-0.120 and 0.089-0.114, respectively. Conclusion The environmental factor-added model yielded better MAPE values. Holiday and weather information appear to be crucial for the construction of an accurate syndromic surveillance model for AD, in addition to the visitor and assessment records. PMID:23616829

  18. Parametric modelling of cost data in medical studies.

    PubMed

    Nixon, R M; Thompson, S G

    2004-04-30

    The cost of medical resources used is often recorded for each patient in clinical studies in order to inform decision-making. Although cost data are generally skewed to the right, interest is in making inferences about the population mean cost. Common methods for non-normal data, such as data transformation, assuming asymptotic normality of the sample mean or non-parametric bootstrapping, are not ideal. This paper describes possible parametric models for analysing cost data. Four example data sets are considered, which have different sample sizes and degrees of skewness. Normal, gamma, log-normal, and log-logistic distributions are fitted, together with three-parameter versions of the latter three distributions. Maximum likelihood estimates of the population mean are found; confidence intervals are derived by a parametric BC(a) bootstrap and checked by MCMC methods. Differences between model fits and inferences are explored.Skewed parametric distributions fit cost data better than the normal distribution, and should in principle be preferred for estimating the population mean cost. However for some data sets, we find that models that fit badly can give similar inferences to those that fit well. Conversely, particularly when sample sizes are not large, different parametric models that fit the data equally well can lead to substantially different inferences. We conclude that inferences are sensitive to choice of statistical model, which itself can remain uncertain unless there is enough data to model the tail of the distribution accurately. Investigating the sensitivity of conclusions to choice of model should thus be an essential component of analysing cost data in practice. Copyright 2004 John Wiley & Sons, Ltd.

  19. Assessing posttraumatic stress disorder's latent structure in elderly bereaved European trauma survivors: evidence for a five-factor dysphoric and anxious arousal model.

    PubMed

    Armour, Cherie; O'Connor, Maja; Elklit, Ask; Elhai, Jon D

    2013-10-01

    The three-factor structure of posttraumatic stress disorder (PTSD) specified by the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, is not supported in the empirical literature. Two alternative four-factor models have received a wealth of empirical support. However, a consensus regarding which is superior has not been reached. A recent five-factor model has been shown to provide superior fit over the existing four-factor models. The present study investigated the fit of the five-factor model against the existing four-factor models and assessed the resultant factors' association with depression in a bereaved European trauma sample (N = 325). The participants were assessed for PTSD via the Harvard Trauma Questionnaire and depression via the Beck Depression Inventory. The five-factor model provided superior fit to the data compared with the existing four-factor models. In the dysphoric arousal model, depression was equally related to both dysphoric arousal and emotional numbing, whereas depression was more related to dysphoric arousal than to anxious arousal.

  20. Computational algebraic geometry for statistical modeling FY09Q2 progress.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thompson, David C.; Rojas, Joseph Maurice; Pebay, Philippe Pierre

    2009-03-01

    This is a progress report on polynomial system solving for statistical modeling. This is a progress report on polynomial system solving for statistical modeling. This quarter we have developed our first model of shock response data and an algorithm for identifying the chamber cone containing a polynomial system in n variables with n+k terms within polynomial time - a significant improvement over previous algorithms, all having exponential worst-case complexity. We have implemented and verified the chamber cone algorithm for n+3 and are working to extend the implementation to handle arbitrary k. Later sections of this report explain chamber cones inmore » more detail; the next section provides an overview of the project and how the current progress fits into it.« less

  1. Computing Science and Statistics: Proceedings of the Symposium on the Interface: Computationally Intensive Methods in Statistics (20th) Held in Fairfax, Virginia on April 20-23, 1988

    DTIC Science & Technology

    1989-03-15

    essence of the idea ycessible mtho forunrtandig eth- Tis tand thP ra) rm guh ide propet oaes nd d of e aessie meh bsd fooesadng asymptoti- isthe for s...network? This of Such empirical parametric model fitting is of course depends heavily on the class of net- course the essence of much of applied...smaller problems is the essence of graphical modeling. A model hy- attributes. Let e be the discrete joint outcome space for those N pergraph, g

  2. Statistical Models for Averaging of the Pump–Probe Traces: Example of Denoising in Terahertz Time-Domain Spectroscopy

    NASA Astrophysics Data System (ADS)

    Skorobogatiy, Maksim; Sadasivan, Jayesh; Guerboukha, Hichem

    2018-05-01

    In this paper, we first discuss the main types of noise in a typical pump-probe system, and then focus specifically on terahertz time domain spectroscopy (THz-TDS) setups. We then introduce four statistical models for the noisy pulses obtained in such systems, and detail rigorous mathematical algorithms to de-noise such traces, find the proper averages and characterise various types of experimental noise. Finally, we perform a comparative analysis of the performance, advantages and limitations of the algorithms by testing them on the experimental data collected using a particular THz-TDS system available in our laboratories. We conclude that using advanced statistical models for trace averaging results in the fitting errors that are significantly smaller than those obtained when only a simple statistical average is used.

  3. Statistical significance estimation of a signal within the GooFit framework on GPUs

    NASA Astrophysics Data System (ADS)

    Cristella, Leonardo; Di Florio, Adriano; Pompili, Alexis

    2017-03-01

    In order to test the computing capabilities of GPUs with respect to traditional CPU cores a high-statistics toy Monte Carlo technique has been implemented both in ROOT/RooFit and GooFit frameworks with the purpose to estimate the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is a data analysis open tool under development that interfaces ROOT/RooFit to CUDA platform on nVidia GPU. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performances with respect to the RooFit application parallelised on multiple CPUs by means of PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by CUDA Multi Process Service and a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which the Wilks Theorem may or may not apply because its regularity conditions are not satisfied.

  4. A Bayesian statistical analysis of mouse dermal tumor promotion assay data for evaluating cigarette smoke condensate.

    PubMed

    Kathman, Steven J; Potts, Ryan J; Ayres, Paul H; Harp, Paul R; Wilson, Cody L; Garner, Charles D

    2010-10-01

    The mouse dermal assay has long been used to assess the dermal tumorigenicity of cigarette smoke condensate (CSC). This mouse skin model has been developed for use in carcinogenicity testing utilizing the SENCAR mouse as the standard strain. Though the model has limitations, it remains as the most relevant method available to study the dermal tumor promoting potential of mainstream cigarette smoke. In the typical SENCAR mouse CSC bioassay, CSC is applied for 29 weeks following the application of a tumor initiator such as 7,12-dimethylbenz[a]anthracene (DMBA). Several endpoints are considered for analysis including: the percentage of animals with at least one mass, latency, and number of masses per animal. In this paper, a relatively straightforward analytic model and procedure is presented for analyzing the time course of the incidence of masses. The procedure considered here takes advantage of Bayesian statistical techniques, which provide powerful methods for model fitting and simulation. Two datasets are analyzed to illustrate how the model fits the data, how well the model may perform in predicting data from such trials, and how the model may be used as a decision tool when comparing the dermal tumorigenicity of cigarette smoke condensate from multiple cigarette types. The analysis presented here was developed as a statistical decision tool for differentiating between two or more prototype products based on the dermal tumorigenicity. Copyright (c) 2010 Elsevier Inc. All rights reserved.

  5. Guessing and the Rasch Model

    ERIC Educational Resources Information Center

    Holster, Trevor A.; Lake, J.

    2016-01-01

    Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…

  6. A Comparison of Latent Growth Models for Constructs Measured by Multiple Items

    ERIC Educational Resources Information Center

    Leite, Walter L.

    2007-01-01

    Univariate latent growth modeling (LGM) of composites of multiple items (e.g., item means or sums) has been frequently used to analyze the growth of latent constructs. This study evaluated whether LGM of composites yields unbiased parameter estimates, standard errors, chi-square statistics, and adequate fit indexes. Furthermore, LGM was compared…

  7. Survival modeling for the estimation of transition probabilities in model-based economic evaluations in the absence of individual patient data: a tutorial.

    PubMed

    Diaby, Vakaramoko; Adunlin, Georges; Montero, Alberto J

    2014-02-01

    Survival modeling techniques are increasingly being used as part of decision modeling for health economic evaluations. As many models are available, it is imperative for interested readers to know about the steps in selecting and using the most suitable ones. The objective of this paper is to propose a tutorial for the application of appropriate survival modeling techniques to estimate transition probabilities, for use in model-based economic evaluations, in the absence of individual patient data (IPD). An illustration of the use of the tutorial is provided based on the final progression-free survival (PFS) analysis of the BOLERO-2 trial in metastatic breast cancer (mBC). An algorithm was adopted from Guyot and colleagues, and was then run in the statistical package R to reconstruct IPD, based on the final PFS analysis of the BOLERO-2 trial. It should be emphasized that the reconstructed IPD represent an approximation of the original data. Afterwards, we fitted parametric models to the reconstructed IPD in the statistical package Stata. Both statistical and graphical tests were conducted to verify the relative and absolute validity of the findings. Finally, the equations for transition probabilities were derived using the general equation for transition probabilities used in model-based economic evaluations, and the parameters were estimated from fitted distributions. The results of the application of the tutorial suggest that the log-logistic model best fits the reconstructed data from the latest published Kaplan-Meier (KM) curves of the BOLERO-2 trial. Results from the regression analyses were confirmed graphically. An equation for transition probabilities was obtained for each arm of the BOLERO-2 trial. In this paper, a tutorial was proposed and used to estimate the transition probabilities for model-based economic evaluation, based on the results of the final PFS analysis of the BOLERO-2 trial in mBC. The results of our study can serve as a basis for any model (Markov) that needs the parameterization of transition probabilities, and only has summary KM plots available.

  8. Constructing and Modifying Sequence Statistics for relevent Using informR in 𝖱

    PubMed Central

    Marcum, Christopher Steven; Butts, Carter T.

    2015-01-01

    The informR package greatly simplifies the analysis of complex event histories in 𝖱 by providing user friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a→b) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by the rem() model fitting for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey. PMID:26185488

  9. Physical and cognitive fitness in young adulthood and risk of amyotrophic lateral sclerosis at an early age.

    PubMed

    Longinetti, E; Mariosa, D; Larsson, H; Almqvist, C; Lichtenstein, P; Ye, W; Fang, F

    2017-01-01

    There is a clinical impression that patients with amyotrophic lateral sclerosis (ALS) have a higher level of physical fitness and lower body mass index (BMI) than average. However, there is a lack of literature examining the relationship between cognitive fitness and ALS risk. In this study we explored the associations of both physical and cognitive fitness with future risk of ALS. Data on physical fitness, BMI, intelligence quotient (IQ) and stress resilience were collected from 1 838 376 Swedish men aged 17-20 years at conscription during 1968-2010. Their subsequent ALS diagnoses were identified through the Swedish Patient Register. Hazard ratios (HRs) and 95% CIs from flexible parametric models were used to assess age-specific associations of physical fitness, BMI, IQ and stress resilience with ALS. We identified 439 incident ALS cases during follow-up (mean age at diagnosis: 48 years). Individuals with physical fitness above the highest tertile tended to have a higher risk of ALS before the age of 45 years (range of HRs: 1.42-1.75; statistically significant associations at age 41-43 years) compared with others. Individuals with BMI ≥ 25 tended to have a lower risk of ALS at all ages (range of HRs: 0.42-0.80; statistically significant associations at age 42-48 years) compared with those with BMI < 25. Individuals with IQ above the highest tertile had a statistically significantly increased risk of ALS at an age of 56 years and above (range of HRs: 1.33-1.81), whereas individuals with stress resilience above the highest tertile had a lower risk of ALS at an age of 55 years and below (range of HRs: 0.47-0.73). Physical fitness, BMI, IQ and stress resilience in young adulthood might be associated with the development of ALS at an early age. © 2016 EAN.

  10. Endoscopic third ventriculostomy in the treatment of childhood hydrocephalus.

    PubMed

    Kulkarni, Abhaya V; Drake, James M; Mallucci, Conor L; Sgouros, Spyros; Roth, Jonathan; Constantini, Shlomi

    2009-08-01

    To develop a model to predict the probability of endoscopic third ventriculostomy (ETV) success in the treatment for hydrocephalus on the basis of a child's individual characteristics. We analyzed 618 ETVs performed consecutively on children at 12 international institutions to identify predictors of ETV success at 6 months. A multivariable logistic regression model was developed on 70% of the dataset (training set) and validated on 30% of the dataset (validation set). In the training set, 305/455 ETVs (67.0%) were successful. The regression model (containing patient age, cause of hydrocephalus, and previous cerebrospinal fluid shunt) demonstrated good fit (Hosmer-Lemeshow, P = .78) and discrimination (C statistic = 0.70). In the validation set, 105/163 ETVs (64.4%) were successful and the model maintained good fit (Hosmer-Lemeshow, P = .45), discrimination (C statistic = 0.68), and calibration (calibration slope = 0.88). A simplified ETV Success Score was devised that closely approximates the predicted probability of ETV success. Children most likely to succeed with ETV can now be accurately identified and spared the long-term complications of CSF shunting.

  11. Bayesian investigation of isochrone consistency using the old open cluster NGC 188

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hills, Shane; Courteau, Stéphane; Von Hippel, Ted

    2015-03-01

    This paper provides a detailed comparison of the differences in parameters derived for a star cluster from its color–magnitude diagrams (CMDs) depending on the filters and models used. We examine the consistency and reliability of fitting three widely used stellar evolution models to 15 combinations of optical and near-IR photometry for the old open cluster NGC 188. The optical filter response curves match those of theoretical systems and are thus not the source of fit inconsistencies. NGC 188 is ideally suited to this study thanks to a wide variety of high-quality photometry and available proper motions and radial velocities thatmore » enable us to remove non-cluster members and many binaries. Our Bayesian fitting technique yields inferred values of age, metallicity, distance modulus, and absorption as a function of the photometric band combinations and stellar models. We show that the historically favored three-band combinations of UBV and VRI can be meaningfully inconsistent with each other and with longer baseline data sets such as UBVRIJHK{sub S}. Differences among model sets can also be substantial. For instance, fitting Yi et al. (2001) and Dotter et al. (2008) models to UBVRIJHK{sub S} photometry for NGC 188 yields the following cluster parameters: age = (5.78 ± 0.03, 6.45 ± 0.04) Gyr, [Fe/H] = (+0.125 ± 0.003, −0.077 ± 0.003) dex, (m−M){sub V} = (11.441 ± 0.007, 11.525 ± 0.005) mag, and A{sub V} = (0.162 ± 0.003, 0.236 ± 0.003) mag, respectively. Within the formal fitting errors, these two fits are substantially and statistically different. Such differences among fits using different filters and models are a cautionary tale regarding our current ability to fit star cluster CMDs. Additional modeling of this kind, with more models and star clusters, and future Gaia parallaxes are critical for isolating and quantifying the most relevant uncertainties in stellar evolutionary models.« less

  12. Process air quality data

    NASA Technical Reports Server (NTRS)

    Butler, C. M.; Hogge, J. E.

    1978-01-01

    Air quality sampling was conducted. Data for air quality parameters, recorded on written forms, punched cards or magnetic tape, are available for 1972 through 1975. Computer software was developed to (1) calculate several daily statistical measures of location, (2) plot time histories of data or the calculated daily statistics, (3) calculate simple correlation coefficients, and (4) plot scatter diagrams. Computer software was developed for processing air quality data to include time series analysis and goodness of fit tests. Computer software was developed to (1) calculate a larger number of daily statistical measures of location, and a number of daily monthly and yearly measures of location, dispersion, skewness and kurtosis, (2) decompose the extended time series model and (3) perform some goodness of fit tests. The computer program is described, documented and illustrated by examples. Recommendations are made for continuation of the development of research on processing air quality data.

  13. Modeling Antimicrobial Activity of Clorox(R) Using an Agar-Diffusion Test: A New Twist On an Old Experiment.

    ERIC Educational Resources Information Center

    Mitchell, James K.; Carter, William E.

    2000-01-01

    Describes using a computer statistical software package called Minitab to model the sensitivity of several microbes to the disinfectant NaOCl (Clorox') using the Kirby-Bauer technique. Each group of students collects data from one microbe, conducts regression analyses, then chooses the best-fit model based on the highest r-values obtained.…

  14. Assessing Fit of Models with Discrete Proficiency Variable in Educational Assessment. Research Report. RR-04-07

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Almond, Russell; Yan, Duanli

    2004-01-01

    Model checking is a crucial part of any statistical analysis. As educators tie models for testing to cognitive theory of the domains, there is a natural tendency to represent participant proficiencies with latent variables representing the presence or absence of the knowledge, skills, and proficiencies to be tested (Mislevy, Almond, Yan, &…

  15. Large ensemble modeling of the last deglacial retreat of the West Antarctic Ice Sheet: comparison of simple and advanced statistical techniques

    NASA Astrophysics Data System (ADS)

    Pollard, David; Chang, Won; Haran, Murali; Applegate, Patrick; DeConto, Robert

    2016-05-01

    A 3-D hybrid ice-sheet model is applied to the last deglacial retreat of the West Antarctic Ice Sheet over the last ˜ 20 000 yr. A large ensemble of 625 model runs is used to calibrate the model to modern and geologic data, including reconstructed grounding lines, relative sea-level records, elevation-age data and uplift rates, with an aggregate score computed for each run that measures overall model-data misfit. Two types of statistical methods are used to analyze the large-ensemble results: simple averaging weighted by the aggregate score, and more advanced Bayesian techniques involving Gaussian process-based emulation and calibration, and Markov chain Monte Carlo. The analyses provide sea-level-rise envelopes with well-defined parametric uncertainty bounds, but the simple averaging method only provides robust results with full-factorial parameter sampling in the large ensemble. Results for best-fit parameter ranges and envelopes of equivalent sea-level rise with the simple averaging method agree well with the more advanced techniques. Best-fit parameter ranges confirm earlier values expected from prior model tuning, including large basal sliding coefficients on modern ocean beds.

  16. Deep space network software cost estimation model

    NASA Technical Reports Server (NTRS)

    Tausworthe, R. C.

    1981-01-01

    A parametric software cost estimation model prepared for Jet PRopulsion Laboratory (JPL) Deep Space Network (DSN) Data System implementation tasks is described. The resource estimation mdel modifies and combines a number of existing models. The model calibrates the task magnitude and difficulty, development environment, and software technology effects through prompted responses to a set of approximately 50 questions. Parameters in the model are adjusted to fit JPL software life-cycle statistics.

  17. Are the Nonparametric Person-Fit Statistics More Powerful than Their Parametric Counterparts? Revisiting the Simulations in Karabatsos (2003)

    ERIC Educational Resources Information Center

    Sinharay, Sandip

    2017-01-01

    Karabatsos compared the power of 36 person-fit statistics using receiver operating characteristics curves and found the "H[superscript T]" statistic to be the most powerful in identifying aberrant examinees. He found three statistics, "C", "MCI", and "U3", to be the next most powerful. These four statistics,…

  18. Physical fitness modulates incidental but not intentional statistical learning of simultaneous auditory sequences during concurrent physical exercise.

    PubMed

    Daikoku, Tatsuya; Takahashi, Yuji; Futagami, Hiroko; Tarumoto, Nagayoshi; Yasuda, Hideki

    2017-02-01

    In real-world auditory environments, humans are exposed to overlapping auditory information such as those made by human voices and musical instruments even during routine physical activities such as walking and cycling. The present study investigated how concurrent physical exercise affects performance of incidental and intentional learning of overlapping auditory streams, and whether physical fitness modulates the performances of learning. Participants were grouped with 11 participants with lower and higher fitness each, based on their Vo 2 max value. They were presented simultaneous auditory sequences with a distinct statistical regularity each other (i.e. statistical learning), while they were pedaling on the bike and seating on a bike at rest. In experiment 1, they were instructed to attend to one of the two sequences and ignore to the other sequence. In experiment 2, they were instructed to attend to both of the two sequences. After exposure to the sequences, learning effects were evaluated by familiarity test. In the experiment 1, performance of statistical learning of ignored sequences during concurrent pedaling could be higher in the participants with high than low physical fitness, whereas in attended sequence, there was no significant difference in performance of statistical learning between high than low physical fitness. Furthermore, there was no significant effect of physical fitness on learning while resting. In the experiment 2, the both participants with high and low physical fitness could perform intentional statistical learning of two simultaneous sequences in the both exercise and rest sessions. The improvement in physical fitness might facilitate incidental but not intentional statistical learning of simultaneous auditory sequences during concurrent physical exercise.

  19. Modeling dust emission in the Magellanic Clouds with Spitzer and Herschel

    NASA Astrophysics Data System (ADS)

    Chastenet, Jérémy; Bot, Caroline; Gordon, Karl D.; Bocchio, Marco; Roman-Duval, Julia; Jones, Anthony P.; Ysard, Nathalie

    2017-05-01

    Context. Dust modeling is crucial to infer dust properties and budget for galaxy studies. However, there are systematic disparities between dust grain models that result in corresponding systematic differences in the inferred dust properties of galaxies. Quantifying these systematics requires a consistent fitting analysis. Aims: We compare the output dust parameters and assess the differences between two dust grain models, the DustEM model and THEMIS. In this study, we use a single fitting method applied to all the models to extract a coherent and unique statistical analysis. Methods: We fit the models to the dust emission seen by Spitzer and Herschel in the Small and Large Magellanic Clouds (SMC and LMC). The observations cover the infrared (IR) spectrum from a few microns to the sub-millimeter range. For each fitted pixel, we calculate the full n-D likelihood based on a previously described method. The free parameters are both environmental (U, the interstellar radiation field strength; αISRF, power-law coefficient for a multi-U environment; Ω∗, the starlight strength) and intrinsic to the model (YI: abundances of the grain species I; αsCM20, coefficient in the small carbon grain size distribution). Results: Fractional residuals of five different sets of parameters show that fitting THEMIS brings a more accurate reproduction of the observations than the DustEM model. However, independent variations of the dust species show strong model-dependencies. We find that the abundance of silicates can only be constrained to an upper-limit and that the silicate/carbon ratio is different than that seen in our Galaxy. In the LMC, our fits result in dust masses slightly lower than those found in the literature, by a factor lower than 2. In the SMC, we find dust masses in agreement with previous studies.

  20. Toward a Model-Based Approach to the Clinical Assessment of Personality Psychopathology

    PubMed Central

    Eaton, Nicholas R.; Krueger, Robert F.; Docherty, Anna R.; Sponheim, Scott R.

    2015-01-01

    Recent years have witnessed tremendous growth in the scope and sophistication of statistical methods available to explore the latent structure of psychopathology, involving continuous, discrete, and hybrid latent variables. The availability of such methods has fostered optimism that they can facilitate movement from classification primarily crafted through expert consensus to classification derived from empirically-based models of psychopathological variation. The explication of diagnostic constructs with empirically supported structures can then facilitate the development of assessment tools that appropriately characterize these constructs. Our goal in this paper is to illustrate how new statistical methods can inform conceptualization of personality psychopathology and therefore its assessment. We use magical thinking as example, because both theory and earlier empirical work suggested the possibility of discrete aspects to the latent structure of personality psychopathology, particularly forms of psychopathology involving distortions of reality testing, yet other data suggest that personality psychopathology is generally continuous in nature. We directly compared the fit of a variety of latent variable models to magical thinking data from a sample enriched with clinically significant variation in psychotic symptomatology for explanatory purposes. Findings generally suggested a continuous latent variable model best represented magical thinking, but results varied somewhat depending on different indices of model fit. We discuss the implications of the findings for classification and applied personality assessment. We also highlight some limitations of this type of approach that are illustrated by these data, including the importance of substantive interpretation, in addition to use of model fit indices, when evaluating competing structural models. PMID:24007309

  1. Estimation and prediction of maximum daily rainfall at Sagar Island using best fit probability models

    NASA Astrophysics Data System (ADS)

    Mandal, S.; Choudhury, B. U.

    2015-07-01

    Sagar Island, setting on the continental shelf of Bay of Bengal, is one of the most vulnerable deltas to the occurrence of extreme rainfall-driven climatic hazards. Information on probability of occurrence of maximum daily rainfall will be useful in devising risk management for sustaining rainfed agrarian economy vis-a-vis food and livelihood security. Using six probability distribution models and long-term (1982-2010) daily rainfall data, we studied the probability of occurrence of annual, seasonal and monthly maximum daily rainfall (MDR) in the island. To select the best fit distribution models for annual, seasonal and monthly time series based on maximum rank with minimum value of test statistics, three statistical goodness of fit tests, viz. Kolmogorove-Smirnov test (K-S), Anderson Darling test ( A 2 ) and Chi-Square test ( X 2) were employed. The fourth probability distribution was identified from the highest overall score obtained from the three goodness of fit tests. Results revealed that normal probability distribution was best fitted for annual, post-monsoon and summer seasons MDR, while Lognormal, Weibull and Pearson 5 were best fitted for pre-monsoon, monsoon and winter seasons, respectively. The estimated annual MDR were 50, 69, 86, 106 and 114 mm for return periods of 2, 5, 10, 20 and 25 years, respectively. The probability of getting an annual MDR of >50, >100, >150, >200 and >250 mm were estimated as 99, 85, 40, 12 and 03 % level of exceedance, respectively. The monsoon, summer and winter seasons exhibited comparatively higher probabilities (78 to 85 %) for MDR of >100 mm and moderate probabilities (37 to 46 %) for >150 mm. For different recurrence intervals, the percent probability of MDR varied widely across intra- and inter-annual periods. In the island, rainfall anomaly can pose a climatic threat to the sustainability of agricultural production and thus needs adequate adaptation and mitigation measures.

  2. Impact of Gender on Pharmocokinetics of Intranasal Scopolamine

    NASA Technical Reports Server (NTRS)

    Putcha, L.; Lei, Wu.; S-L Chow, Diana

    2013-01-01

    Introduction: An intranasal gel dosage formulation of scopolamine (INSCOP) was developed for the treatment of Space Motion Sickness (SMS), which is commonly experienced by astronauts during space missions. The bioavailability and pharmacokinetics (PK) were evaluated under IND guidelines. Since information is lacking on the effect of gender on the PK of Scopolamine, we examined gender differences in PK parameters of INSCOP at three dose levels of 0.1, 0.2 and 0.4 mg. Methods: Plasma scopolamine concentrations as a function of time data were collected from twelve normal healthy human subjects (6 male/6 female) who participated in a fully randomized double blind crossover study. The PK parameters were derived using WinNonlin. Covariate analysis of PK profiles was performed using NONMEN and statistically compared using a likelihood ratio test on the difference of objective function value (OFV). Statistical significance for covariate analysis was set at P<0.05(?OFV=3.84). Results: No significant difference in PK parameters between male and female subjects was observed with 0.1 and 0.2 mg doses. However, CL and Vd were significantly different between male and female subjects at the 0.4 mg dose. Results from population covariate modeling analysis indicate that a onecompartment PK model with first-order elimination rate offers best fit for describing INSCOP concentration-time profiles. The inclusion of sex as a covariate enhanced the model fitting (?OFV=-4.1) owing to the genderdependent CL and Vd differences after the 0.4 mg dose. Conclusion: Statistical modeling of scopolamine concentration-time data suggests gender-dependent pharmacokinetics of scopolamine at the high dose level of 0.4 mg. Clearance of the parent compound was significantly faster and the volume of distribution was significantly higher in males than in females, As a result, including gender as a covariate to the pharmacokinetic model of scopolamine offers the best fit for PK modeling of the drug at dose of 0.4 mg or higher.

  3. MPTinR: analysis of multinomial processing tree models in R.

    PubMed

    Singmann, Henrik; Kellen, David

    2013-06-01

    We introduce MPTinR, a software package developed for the analysis of multinomial processing tree (MPT) models. MPT models represent a prominent class of cognitive measurement models for categorical data with applications in a wide variety of fields. MPTinR is the first software for the analysis of MPT models in the statistical programming language R, providing a modeling framework that is more flexible than standalone software packages. MPTinR also introduces important features such as (1) the ability to calculate the Fisher information approximation measure of model complexity for MPT models, (2) the ability to fit models for categorical data outside the MPT model class, such as signal detection models, (3) a function for model selection across a set of nested and nonnested candidate models (using several model selection indices), and (4) multicore fitting. MPTinR is available from the Comprehensive R Archive Network at http://cran.r-project.org/web/packages/MPTinR/ .

  4. NEUTRON STAR MASS–RADIUS CONSTRAINTS USING EVOLUTIONARY OPTIMIZATION

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stevens, A. L.; Morsink, S. M.; Fiege, J. D.

    The equation of state of cold supra-nuclear-density matter, such as in neutron stars, is an open question in astrophysics. A promising method for constraining the neutron star equation of state is modeling pulse profiles of thermonuclear X-ray burst oscillations from hot spots on accreting neutron stars. The pulse profiles, constructed using spherical and oblate neutron star models, are comparable to what would be observed by a next-generation X-ray timing instrument like ASTROSAT , NICER , or a mission similar to LOFT . In this paper, we showcase the use of an evolutionary optimization algorithm to fit pulse profiles to determinemore » the best-fit masses and radii. By fitting synthetic data, we assess how well the optimization algorithm can recover the input parameters. Multiple Poisson realizations of the synthetic pulse profiles, constructed with 1.6 million counts and no background, were fitted with the Ferret algorithm to analyze both statistical and degeneracy-related uncertainty and to explore how the goodness of fit depends on the input parameters. For the regions of parameter space sampled by our tests, the best-determined parameter is the projected velocity of the spot along the observer’s line of sight, with an accuracy of ≤3% compared to the true value and with ≤5% statistical uncertainty. The next best determined are the mass and radius; for a neutron star with a spin frequency of 600 Hz, the best-fit mass and radius are accurate to ≤5%, with respective uncertainties of ≤7% and ≤10%. The accuracy and precision depend on the observer inclination and spot colatitude, with values of ∼1% achievable in mass and radius if both the inclination and colatitude are ≳60°.« less

  5. Linking Existing Instruments to Develop an Activity of Daily Living Item Bank.

    PubMed

    Li, Chih-Ying; Romero, Sergio; Bonilha, Heather S; Simpson, Kit N; Simpson, Annie N; Hong, Ickpyo; Velozo, Craig A

    2018-03-01

    This study examined dimensionality and item-level psychometric properties of an item bank measuring activities of daily living (ADL) across inpatient rehabilitation facilities and community living centers. Common person equating method was used in the retrospective veterans data set. This study examined dimensionality, model fit, local independence, and monotonicity using factor analyses and fit statistics, principal component analysis (PCA), and differential item functioning (DIF) using Rasch analysis. Following the elimination of invalid data, 371 veterans who completed both the Functional Independence Measure (FIM) and minimum data set (MDS) within 6 days were retained. The FIM-MDS item bank demonstrated good internal consistency (Cronbach's α = .98) and met three rating scale diagnostic criteria and three of the four model fit statistics (comparative fit index/Tucker-Lewis index = 0.98, root mean square error of approximation = 0.14, and standardized root mean residual = 0.07). PCA of Rasch residuals showed the item bank explained 94.2% variance. The item bank covered the range of θ from -1.50 to 1.26 (item), -3.57 to 4.21 (person) with person strata of 6.3. The findings indicated the ADL physical function item bank constructed from FIM and MDS measured a single latent trait with overall acceptable item-level psychometric properties, suggesting that it is an appropriate source for developing efficient test forms such as short forms and computerized adaptive tests.

  6. Independent and joint effects of sedentary time and cardiorespiratory fitness on all-cause mortality: the Cooper Center Longitudinal Study

    PubMed Central

    Shuval, Kerem; Finley, Carrie E; Barlow, Carolyn E; Nguyen, Binh T; Njike, Valentine Y; Pettee Gabriel, Kelley

    2015-01-01

    Objectives To examine the independent and joint effects of sedentary time and cardiorespiratory fitness (fitness) on all-cause mortality. Design, setting, participants A prospective study of 3141 Cooper Center Longitudinal Study participants. Participants provided information on television (TV) viewing and car time in 1982 and completed a maximal exercise test during a 1-year time frame; they were then followed until mortality or through 2010. TV viewing, car time, total sedentary time and fitness were the primary exposures and all-cause mortality was the outcome. The relationship between the exposures and outcome was examined utilising Cox proportional hazard models. Results A total of 581 deaths occurred over a median follow-up period of 28.7 years (SD=4.4). At baseline, participants’ mean age was 45.0 years (SD=9.6), 86.5% were men and their mean body mass index was 24.6 (SD=3.0). Multivariable analyses revealed a significant linear relationship between increased fitness and lower mortality risk, even while adjusting for total sedentary time and covariates (p=0.02). The effects of total sedentary time on increased mortality risk did not quite reach statistical significance once fitness and covariates were adjusted for (p=0.05). When examining this relationship categorically, in comparison to the reference category (≤10 h/week), being sedentary for ≥23 h weekly increased mortality risk by 29% without controlling for fitness (HR=1.29, 95% CI 1.03 to 1.63); however, once fitness and covariates were taken into account this relationship did not reach statistical significance (HR=1.20, 95% CI 0.95 to 1.51). Moreover, spending >10 h in the car weekly significantly increased mortality risk by 27% in the fully adjusted model. The association between TV viewing and mortality was not significant. Conclusions The relationship between total sedentary time and higher mortality risk is less pronounced when fitness is taken into account. Increased car time, but not TV viewing, is significantly related to higher mortality risk, even when taking fitness into account, in this cohort. PMID:26525421

  7. Evaluation of Statistical Methods for Modeling Historical Resource Production and Forecasting

    NASA Astrophysics Data System (ADS)

    Nanzad, Bolorchimeg

    This master's thesis project consists of two parts. Part I of the project compares modeling of historical resource production and forecasting of future production trends using the logit/probit transform advocated by Rutledge (2011) with conventional Hubbert curve fitting, using global coal production as a case study. The conventional Hubbert/Gaussian method fits a curve to historical production data whereas a logit/probit transform uses a linear fit to a subset of transformed production data. Within the errors and limitations inherent in this type of statistical modeling, these methods provide comparable results. That is, despite that apparent goodness-of-fit achievable using the Logit/Probit methodology, neither approach provides a significant advantage over the other in either explaining the observed data or in making future projections. For mature production regions, those that have already substantially passed peak production, results obtained by either method are closely comparable and reasonable, and estimates of ultimately recoverable resources obtained by either method are consistent with geologically estimated reserves. In contrast, for immature regions, estimates of ultimately recoverable resources generated by either of these alternative methods are unstable and thus, need to be used with caution. Although the logit/probit transform generates high quality-of-fit correspondence with historical production data, this approach provides no new information compared to conventional Gaussian or Hubbert-type models and may have the effect of masking the noise and/or instability in the data and the derived fits. In particular, production forecasts for immature or marginally mature production systems based on either method need to be regarded with considerable caution. Part II of the project investigates the utility of a novel alternative method for multicyclic Hubbert modeling tentatively termed "cycle-jumping" wherein overlap of multiple cycles is limited. The model is designed in a way that each cycle is described by the same three parameters as conventional multicyclic Hubbert model and every two cycles are connected with a transition width. Transition width indicates the shift from one cycle to the next and is described as weighted coaddition of neighboring two cycles. It is determined by three parameters: transition year, transition width, and gamma parameter for weighting. The cycle-jumping method provides superior model compared to the conventional multicyclic Hubbert model and reflects historical production behavior more reasonably and practically, by better modeling of the effects of technological transitions and socioeconomic factors that affect historical resource production behavior by explicitly considering the form of the transitions between production cycles.

  8. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    PubMed

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.

  9. The log-periodic-AR(1)-GARCH(1,1) model for financial crashes

    NASA Astrophysics Data System (ADS)

    Gazola, L.; Fernandes, C.; Pizzinga, A.; Riera, R.

    2008-02-01

    This paper intends to meet recent claims for the attainment of more rigorous statistical methodology within the econophysics literature. To this end, we consider an econometric approach to investigate the outcomes of the log-periodic model of price movements, which has been largely used to forecast financial crashes. In order to accomplish reliable statistical inference for unknown parameters, we incorporate an autoregressive dynamic and a conditional heteroskedasticity structure in the error term of the original model, yielding the log-periodic-AR(1)-GARCH(1,1) model. Both the original and the extended models are fitted to financial indices of U. S. market, namely S&P500 and NASDAQ. Our analysis reveal two main points: (i) the log-periodic-AR(1)-GARCH(1,1) model has residuals with better statistical properties and (ii) the estimation of the parameter concerning the time of the financial crash has been improved.

  10. Testing Einstein's gravity and dark energy with growth of matter perturbations: Indications for new physics?

    NASA Astrophysics Data System (ADS)

    Basilakos, Spyros; Nesseris, Savvas

    2016-12-01

    The growth index of matter fluctuations is computed for ten distinct accelerating cosmological models and confronted by the latest growth-rate data via a two-step process. First, we implement a joint statistical analysis in order to place constraints on the free parameters of all models using solely background data. Second, using the observed growth rate of clustering from various galaxy surveys we test the performance of the current cosmological models at the perturbation level while either marginalizing over σ8 or having it as a free parameter. As a result, we find that at a statistical level, i.e., after considering the best-fit χ2 or the value of the Akaike information criterion, most models are in very good agreement with the growth-rate data and are practically indistinguishable from Λ CDM . However, when we also consider the internal consistency of the models by comparing the theoretically predicted values of (γ0,γ1), i.e., the value of the growth index γ (z ) and its derivative today, with the best-fit ones, we find that the predictions of three out of ten dark energy models are in mild tension with the best-fit ones when σ8 is marginalized over. When σ8 is free we find that most models are not only in mild tension, but also predict low values for σ8. This could be attributed to either a systematic problem with the growth-rate data or the emergence of new physics at low redshifts, with the latter possibly being related to the well-known issue of the lack of power at small scales. Finally, by utilizing mock data based on an large synoptic survey telescope-like survey we show that with future surveys and by using the growth index parametrization, it will be possible to resolve the issue of the low σ8 but also the tension between the fitted and theoretically predicted values of (γ0,γ1).

  11. Item Response Theory Modeling of the Philadelphia Naming Test.

    PubMed

    Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D

    2015-06-01

    In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.

  12. A BRDF statistical model applying to space target materials modeling

    NASA Astrophysics Data System (ADS)

    Liu, Chenghao; Li, Zhi; Xu, Can; Tian, Qichen

    2017-10-01

    In order to solve the problem of poor effect in modeling the large density BRDF measured data with five-parameter semi-empirical model, a refined statistical model of BRDF which is suitable for multi-class space target material modeling were proposed. The refined model improved the Torrance-Sparrow model while having the modeling advantages of five-parameter model. Compared with the existing empirical model, the model contains six simple parameters, which can approximate the roughness distribution of the material surface, can approximate the intensity of the Fresnel reflectance phenomenon and the attenuation of the reflected light's brightness with the azimuth angle changes. The model is able to achieve parameter inversion quickly with no extra loss of accuracy. The genetic algorithm was used to invert the parameters of 11 different samples in the space target commonly used materials, and the fitting errors of all materials were below 6%, which were much lower than those of five-parameter model. The effect of the refined model is verified by comparing the fitting results of the three samples at different incident zenith angles in 0° azimuth angle. Finally, the three-dimensional modeling visualizations of these samples in the upper hemisphere space was given, in which the strength of the optical scattering of different materials could be clearly shown. It proved the good describing ability of the refined model at the material characterization as well.

  13. How alive is constrained SUSY really?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bechtle, Philip; Desch, Klaus; Dreiner, Herbert K.

    2016-05-31

    Constrained supersymmetric models like the CMSSM might look less attractive nowadays because of fine tuning arguments. They also might look less probable in terms of Bayesian statistics. The question how well the model under study describes the data, however, is answered by frequentist p-values. Thus, for the first time, we calculate a p-value for a supersymmetric model by performing dedicated global toy fits. We combine constraints from low-energy and astrophysical observables, Higgs boson mass and rate measurements as well as the non-observation of new physics in searches for supersymmetry at the LHC. Furthermore, using the framework Fittino, we perform globalmore » fits of the CMSSM to the toy data and find that this model is excluded at the 90% confidence level.« less

  14. Riding across the selection landscape: fitness consequences of annual variation in reproductive characteristics

    PubMed Central

    Tremblay, Raymond L.; Ackerman, James D.; Pérez, Maria-Eglée

    2010-01-01

    Evolutionary models estimating phenotypic selection in character size usually assume that the character is invariant across reproductive bouts. We show that variation in the size of reproductive traits may be large over multiple events and can influence fitness in organisms where these traits are produced anew each season. With data from populations of two orchid species, Caladenia valida and Tolumnia variegata, we used Bayesian statistics to investigate the effect on the distribution in fitness of individuals when the fitness landscape is not flat and when characters vary across reproductive bouts. Inconsistency in character size across reproductive periods within an individual increases the uncertainty of mean fitness and, consequently, the uncertainty in individual fitness. The trajectory of selection is likely to be muddled as a consequence of variation in morphology of individuals across reproductive bouts. The frequency and amplitude of such changes will certainly affect the dynamics between selection and genetic drift. PMID:20047875

  15. The Use of Statistical Process Control-Charts for Person-Fit Analysis on Computerized Adaptive Testing. LSAC Research Report Series.

    ERIC Educational Resources Information Center

    Meijer, Rob R.; van Krimpen-Stoop, Edith M. L. A.

    In this study a cumulative-sum (CUSUM) procedure from the theory of Statistical Process Control was modified and applied in the context of person-fit analysis in a computerized adaptive testing (CAT) environment. Six person-fit statistics were proposed using the CUSUM procedure, and three of them could be used to investigate the CAT in online test…

  16. Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module

    NASA Astrophysics Data System (ADS)

    Martinez, Gregory D.; McKay, James; Farmer, Ben; Scott, Pat; Roebber, Elinore; Putze, Antje; Conrad, Jan

    2017-11-01

    We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics.

  17. Regression-Based Norms for a Bi-factor Model for Scoring the Brief Test of Adult Cognition by Telephone (BTACT).

    PubMed

    Gurnani, Ashita S; John, Samantha E; Gavett, Brandon E

    2015-05-01

    The current study developed regression-based normative adjustments for a bi-factor model of the The Brief Test of Adult Cognition by Telephone (BTACT). Archival data from the Midlife Development in the United States-II Cognitive Project were used to develop eight separate linear regression models that predicted bi-factor BTACT scores, accounting for age, education, gender, and occupation-alone and in various combinations. All regression models provided statistically significant fit to the data. A three-predictor regression model fit best and accounted for 32.8% of the variance in the global bi-factor BTACT score. The fit of the regression models was not improved by gender. Eight different regression models are presented to allow the user flexibility in applying demographic corrections to the bi-factor BTACT scores. Occupation corrections, while not widely used, may provide useful demographic adjustments for adult populations or for those individuals who have attained an occupational status not commensurate with expected educational attainment. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  18. Development of a Self-Report Physical Function Instrument for Disability Assessment: Item Pool Construction and Factor Analysis

    PubMed Central

    McDonough, Christine M.; Jette, Alan M.; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M.; Rasch, Elizabeth K.

    2014-01-01

    Objectives To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Design Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. Setting In-person and semi-structured interviews; internet and telephone surveys. Participants A sample of 1,017 SSA claimants, and a normative sample of 999 adults from the US general population. Interventions Not Applicable. Main Outcome Measure Model fit statistics Results The final item pool consisted of 139 items. Within the claimant sample 58.7% were white; 31.8% were black; 46.6% were female; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution which included more items and allowed separate characterization of: 1) Changing and Maintaining Body Position, 2) Whole Body Mobility, 3) Upper Body Function and 4) Upper Extremity Fine Motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples respectively were: Comparative Fit Index = 0.93 and 0.98; Tucker-Lewis Index = 0.92 and 0.98; Root Mean Square Error Approximation = 0.05 and 0.04. Conclusions The factor structure of the Physical Function item pool closely resembled the hypothesized content model. The four scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. PMID:23542402

  19. Development of a self-report physical function instrument for disability assessment: item pool construction and factor analysis.

    PubMed

    McDonough, Christine M; Jette, Alan M; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M; Rasch, Elizabeth K

    2013-09-01

    To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. In-person and semistructured interviews and Internet and telephone surveys. Sample of SSA claimants (n=1017) and a normative sample of adults from the U.S. general population (n=999). Not applicable. Model fit statistics. The final item pool consisted of 139 items. Within the claimant sample, 58.7% were white; 31.8% were black; 46.6% were women; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution, which included more items and allowed separate characterization of: (1) changing and maintaining body position, (2) whole body mobility, (3) upper body function, and (4) upper extremity fine motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples, respectively, were: Comparative Fit Index=.93 and .98; Tucker-Lewis Index=.92 and .98; and root mean square error approximation=.05 and .04. The factor structure of the physical function item pool closely resembled the hypothesized content model. The 4 scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  20. Model-Free Estimation of Tuning Curves and Their Attentional Modulation, Based on Sparse and Noisy Data.

    PubMed

    Helmer, Markus; Kozyrev, Vladislav; Stephan, Valeska; Treue, Stefan; Geisel, Theo; Battaglia, Demian

    2016-01-01

    Tuning curves are the functions that relate the responses of sensory neurons to various values within one continuous stimulus dimension (such as the orientation of a bar in the visual domain or the frequency of a tone in the auditory domain). They are commonly determined by fitting a model e.g. a Gaussian or other bell-shaped curves to the measured responses to a small subset of discrete stimuli in the relevant dimension. However, as neuronal responses are irregular and experimental measurements noisy, it is often difficult to determine reliably the appropriate model from the data. We illustrate this general problem by fitting diverse models to representative recordings from area MT in rhesus monkey visual cortex during multiple attentional tasks involving complex composite stimuli. We find that all models can be well-fitted, that the best model generally varies between neurons and that statistical comparisons between neuronal responses across different experimental conditions are affected quantitatively and qualitatively by specific model choices. As a robust alternative to an often arbitrary model selection, we introduce a model-free approach, in which features of interest are extracted directly from the measured response data without the need of fitting any model. In our attentional datasets, we demonstrate that data-driven methods provide descriptions of tuning curve features such as preferred stimulus direction or attentional gain modulations which are in agreement with fit-based approaches when a good fit exists. Furthermore, these methods naturally extend to the frequent cases of uncertain model selection. We show that model-free approaches can identify attentional modulation patterns, such as general alterations of the irregular shape of tuning curves, which cannot be captured by fitting stereotyped conventional models. Finally, by comparing datasets across different conditions, we demonstrate effects of attention that are cell- and even stimulus-specific. Based on these proofs-of-concept, we conclude that our data-driven methods can reliably extract relevant tuning information from neuronal recordings, including cells whose seemingly haphazard response curves defy conventional fitting approaches.

  1. Model-Free Estimation of Tuning Curves and Their Attentional Modulation, Based on Sparse and Noisy Data

    PubMed Central

    Helmer, Markus; Kozyrev, Vladislav; Stephan, Valeska; Treue, Stefan; Geisel, Theo; Battaglia, Demian

    2016-01-01

    Tuning curves are the functions that relate the responses of sensory neurons to various values within one continuous stimulus dimension (such as the orientation of a bar in the visual domain or the frequency of a tone in the auditory domain). They are commonly determined by fitting a model e.g. a Gaussian or other bell-shaped curves to the measured responses to a small subset of discrete stimuli in the relevant dimension. However, as neuronal responses are irregular and experimental measurements noisy, it is often difficult to determine reliably the appropriate model from the data. We illustrate this general problem by fitting diverse models to representative recordings from area MT in rhesus monkey visual cortex during multiple attentional tasks involving complex composite stimuli. We find that all models can be well-fitted, that the best model generally varies between neurons and that statistical comparisons between neuronal responses across different experimental conditions are affected quantitatively and qualitatively by specific model choices. As a robust alternative to an often arbitrary model selection, we introduce a model-free approach, in which features of interest are extracted directly from the measured response data without the need of fitting any model. In our attentional datasets, we demonstrate that data-driven methods provide descriptions of tuning curve features such as preferred stimulus direction or attentional gain modulations which are in agreement with fit-based approaches when a good fit exists. Furthermore, these methods naturally extend to the frequent cases of uncertain model selection. We show that model-free approaches can identify attentional modulation patterns, such as general alterations of the irregular shape of tuning curves, which cannot be captured by fitting stereotyped conventional models. Finally, by comparing datasets across different conditions, we demonstrate effects of attention that are cell- and even stimulus-specific. Based on these proofs-of-concept, we conclude that our data-driven methods can reliably extract relevant tuning information from neuronal recordings, including cells whose seemingly haphazard response curves defy conventional fitting approaches. PMID:26785378

  2. Predicting protein concentrations with ELISA microarray assays, monotonic splines and Monte Carlo simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daly, Don S.; Anderson, Kevin K.; White, Amanda M.

    Background: A microarray of enzyme-linked immunosorbent assays, or ELISA microarray, predicts simultaneously the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Making sound biological inferences as well as improving the ELISA microarray process require require both concentration predictions and creditable estimates of their errors. Methods: We present a statistical method based on monotonic spline statistical models, penalized constrained least squares fitting (PCLS) and Monte Carlo simulation (MC) to predict concentrations and estimate prediction errors in ELISA microarray. PCLS restrains the flexible spline to a fit of assay intensitymore » that is a monotone function of protein concentration. With MC, both modeling and measurement errors are combined to estimate prediction error. The spline/PCLS/MC method is compared to a common method using simulated and real ELISA microarray data sets. Results: In contrast to the rigid logistic model, the flexible spline model gave credible fits in almost all test cases including troublesome cases with left and/or right censoring, or other asymmetries. For the real data sets, 61% of the spline predictions were more accurate than their comparable logistic predictions; especially the spline predictions at the extremes of the prediction curve. The relative errors of 50% of comparable spline and logistic predictions differed by less than 20%. Monte Carlo simulation rendered acceptable asymmetric prediction intervals for both spline and logistic models while propagation of error produced symmetric intervals that diverged unrealistically as the standard curves approached horizontal asymptotes. Conclusions: The spline/PCLS/MC method is a flexible, robust alternative to a logistic/NLS/propagation-of-error method to reliably predict protein concentrations and estimate their errors. The spline method simplifies model selection and fitting, and reliably estimates believable prediction errors. For the 50% of the real data sets fit well by both methods, spline and logistic predictions are practically indistinguishable, varying in accuracy by less than 15%. The spline method may be useful when automated prediction across simultaneous assays of numerous proteins must be applied routinely with minimal user intervention.« less

  3. Estimating times of surgeries with two component procedures: comparison of the lognormal and normal models.

    PubMed

    Strum, David P; May, Jerrold H; Sampson, Allan R; Vargas, Luis G; Spangler, William E

    2003-01-01

    Variability inherent in the duration of surgical procedures complicates surgical scheduling. Modeling the duration and variability of surgeries might improve time estimates. Accurate time estimates are important operationally to improve utilization, reduce costs, and identify surgeries that might be considered outliers. Surgeries with multiple procedures are difficult to model because they are difficult to segment into homogenous groups and because they are performed less frequently than single-procedure surgeries. The authors studied, retrospectively, 10,740 surgeries each with exactly two CPTs and 46,322 surgical cases with only one CPT from a large teaching hospital to determine if the distribution of dual-procedure surgery times fit more closely a lognormal or a normal model. The authors tested model goodness of fit to their data using Shapiro-Wilk tests, studied factors affecting the variability of time estimates, and examined the impact of coding permutations (ordered combinations) on modeling. The Shapiro-Wilk tests indicated that the lognormal model is statistically superior to the normal model for modeling dual-procedure surgeries. Permutations of component codes did not appear to differ significantly with respect to total procedure time and surgical time. To improve individual models for infrequent dual-procedure surgeries, permutations may be reduced and estimates may be based on the longest component procedure and type of anesthesia. The authors recommend use of the lognormal model for estimating surgical times for surgeries with two component procedures. Their results help legitimize the use of log transforms to normalize surgical procedure times prior to hypothesis testing using linear statistical models. Multiple-procedure surgeries may be modeled using the longest (statistically most important) component procedure and type of anesthesia.

  4. Confirmatory factor analysis of the female sexual function index.

    PubMed

    Opperman, Emily A; Benson, Lindsay E; Milhausen, Robin R

    2013-01-01

    The Female Sexual Functioning Index (Rosen et al., 2000 ) was designed to assess the key dimensions of female sexual functioning using six domains: desire, arousal, lubrication, orgasm, satisfaction, and pain. A full-scale score was proposed to represent women's overall sexual function. The fifth revision to the Diagnostic and Statistical Manual (DSM) is currently underway and includes a proposal to combine desire and arousal problems. The objective of this article was to evaluate and compare four models of the Female Sexual Functioning Index: (a) single-factor model, (b) six-factor model, (c) second-order factor model, and (4) five-factor model combining the desire and arousal subscales. Cross-sectional and observational data from 85 women were used to conduct a confirmatory factor analysis on the Female Sexual Functioning Index. Local and global goodness-of-fit measures, the chi-square test of differences, squared multiple correlations, and regression weights were used. The single-factor model fit was not acceptable. The original six-factor model was confirmed, and good model fit was found for the second-order and five-factor models. Delta chi-square tests of differences supported best fit for the six-factor model validating usage of the six domains. However, when revisions are made to the DSM-5, the Female Sexual Functioning Index can adapt to reflect these changes and remain a valid assessment tool for women's sexual functioning, as the five-factor structure was also supported.

  5. Translation of Fit & Strong! for Middle-Aged and Older Adults: Examining Implementation and Effectiveness of a Lay-Led Model in Central Texas

    PubMed Central

    Ory, Marcia G.; Lee, Shinduk; Zollinger, Alyson; Bhurtyal, Kiran; Jiang, Luohua; Smith, Matthew Lee

    2015-01-01

    The Fit & Strong! program is an evidence-based, multi-component program promoting physical activity among older adults, particularly those suffering from lower-extremity osteoarthritis. The primary purpose of the study is to examine if the Fit & Strong! program translated into a lay-leader model can produce comparable outcomes to the original program taught by physical therapists and/or certified exercise instructors. A single-group, pre–post study design was employed, and data were collected at the baseline (n = 136 participants) and the intervention conclusion (n = 71) with both baseline and post-intervention data. The measurements included socio-demographic information, health- and behavior-related information, and health-related quality of life. Various statistical tests were used for the program impact analysis and examination of the association between participant characteristics and program completion. As in the original study, there were statistically significant (p < 0.05) improvements in self-efficacy for exercise, aerobic capacity, joint stiffness, level of energy, and amount and intensity of physical activities. The odds of completing the program were significantly lower for the participants from rural areas and those having multiple chronic conditions. Successful adaptation of the Fit & Strong! program to a lay-leader model can increase the likelihood of program dissemination by broadening the selection pool of instructors and, hence, reducing the potential issue of resource limitation. However, high program attrition rates (54.1%) emphasize the importance of adopting evidence-based strategies for improving the retention of the participants from rural areas and those with multiple chronic conditions. PMID:25964912

  6. A Statistical Approach to Thermal Management of Data Centers Under Steady State and System Perturbations

    PubMed Central

    Haaland, Ben; Min, Wanli; Qian, Peter Z. G.; Amemiya, Yasuo

    2011-01-01

    Temperature control for a large data center is both important and expensive. On the one hand, many of the components produce a great deal of heat, and on the other hand, many of the components require temperatures below a fairly low threshold for reliable operation. A statistical framework is proposed within which the behavior of a large cooling system can be modeled and forecast under both steady state and perturbations. This framework is based upon an extension of multivariate Gaussian autoregressive hidden Markov models (HMMs). The estimated parameters of the fitted model provide useful summaries of the overall behavior of and relationships within the cooling system. Predictions under system perturbations are useful for assessing potential changes and improvements to be made to the system. Many data centers have far more cooling capacity than necessary under sensible circumstances, thus resulting in energy inefficiencies. Using this model, predictions for system behavior after a particular component of the cooling system is shut down or reduced in cooling power can be generated. Steady-state predictions are also useful for facility monitors. System traces outside control boundaries flag a change in behavior to examine. The proposed model is fit to data from a group of air conditioners within an enterprise data center from the IT industry. The fitted model is examined, and a particular unit is found to be underutilized. Predictions generated for the system under the removal of that unit appear very reasonable. Steady-state system behavior also is predicted well. PMID:22076026

  7. Statistical comparison of various interpolation algorithms for reconstructing regional grid ionospheric maps over China

    NASA Astrophysics Data System (ADS)

    Li, Min; Yuan, Yunbin; Wang, Ningbo; Li, Zishen; Liu, Xifeng; Zhang, Xiao

    2018-07-01

    This paper presents a quantitative comparison of several widely used interpolation algorithms, i.e., Ordinary Kriging (OrK), Universal Kriging (UnK), planar fit and Inverse Distance Weighting (IDW), based on a grid-based single-shell ionosphere model over China. The experimental data were collected from the Crustal Movement Observation Network of China (CMONOC) and the International GNSS Service (IGS), covering the days of year 60-90 in 2015. The quality of these interpolation algorithms was assessed by cross-validation in terms of both the ionospheric correction performance and Single-Frequency (SF) Precise Point Positioning (PPP) accuracy on an epoch-by-epoch basis. The results indicate that the interpolation models perform better at mid-latitudes than low latitudes. For the China region, the performance of OrK and UnK is relatively better than the planar fit and IDW model for estimating ionospheric delay and positioning. In addition, the computational efficiencies of the IDW and planar fit models are better than those of OrK and UnK.

  8. Extracting Spurious Latent Classes in Growth Mixture Modeling with Nonnormal Errors

    ERIC Educational Resources Information Center

    Guerra-Peña, Kiero; Steinley, Douglas

    2016-01-01

    Growth mixture modeling is generally used for two purposes: (1) to identify mixtures of normal subgroups and (2) to approximate oddly shaped distributions by a mixture of normal components. Often in applied research this methodology is applied to both of these situations indistinctly: using the same fit statistics and likelihood ratio tests. This…

  9. Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization

    ERIC Educational Resources Information Center

    Gelman, Andrew; Lee, Daniel; Guo, Jiqiang

    2015-01-01

    Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…

  10. Specification, testing, and interpretation of gene-by-measured-environment interaction models in the presence of gene-environment correlation

    PubMed Central

    Rathouz, Paul J.; Van Hulle, Carol A.; Lee Rodgers, Joseph; Waldman, Irwin D.; Lahey, Benjamin B.

    2009-01-01

    Purcell (2002) proposed a bivariate biometric model for testing and quantifying the interaction between latent genetic influences and measured environments in the presence of gene-environment correlation. Purcell’s model extends the Cholesky model to include gene-environment interaction. We examine a number of closely-related alternative models that do not involve gene-environment interaction but which may fit the data as well Purcell’s model. Because failure to consider these alternatives could lead to spurious detection of gene-environment interaction, we propose alternative models for testing gene-environment interaction in the presence of gene-environment correlation, including one based on the correlated factors model. In addition, we note mathematical errors in the calculation of effect size via variance components in Purcell’s model. We propose a statistical method for deriving and interpreting variance decompositions that are true to the fitted model. PMID:18293078

  11. Modeling species-abundance relationships in multi-species collections

    USGS Publications Warehouse

    Peng, S.; Yin, Z.; Ren, H.; Guo, Q.

    2003-01-01

    Species-abundance relationship is one of the most fundamental aspects of community ecology. Since Motomura first developed the geometric series model to describe the feature of community structure, ecologists have developed many other models to fit the species-abundance data in communities. These models can be classified into empirical and theoretical ones, including (1) statistical models, i.e., negative binomial distribution (and its extension), log-series distribution (and its extension), geometric distribution, lognormal distribution, Poisson-lognormal distribution, (2) niche models, i.e., geometric series, broken stick, overlapping niche, particulate niche, random assortment, dominance pre-emption, dominance decay, random fraction, weighted random fraction, composite niche, Zipf or Zipf-Mandelbrot model, and (3) dynamic models describing community dynamics and restrictive function of environment on community. These models have different characteristics and fit species-abundance data in various communities or collections. Among them, log-series distribution, lognormal distribution, geometric series, and broken stick model have been most widely used.

  12. Examining the Process of Responding to Circumplex Scales of Interpersonal Values Items: Should Ideal Point Scoring Methods Be Considered?

    PubMed

    Ling, Ying; Zhang, Minqiang; Locke, Kenneth D; Li, Guangming; Li, Zonglong

    2016-01-01

    The Circumplex Scales of Interpersonal Values (CSIV) is a 64-item self-report measure of goals from each octant of the interpersonal circumplex. We used item response theory methods to compare whether dominance models or ideal point models best described how people respond to CSIV items. Specifically, we fit a polytomous dominance model called the generalized partial credit model and an ideal point model of similar complexity called the generalized graded unfolding model to the responses of 1,893 college students. The results of both graphical comparisons of item characteristic curves and statistical comparisons of model fit suggested that an ideal point model best describes the process of responding to CSIV items. The different models produced different rank orderings of high-scoring respondents, but overall the models did not differ in their prediction of criterion variables (agentic and communal interpersonal traits and implicit motives).

  13. Parameter Estimation as a Problem in Statistical Thermodynamics.

    PubMed

    Earle, Keith A; Schneider, David J

    2011-03-14

    In this work, we explore the connections between parameter fitting and statistical thermodynamics using the maxent principle of Jaynes as a starting point. In particular, we show how signal averaging may be described by a suitable one particle partition function, modified for the case of a variable number of particles. These modifications lead to an entropy that is extensive in the number of measurements in the average. Systematic error may be interpreted as a departure from ideal gas behavior. In addition, we show how to combine measurements from different experiments in an unbiased way in order to maximize the entropy of simultaneous parameter fitting. We suggest that fit parameters may be interpreted as generalized coordinates and the forces conjugate to them may be derived from the system partition function. From this perspective, the parameter fitting problem may be interpreted as a process where the system (spectrum) does work against internal stresses (non-optimum model parameters) to achieve a state of minimum free energy/maximum entropy. Finally, we show how the distribution function allows us to define a geometry on parameter space, building on previous work[1, 2]. This geometry has implications for error estimation and we outline a program for incorporating these geometrical insights into an automated parameter fitting algorithm.

  14. Application of the Hyper-Poisson Generalized Linear Model for Analyzing Motor Vehicle Crashes.

    PubMed

    Khazraee, S Hadi; Sáez-Castillo, Antonio Jose; Geedipally, Srinivas Reddy; Lord, Dominique

    2015-05-01

    The hyper-Poisson distribution can handle both over- and underdispersion, and its generalized linear model formulation allows the dispersion of the distribution to be observation-specific and dependent on model covariates. This study's objective is to examine the potential applicability of a newly proposed generalized linear model framework for the hyper-Poisson distribution in analyzing motor vehicle crash count data. The hyper-Poisson generalized linear model was first fitted to intersection crash data from Toronto, characterized by overdispersion, and then to crash data from railway-highway crossings in Korea, characterized by underdispersion. The results of this study are promising. When fitted to the Toronto data set, the goodness-of-fit measures indicated that the hyper-Poisson model with a variable dispersion parameter provided a statistical fit as good as the traditional negative binomial model. The hyper-Poisson model was also successful in handling the underdispersed data from Korea; the model performed as well as the gamma probability model and the Conway-Maxwell-Poisson model previously developed for the same data set. The advantages of the hyper-Poisson model studied in this article are noteworthy. Unlike the negative binomial model, which has difficulties in handling underdispersed data, the hyper-Poisson model can handle both over- and underdispersed crash data. Although not a major issue for the Conway-Maxwell-Poisson model, the effect of each variable on the expected mean of crashes is easily interpretable in the case of this new model. © 2014 Society for Risk Analysis.

  15. A multicenter mortality prediction model for patients receiving prolonged mechanical ventilation

    PubMed Central

    Carson, Shannon S.; Kahn, Jeremy M.; Hough, Catherine L.; Seeley, Eric J.; White, Douglas B.; Douglas, Ivor S.; Cox, Christopher E.; Caldwell, Ellen; Bangdiwala, Shrikant I.; Garrett, Joanne M.; Rubenfeld, Gordon D.

    2012-01-01

    Objective Significant deficiencies exist in the communication of prognosis for patients requiring prolonged mechanical ventilation after acute illness, in part because of clinician uncertainty about long-term outcomes. We sought to refine a mortality prediction model for patients requiring prolonged ventilation using a multicentered study design. Design Cohort study. Setting Five geographically diverse tertiary care medical centers in the United States (California, Colorado, North Carolina, Pennsylvania, Washington). Patients Two hundred sixty adult patients who received at least 21 days of mechanical ventilation after acute illness. Interventions None. Measurements and Main Results For the probability model, we included age, platelet count, and requirement for vasopressors and/or hemodialysis, each measured on day 21 of mechanical ventilation, in a logistic regression model with 1-yr mortality as the outcome variable. We subsequently modified a simplified prognostic scoring rule (ProVent score) by categorizing the risk variables (age 18–49, 50–64, and >65 yrs; platelet count 0–150 and >150; vasopressors; hemodialysis) in another logistic regression model and assigning points to variables according to β coefficient values. Overall mortality at 1 yr was 48%. The area under the curve of the receiver operator characteristic curve for the primary ProVent probability model was 0.79 (95% confidence interval, 0.75–0.81), and the p value for the Hosmer-Lemeshow goodness-of-fit statistic was .89. The area under the curve for the categorical model was 0.77, and the p value for the goodness-of-fit statistic was .34. The area under the curve for the ProVent score was 0.76, and the p value for the Hosmer-Lemeshow goodness-of-fit statistic was .60. For the 50 patients with a ProVent score >2, only one patient was able to be discharged directly home, and 1-yr mortality was 86%. Conclusion The ProVent probability model is a simple and reproducible model that can accurately identify patients requiring prolonged mechanical ventilation who are at high risk of 1-yr mortality. PMID:22080643

  16. Modelling average maximum daily temperature using r largest order statistics: An application to South African data

    PubMed Central

    2018-01-01

    Natural hazards (events that may cause actual disasters) are established in the literature as major causes of various massive and destructive problems worldwide. The occurrences of earthquakes, floods and heat waves affect millions of people through several impacts. These include cases of hospitalisation, loss of lives and economic challenges. The focus of this study was on the risk reduction of the disasters that occur because of extremely high temperatures and heat waves. Modelling average maximum daily temperature (AMDT) guards against the disaster risk and may also help countries towards preparing for extreme heat. This study discusses the use of the r largest order statistics approach of extreme value theory towards modelling AMDT over the period of 11 years, that is, 2000–2010. A generalised extreme value distribution for r largest order statistics is fitted to the annual maxima. This is performed in an effort to study the behaviour of the r largest order statistics. The method of maximum likelihood is used in estimating the target parameters and the frequency of occurrences of the hottest days is assessed. The study presents a case study of South Africa in which the data for the non-winter season (September–April of each year) are used. The meteorological data used are the AMDT that are collected by the South African Weather Service and provided by Eskom. The estimation of the shape parameter reveals evidence of a Weibull class as an appropriate distribution for modelling AMDT in South Africa. The extreme quantiles for specified return periods are estimated using the quantile function and the best model is chosen through the use of the deviance statistic with the support of the graphical diagnostic tools. The Entropy Difference Test (EDT) is used as a specification test for diagnosing the fit of the models to the data.

  17. Constructing the Japanese version of the Maslach Burnout Inventory-Student Survey: Confirmatory factor analysis.

    PubMed

    Tsubakita, Takashi; Shimazaki, Kazuyo

    2016-01-01

    To examine the factorial validity of the Maslach Burnout Inventory-Student Survey, using a sample of 2061 Japanese university students majoring in the medical and natural sciences (67.9% male, 31.8% female; Mage  = 19.6 years, standard deviation = 1.5). The back-translated scale used unreversed items to assess inefficacy. The inventory's descriptive properties and Cronbach's alphas were calculated using SPSS software. The present authors compared fit indices of the null, one factor, and default three factor models via confirmatory factor analysis with maximum-likelihood estimation using AMOS software, version 21.0. Intercorrelations between exhaustion, cynicism, and inefficacy were relatively higher than in prior studies. Cronbach's alphas were 0.76, 0.85, and 0.78, respectively. Although fit indices of the hypothesized three factor model did not meet the respective criteria, the model demonstrated better fit than did the null and one factor models. The present authors added four paths between error variables within items, but the modified model did not show satisfactory fit. Subsequent analysis revealed that a bi-factor model fit the data better than did the hypothesized or modified three factor models. The Japanese version of the Maslach Burnout Inventory-Student Survey needs minor changes to improve the fit of its three factor model, but the scale as a whole can be used to adequately assess overall academic burnout in Japanese university students. Although the scale was back-translated, two items measuring exhaustion whose expressions overlapped should be modified, and all items measuring inefficacy should be reversed in order to statistically clarify the factorial difference between the scale's three factors. © 2015 The Authors. Japan Journal of Nursing Science © 2015 Japan Academy of Nursing Science.

  18. Critical Values for Yen’s Q3: Identification of Local Dependence in the Rasch Model Using Residual Correlations

    PubMed Central

    Christensen, Karl Bang; Makransky, Guido; Horton, Mike

    2016-01-01

    The assumption of local independence is central to all item response theory (IRT) models. Violations can lead to inflated estimates of reliability and problems with construct validity. For the most widely used fit statistic Q3, there are currently no well-documented suggestions of the critical values which should be used to indicate local dependence (LD), and for this reason, a variety of arbitrary rules of thumb are used. In this study, an empirical data example and Monte Carlo simulation were used to investigate the different factors that can influence the null distribution of residual correlations, with the objective of proposing guidelines that researchers and practitioners can follow when making decisions about LD during scale development and validation. A parametric bootstrapping procedure should be implemented in each separate situation to obtain the critical value of LD applicable to the data set, and provide example critical values for a number of data structure situations. The results show that for the Q3 fit statistic, no single critical value is appropriate for all situations, as the percentiles in the empirical null distribution are influenced by the number of items, the sample size, and the number of response categories. Furthermore, the results show that LD should be considered relative to the average observed residual correlation, rather than to a uniform value, as this results in more stable percentiles for the null distribution of an adjusted fit statistic. PMID:29881087

  19. Spectral statistics of random geometric graphs

    NASA Astrophysics Data System (ADS)

    Dettmann, C. P.; Georgiou, O.; Knight, G.

    2017-04-01

    We use random matrix theory to study the spectrum of random geometric graphs, a fundamental model of spatial networks. Considering ensembles of random geometric graphs we look at short-range correlations in the level spacings of the spectrum via the nearest-neighbour and next-nearest-neighbour spacing distribution and long-range correlations via the spectral rigidity Δ3 statistic. These correlations in the level spacings give information about localisation of eigenvectors, level of community structure and the level of randomness within the networks. We find a parameter-dependent transition between Poisson and Gaussian orthogonal ensemble statistics. That is the spectral statistics of spatial random geometric graphs fits the universality of random matrix theory found in other models such as Erdős-Rényi, Barabási-Albert and Watts-Strogatz random graphs.

  20. Set statistics in conductive bridge random access memory device with Cu/HfO{sub 2}/Pt structure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Meiyun; Long, Shibing, E-mail: longshibing@ime.ac.cn; Wang, Guoming

    2014-11-10

    The switching parameter variation of resistive switching memory is one of the most important challenges in its application. In this letter, we have studied the set statistics of conductive bridge random access memory with a Cu/HfO{sub 2}/Pt structure. The experimental distributions of the set parameters in several off resistance ranges are shown to nicely fit a Weibull model. The Weibull slopes of the set voltage and current increase and decrease logarithmically with off resistance, respectively. This experimental behavior is perfectly captured by a Monte Carlo simulator based on the cell-based set voltage statistics model and the Quantum Point Contact electronmore » transport model. Our work provides indications for the improvement of the switching uniformity.« less

  1. A new framework for estimating return levels using regional frequency analysis

    NASA Astrophysics Data System (ADS)

    Winter, Hugo; Bernardara, Pietro; Clegg, Georgina

    2017-04-01

    We propose a new framework for incorporating more spatial and temporal information into the estimation of extreme return levels. Currently, most studies use extreme value models applied to data from a single site; an approach which is inefficient statistically and leads to return level estimates that are less physically realistic. We aim to highlight the benefits that could be obtained by using methodology based upon regional frequency analysis as opposed to classic single site extreme value analysis. This motivates a shift in thinking, which permits the evaluation of local and regional effects and makes use of the wide variety of data that are now available on high temporal and spatial resolutions. The recent winter storms over the UK during the winters of 2013-14 and 2015-16, which have caused wide-ranging disruption and damaged important infrastructure, provide the main motivation for the current work. One of the most impactful natural hazards is flooding, which is often initiated by extreme precipitation. In this presentation, we focus on extreme rainfall, but shall discuss other meteorological variables alongside potentially damaging hazard combinations. To understand the risks posed by extreme precipitation, we need reliable statistical models which can be used to estimate quantities such as the T-year return level, i.e. the level which is expected to be exceeded once every T-years. Extreme value theory provides the main collection of statistical models that can be used to estimate the risks posed by extreme precipitation events. Broadly, at a single site, a statistical model is fitted to exceedances of a high threshold and the model is used to extrapolate to levels beyond the range of the observed data. However, when we have data at many sites over a spatial domain, fitting a separate model for each separate site makes little sense and it would be better if we could incorporate all this information to improve the reliability of return level estimates. Here, we use the regional frequency analysis approach to define homogeneous regions which are affected by the same storms. Extreme value models are then fitted to the data pooled from across a region. We find that this approach leads to more spatially consistent return level estimates with reduced uncertainty bounds.

  2. How many spectral lines are statistically significant?

    NASA Astrophysics Data System (ADS)

    Freund, J.

    When experimental line spectra are fitted with least squares techniques one frequently does not know whether n or n + 1 lines may be fitted safely. This paper shows how an F-test can be applied in order to determine the statistical significance of including an extra line into the fitting routine.

  3. Rare beneficial mutations can halt Muller's ratchet

    NASA Astrophysics Data System (ADS)

    Balick, Daniel; Goyal, Sidhartha; Jerison, Elizabeth; Neher, Richard; Shraiman, Boris; Desai, Michael

    2012-02-01

    In viral, bacterial, and other asexual populations, the vast majority of non-neutral mutations are deleterious. This motivates the application of models without beneficial mutations. Here we show that the presence of surprisingly few compensatory mutations halts fitness decay in these models. Production of deleterious mutations is balanced by purifying selection, stabilizing the fitness distribution. However, stochastic vanishing of fitness classes can lead to slow fitness decay (i.e. Muller's ratchet). For weakly deleterious mutations, production overwhelms purification, rapidly decreasing population fitness. We show that when beneficial mutations are introduced, a stable steady state emerges in the form of a dynamic mutation-selection balance. We argue this state is generic for all mutation rates and population sizes, and is reached as an end state as genomes become saturated by either beneficial or deleterious mutations. Assuming all mutations have the same magnitude selective effect, we calculate the fraction of beneficial mutations necessary to maintain the dynamic balance. This may explain the unexpected maintenance of asexual genomes, as in mitochondria, in the presence of selection. This will affect in the statistics of genetic diversity in these populations.

  4. Invariance in the recurrence of large returns and the validation of models of price dynamics

    NASA Astrophysics Data System (ADS)

    Chang, Lo-Bin; Geman, Stuart; Hsieh, Fushing; Hwang, Chii-Ruey

    2013-08-01

    Starting from a robust, nonparametric definition of large returns (“excursions”), we study the statistics of their occurrences, focusing on the recurrence process. The empirical waiting-time distribution between excursions is remarkably invariant to year, stock, and scale (return interval). This invariance is related to self-similarity of the marginal distributions of returns, but the excursion waiting-time distribution is a function of the entire return process and not just its univariate probabilities. Generalized autoregressive conditional heteroskedasticity (GARCH) models, market-time transformations based on volume or trades, and generalized (Lévy) random-walk models all fail to fit the statistical structure of excursions.

  5. Application of zero-inflated poisson mixed models in prognostic factors of hepatitis C.

    PubMed

    Akbarzadeh Baghban, Alireza; Pourhoseingholi, Asma; Zayeri, Farid; Jafari, Ali Akbar; Alavian, Seyed Moayed

    2013-01-01

    In recent years, hepatitis C virus (HCV) infection represents a major public health problem. Evaluation of risk factors is one of the solutions which help protect people from the infection. This study aims to employ zero-inflated Poisson mixed models to evaluate prognostic factors of hepatitis C. The data was collected from a longitudinal study during 2005-2010. First, mixed Poisson regression (PR) model was fitted to the data. Then, a mixed zero-inflated Poisson model was fitted with compound Poisson random effects. For evaluating the performance of the proposed mixed model, standard errors of estimators were compared. The results obtained from mixed PR showed that genotype 3 and treatment protocol were statistically significant. Results of zero-inflated Poisson mixed model showed that age, sex, genotypes 2 and 3, the treatment protocol, and having risk factors had significant effects on viral load of HCV patients. Of these two models, the estimators of zero-inflated Poisson mixed model had the minimum standard errors. The results showed that a mixed zero-inflated Poisson model was the almost best fit. The proposed model can capture serial dependence, additional overdispersion, and excess zeros in the longitudinal count data.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gecow, Andrzej

    On the way to simulating adaptive evolution of complex system describing a living object or human developed project, a fitness should be defined on node states or network external outputs. Feedbacks lead to circular attractors of these states or outputs which make it difficult to define a fitness. The main statistical effects of adaptive condition are the result of small change tendency and to appear, they only need a statistically correct size of damage initiated by evolutionary change of system. This observation allows to cut loops of feedbacks and in effect to obtain a particular statistically correct state instead ofmore » a long circular attractor which in the quenched model is expected for chaotic network with feedback. Defining fitness on such states is simple. We calculate only damaged nodes and only once. Such an algorithm is optimal for investigation of damage spreading i.e. statistical connections of structural parameters of initial change with the size of effected damage. It is a reversed-annealed method--function and states (signals) may be randomly substituted but connections are important and are preserved. The small damages important for adaptive evolution are correctly depicted in comparison to Derrida annealed approximation which expects equilibrium levels for large networks. The algorithm indicates these levels correctly. The relevant program in Pascal, which executes the algorithm for a wide range of parameters, can be obtained from the author.« less

  7. AMS-02 fits dark matter

    NASA Astrophysics Data System (ADS)

    Balázs, Csaba; Li, Tong

    2016-05-01

    In this work we perform a comprehensive statistical analysis of the AMS-02 electron, positron fluxes and the antiproton-to-proton ratio in the context of a simplified dark matter model. We include known, standard astrophysical sources and a dark matter component in the cosmic ray injection spectra. To predict the AMS-02 observables we use propagation parameters extracted from observed fluxes of heavier nuclei and the low energy part of the AMS-02 data. We assume that the dark matter particle is a Majorana fermion coupling to third generation fermions via a spin-0 mediator, and annihilating to multiple channels at once. The simultaneous presence of various annihilation channels provides the dark matter model with additional flexibility, and this enables us to simultaneously fit all cosmic ray spectra using a simple particle physics model and coherent astrophysical assumptions. Our results indicate that AMS-02 observations are not only consistent with the dark matter hypothesis within the uncertainties, but adding a dark matter contribution improves the fit to the data. Assuming, however, that dark matter is solely responsible for this improvement of the fit, it is difficult to evade the latest CMB limits in this model.

  8. Exact computation of the maximum-entropy potential of spiking neural-network models.

    PubMed

    Cofré, R; Cessac, B

    2014-05-01

    Understanding how stimuli and synaptic connectivity influence the statistics of spike patterns in neural networks is a central question in computational neuroscience. The maximum-entropy approach has been successfully used to characterize the statistical response of simultaneously recorded spiking neurons responding to stimuli. However, in spite of good performance in terms of prediction, the fitting parameters do not explain the underlying mechanistic causes of the observed correlations. On the other hand, mathematical models of spiking neurons (neuromimetic models) provide a probabilistic mapping between the stimulus, network architecture, and spike patterns in terms of conditional probabilities. In this paper we build an exact analytical mapping between neuromimetic and maximum-entropy models.

  9. Direct atomic force microscopy observation of DNA tile crystal growth at the single-molecule level.

    PubMed

    Evans, Constantine G; Hariadi, Rizal F; Winfree, Erik

    2012-06-27

    While the theoretical implications of models of DNA tile self-assembly have been extensively researched and such models have been used to design DNA tile systems for use in experiments, there has been little research testing the fundamental assumptions of those models. In this paper, we use direct observation of individual tile attachments and detachments of two DNA tile systems on a mica surface imaged with an atomic force microscope (AFM) to compile statistics of tile attachments and detachments. We show that these statistics fit the widely used kinetic Tile Assembly Model and demonstrate AFM movies as a viable technique for directly investigating DNA tile systems during growth rather than after assembly.

  10. On the fit of models to covariances and methodology to the Bulletin.

    PubMed

    Bentler, P M

    1992-11-01

    It is noted that 7 of the 10 top-cited articles in the Psychological Bulletin deal with methodological topics. One of these is the Bentler-Bonett (1980) article on the assessment of fit in covariance structure models. Some context is provided on the popularity of this article. In addition, a citation study of methodology articles appearing in the Bulletin since 1978 was carried out. It verified that publications in design, evaluation, measurement, and statistics continue to be important to psychological research. Some thoughts are offered on the role of the journal in making developments in these areas more accessible to psychologists.

  11. The relationship between offspring size and fitness: integrating theory and empiricism.

    PubMed

    Rollinson, Njal; Hutchings, Jeffrey A

    2013-02-01

    How parents divide the energy available for reproduction between size and number of offspring has a profound effect on parental reproductive success. Theory indicates that the relationship between offspring size and offspring fitness is of fundamental importance to the evolution of parental reproductive strategies: this relationship predicts the optimal division of resources between size and number of offspring, it describes the fitness consequences for parents that deviate from optimality, and its shape can predict the most viable type of investment strategy in a given environment (e.g., conservative vs. diversified bet-hedging). Many previous attempts to estimate this relationship and the corresponding value of optimal offspring size have been frustrated by a lack of integration between theory and empiricism. In the present study, we draw from C. Smith and S. Fretwell's classic model to explain how a sound estimate of the offspring size--fitness relationship can be derived with empirical data. We evaluate what measures of fitness can be used to model the offspring size--fitness curve and optimal size, as well as which statistical models should and should not be used to estimate offspring size--fitness relationships. To construct the fitness curve, we recommend that offspring fitness be measured as survival up to the age at which the instantaneous rate of offspring mortality becomes random with respect to initial investment. Parental fitness is then expressed in ecologically meaningful, theoretically defensible, and broadly comparable units: the number of offspring surviving to independence. Although logistic and asymptotic regression have been widely used to estimate offspring size-fitness relationships, the former provides relatively unreliable estimates of optimal size when offspring survival and sample sizes are low, and the latter is unreliable under all conditions. We recommend that the Weibull-1 model be used to estimate this curve because it provides modest improvements in prediction accuracy under experimentally relevant conditions.

  12. Estimating errors in least-squares fitting

    NASA Technical Reports Server (NTRS)

    Richter, P. H.

    1995-01-01

    While least-squares fitting procedures are commonly used in data analysis and are extensively discussed in the literature devoted to this subject, the proper assessment of errors resulting from such fits has received relatively little attention. The present work considers statistical errors in the fitted parameters, as well as in the values of the fitted function itself, resulting from random errors in the data. Expressions are derived for the standard error of the fit, as a function of the independent variable, for the general nonlinear and linear fitting problems. Additionally, closed-form expressions are derived for some examples commonly encountered in the scientific and engineering fields, namely ordinary polynomial and Gaussian fitting functions. These results have direct application to the assessment of the antenna gain and system temperature characteristics, in addition to a broad range of problems in data analysis. The effects of the nature of the data and the choice of fitting function on the ability to accurately model the system under study are discussed, and some general rules are deduced to assist workers intent on maximizing the amount of information obtained form a given set of measurements.

  13. Modeling stock return distributions with a quantum harmonic oscillator

    NASA Astrophysics Data System (ADS)

    Ahn, K.; Choi, M. Y.; Dai, B.; Sohn, S.; Yang, B.

    2017-11-01

    We propose a quantum harmonic oscillator as a model for the market force which draws a stock return from short-run fluctuations to the long-run equilibrium. The stochastic equation governing our model is transformed into a Schrödinger equation, the solution of which features “quantized” eigenfunctions. Consequently, stock returns follow a mixed χ distribution, which describes Gaussian and non-Gaussian features. Analyzing the Financial Times Stock Exchange (FTSE) All Share Index, we demonstrate that our model outperforms traditional stochastic process models, e.g., the geometric Brownian motion and the Heston model, with smaller fitting errors and better goodness-of-fit statistics. In addition, making use of analogy, we provide an economic rationale of the physics concepts such as the eigenstate, eigenenergy, and angular frequency, which sheds light on the relationship between finance and econophysics literature.

  14. Optimization Methods in Sherpa

    NASA Astrophysics Data System (ADS)

    Siemiginowska, Aneta; Nguyen, Dan T.; Doe, Stephen M.; Refsdal, Brian L.

    2009-09-01

    Forward fitting is a standard technique used to model X-ray data. A statistic, usually assumed weighted chi^2 or Poisson likelihood (e.g. Cash), is minimized in the fitting process to obtain a set of the best model parameters. Astronomical models often have complex forms with many parameters that can be correlated (e.g. an absorbed power law). Minimization is not trivial in such setting, as the statistical parameter space becomes multimodal and finding the global minimum is hard. Standard minimization algorithms can be found in many libraries of scientific functions, but they are usually focused on specific functions. However, Sherpa designed as general fitting and modeling application requires very robust optimization methods that can be applied to variety of astronomical data (X-ray spectra, images, timing, optical data etc.). We developed several optimization algorithms in Sherpa targeting a wide range of minimization problems. Two local minimization methods were built: Levenberg-Marquardt algorithm was obtained from MINPACK subroutine LMDIF and modified to achieve the required robustness; and Nelder-Mead simplex method has been implemented in-house based on variations of the algorithm described in the literature. A global search Monte-Carlo method has been implemented following a differential evolution algorithm presented by Storn and Price (1997). We will present the methods in Sherpa and discuss their usage cases. We will focus on the application to Chandra data showing both 1D and 2D examples. This work is supported by NASA contract NAS8-03060 (CXC).

  15. Statistical modeling of storm-level Kp occurrences

    USGS Publications Warehouse

    Remick, K.J.; Love, J.J.

    2006-01-01

    We consider the statistical modeling of the occurrence in time of large Kp magnetic storms as a Poisson process, testing whether or not relatively rare, large Kp events can be considered to arise from a stochastic, sequential, and memoryless process. For a Poisson process, the wait times between successive events occur statistically with an exponential density function. Fitting an exponential function to the durations between successive large Kp events forms the basis of our analysis. Defining these wait times by calculating the differences between times when Kp exceeds a certain value, such as Kp ??? 5, we find the wait-time distribution is not exponential. Because large storms often have several periods with large Kp values, their occurrence in time is not memoryless; short duration wait times are not independent of each other and are often clumped together in time. If we remove same-storm large Kp occurrences, the resulting wait times are very nearly exponentially distributed and the storm arrival process can be characterized as Poisson. Fittings are performed on wait time data for Kp ??? 5, 6, 7, and 8. The mean wait times between storms exceeding such Kp thresholds are 7.12, 16.55, 42.22, and 121.40 days respectively.

  16. Factor structure and psychometric properties of the english version of the trier inventory for chronic stress (TICS-E).

    PubMed

    Petrowski, Katja; Kliem, Sören; Sadler, Michael; Meuret, Alicia E; Ritz, Thomas; Brähler, Elmar

    2018-02-06

    Demands placed on individuals in occupational and social settings, as well as imbalances in personal traits and resources, can lead to chronic stress. The Trier Inventory for Chronic Stress (TICS) measures chronic stress while incorporating domain-specific aspects, and has been found to be a highly reliable and valid research tool. The aims of the present study were to confirm the German version TICS factorial structure in an English translation of the instrument (TICS-E) and to report its psychometric properties. A random route sample of healthy participants (N = 483) aged 18-30 years completed the TICS-E. The robust maximum likelihood estimation with a mean-adjusted chi-square test statistic was applied due to the sample's significant deviation from the multivariate normal distribution. Goodness of fit, absolute model fit, and relative model fit were assessed by means of the root mean square error of approximation (RMSEA), the Comparative Fit Index (CFI) and the Tucker Lewis Index (TLI). Reliability estimates (Cronbach's α and adjusted split-half reliability) ranged from .84 to .92. Item-scale correlations ranged from .50 to .85. Measures of fit showed values of .052 for RMSEA (Cl = 0.50-.054) and .067 for SRMR for absolute model fit, and values of .846 (TLI) and .855 (CFI) for relative model-fit. Factor loadings ranged from .55 to .91. The psychometric properties and factor structure of the TICS-E are comparable to the German version of the TICS. The instrument therefore meets quality standards for an adequate measurement of chronic stress.

  17. How do physicians become medical experts? A test of three competing theories: distinct domains, independent influence and encapsulation models.

    PubMed

    Violato, Claudio; Gao, Hong; O'Brien, Mary Claire; Grier, David; Shen, E

    2018-05-01

    The distinction between basic sciences and clinical knowledge which has led to a theoretical debate on how medical expertise is developed has implications for medical school and lifelong medical education. This longitudinal, population based observational study was conducted to test the fit of three theories-knowledge encapsulation, independent influence, distinct domains-of the development of medical expertise employing structural equation modelling. Data were collected from 548 physicians (292 men-53.3%; 256 women-46.7%; mean age = 24.2 years on admission) who had graduated from medical school 2009-2014. They included (1) Admissions data of undergraduate grade point average and Medical College Admission Test sub-test scores, (2) Course performance data from years 1, 2, and 3 of medical school, and (3) Performance on the NBME exams (i.e., Step 1, Step 2 CK, and Step 3). Statistical fit indices (Goodness of Fit Index-GFI; standardized root mean squared residual-SRMR; root mean squared error of approximation-RSMEA) and comparative fit [Formula: see text] of three theories of cognitive development of medical expertise were used to assess model fit. There is support for the knowledge encapsulation three factor model of clinical competency (GFI = 0.973, SRMR = 0.043, RSMEA = 0.063) which had superior fit indices to both the independent influence and distinct domains theories ([Formula: see text] vs [Formula: see text] [[Formula: see text

  18. GIA Model Statistics for GRACE Hydrology, Cryosphere, and Ocean Science

    NASA Astrophysics Data System (ADS)

    Caron, L.; Ivins, E. R.; Larour, E.; Adhikari, S.; Nilsson, J.; Blewitt, G.

    2018-03-01

    We provide a new analysis of glacial isostatic adjustment (GIA) with the goal of assembling the model uncertainty statistics required for rigorously extracting trends in surface mass from the Gravity Recovery and Climate Experiment (GRACE) mission. Such statistics are essential for deciphering sea level, ocean mass, and hydrological changes because the latter signals can be relatively small (≤2 mm/yr water height equivalent) over very large regions, such as major ocean basins and watersheds. With abundant new >7 year continuous measurements of vertical land motion (VLM) reported by Global Positioning System stations on bedrock and new relative sea level records, our new statistical evaluation of GIA uncertainties incorporates Bayesian methodologies. A unique aspect of the method is that both the ice history and 1-D Earth structure vary through a total of 128,000 forward models. We find that best fit models poorly capture the statistical inferences needed to correctly invert for lower mantle viscosity and that GIA uncertainty exceeds the uncertainty ascribed to trends from 14 years of GRACE data in polar regions.

  19. Radar cross section models for limited aspect angle windows

    NASA Astrophysics Data System (ADS)

    Robinson, Mark C.

    1992-12-01

    This thesis presents a method for building Radar Cross Section (RCS) models of aircraft based on static data taken from limited aspect angle windows. These models statistically characterize static RCS. This is done to show that a limited number of samples can be used to effectively characterize static aircraft RCS. The optimum models are determined by performing both a Kolmogorov and a Chi-Square goodness-of-fit test comparing the static RCS data with a variety of probability density functions (pdf) that are known to be effective at approximating the static RCS of aircraft. The optimum parameter estimator is also determined by the goodness of-fit tests if there is a difference in pdf parameters obtained by the Maximum Likelihood Estimator (MLE) and the Method of Moments (MoM) estimators.

  20. Performance of Reclassification Statistics in Comparing Risk Prediction Models

    PubMed Central

    Paynter, Nina P.

    2012-01-01

    Concerns have been raised about the use of traditional measures of model fit in evaluating risk prediction models for clinical use, and reclassification tables have been suggested as an alternative means of assessing the clinical utility of a model. Several measures based on the table have been proposed, including the reclassification calibration (RC) statistic, the net reclassification improvement (NRI), and the integrated discrimination improvement (IDI), but the performance of these in practical settings has not been fully examined. We used simulations to estimate the type I error and power for these statistics in a number of scenarios, as well as the impact of the number and type of categories, when adding a new marker to an established or reference model. The type I error was found to be reasonable in most settings, and power was highest for the IDI, which was similar to the test of association. The relative power of the RC statistic, a test of calibration, and the NRI, a test of discrimination, varied depending on the model assumptions. These tools provide unique but complementary information. PMID:21294152

  1. The Extended Erlang-Truncated Exponential distribution: Properties and application to rainfall data.

    PubMed

    Okorie, I E; Akpanta, A C; Ohakwe, J; Chikezie, D C

    2017-06-01

    The Erlang-Truncated Exponential ETE distribution is modified and the new lifetime distribution is called the Extended Erlang-Truncated Exponential EETE distribution. Some statistical and reliability properties of the new distribution are given and the method of maximum likelihood estimate was proposed for estimating the model parameters. The usefulness and flexibility of the EETE distribution was illustrated with an uncensored data set and its fit was compared with that of the ETE and three other three-parameter distributions. Results based on the minimized log-likelihood ([Formula: see text]), Akaike information criterion (AIC), Bayesian information criterion (BIC) and the generalized Cramér-von Mises [Formula: see text] statistics shows that the EETE distribution provides a more reasonable fit than the one based on the other competing distributions.

  2. Statistical modeling of ecosystem respiration using eddy covariance data: Maximum likelihood parameter estimation, and Monte Carlo simulation of model and parameter uncertainty, applied to three simple models

    Treesearch

    Andrew D. Richardson; David Y. Hollinger; David Y. Hollinger

    2005-01-01

    Whether the goal is to fill gaps in the flux record, or to extract physiological parameters from eddy covariance data, researchers are frequently interested in fitting simple models of ecosystem physiology to measured data. Presently, there is no consensus on the best models to use, or the ideal optimization criteria. We demonstrate that, given our estimates of the...

  3. Energy Savings Analysis for Energy Monitoring and Control Systems

    DTIC Science & Technology

    1995-01-01

    for evaluating design and construction a:-0 quality, and for studying the effectiveness of air - tightening AC retrofits. No simple relationship...Energy These models of residential infiltration are based on statistical "Resource Center (1983) include information on air tightening in fits of

  4. Improving the efficiency of the cardiac catheterization laboratories through understanding the stochastic behavior of the scheduled procedures.

    PubMed

    Stepaniak, Pieter S; Soliman Hamad, Mohamed A; Dekker, Lukas R C; Koolen, Jacques J

    2014-01-01

    In this study, we sought to analyze the stochastic behavior of Catherization Laboratories (Cath Labs) procedures in our institution. Statistical models may help to improve estimated case durations to support management in the cost-effective use of expensive surgical resources. We retrospectively analyzed all the procedures performed in the Cath Labs in 2012. The duration of procedures is strictly positive (larger than zero) and has mostly a large minimum duration. Because of the strictly positive character of the Cath Lab procedures, a fit of a lognormal model may be desirable. Having a minimum duration requires an estimate of the threshold (shift) parameter of the lognormal model. Therefore, the 3-parameter lognormal model is interesting. To avoid heterogeneous groups of observations, we tested every group-cardiologist-procedure combination for the normal, 2- and 3-parameter lognormal distribution. The total number of elective and emergency procedures performed was 6,393 (8,186 h). The final analysis included 6,135 procedures (7,779 h). Electrophysiology (intervention) procedures fit the 3-parameter lognormal model 86.1% (80.1%). Using Friedman test statistics, we conclude that the 3-parameter lognormal model is superior to the 2-parameter lognormal model. Furthermore, the 2-parameter lognormal is superior to the normal model. Cath Lab procedures are well-modelled by lognormal models. This information helps to improve and to refine Cath Lab schedules and hence their efficient use.

  5. Finite mixture model: A maximum likelihood estimation approach on time series data

    NASA Astrophysics Data System (ADS)

    Yen, Phoong Seuk; Ismail, Mohd Tahir; Hamzah, Firdaus Mohamad

    2014-09-01

    Recently, statistician emphasized on the fitting of finite mixture model by using maximum likelihood estimation as it provides asymptotic properties. In addition, it shows consistency properties as the sample sizes increases to infinity. This illustrated that maximum likelihood estimation is an unbiased estimator. Moreover, the estimate parameters obtained from the application of maximum likelihood estimation have smallest variance as compared to others statistical method as the sample sizes increases. Thus, maximum likelihood estimation is adopted in this paper to fit the two-component mixture model in order to explore the relationship between rubber price and exchange rate for Malaysia, Thailand, Philippines and Indonesia. Results described that there is a negative effect among rubber price and exchange rate for all selected countries.

  6. Three-dimensional whole-brain perfusion quantification using pseudo-continuous arterial spin labeling MRI at multiple post-labeling delays: accounting for both arterial transit time and impulse response function.

    PubMed

    Qin, Qin; Huang, Alan J; Hua, Jun; Desmond, John E; Stevens, Robert D; van Zijl, Peter C M

    2014-02-01

    Measurement of the cerebral blood flow (CBF) with whole-brain coverage is challenging in terms of both acquisition and quantitative analysis. In order to fit arterial spin labeling-based perfusion kinetic curves, an empirical three-parameter model which characterizes the effective impulse response function (IRF) is introduced, which allows the determination of CBF, the arterial transit time (ATT) and T(1,eff). The accuracy and precision of the proposed model were compared with those of more complicated models with four or five parameters through Monte Carlo simulations. Pseudo-continuous arterial spin labeling images were acquired on a clinical 3-T scanner in 10 normal volunteers using a three-dimensional multi-shot gradient and spin echo scheme at multiple post-labeling delays to sample the kinetic curves. Voxel-wise fitting was performed using the three-parameter model and other models that contain two, four or five unknown parameters. For the two-parameter model, T(1,eff) values close to tissue and blood were assumed separately. Standard statistical analysis was conducted to compare these fitting models in various brain regions. The fitted results indicated that: (i) the estimated CBF values using the two-parameter model show appreciable dependence on the assumed T(1,eff) values; (ii) the proposed three-parameter model achieves the optimal balance between the goodness of fit and model complexity when compared among the models with explicit IRF fitting; (iii) both the two-parameter model using fixed blood T1 values for T(1,eff) and the three-parameter model provide reasonable fitting results. Using the proposed three-parameter model, the estimated CBF (46 ± 14 mL/100 g/min) and ATT (1.4 ± 0.3 s) values averaged from different brain regions are close to the literature reports; the estimated T(1,eff) values (1.9 ± 0.4 s) are higher than the tissue T1 values, possibly reflecting a contribution from the microvascular arterial blood compartment. Copyright © 2013 John Wiley & Sons, Ltd.

  7. Population patterns in World’s administrative units

    PubMed Central

    Miramontes, Pedro; Cocho, Germinal

    2017-01-01

    Whereas there has been an extended discussion concerning city population distribution, little has been said about that of administrative divisions. In this work, we investigate the population distribution of second-level administrative units of 150 countries and territories and propose the discrete generalized beta distribution (DGBD) rank-size function to describe the data. After testing the balance between the goodness of fit and number of parameters of this function compared with a power law, which is the most common model for city population, the DGBD is a good statistical model for 96% of our datasets and preferred over a power law in almost every case. Moreover, the DGBD is preferred over a power law for fitting country population data, which can be seen as the zeroth-level administrative unit. We present a computational toy model to simulate the formation of administrative divisions in one dimension and give numerical evidence that the DGBD arises from a particular case of this model. This model, along with the fitting of the DGBD, proves adequate in reproducing and describing local unit evolution and its effect on the population distribution. PMID:28791153

  8. Robust mislabel logistic regression without modeling mislabel probabilities.

    PubMed

    Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

    2018-03-01

    Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes into consideration of mislabeled responses. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.

  9. Bayesian Estimation in the One-Parameter Latent Trait Model.

    DTIC Science & Technology

    1980-03-01

    Journal of Mathematical and Statistical Psychology , 1973, 26, 31-44. (a) Andersen, E. B. A goodness of fit test for the Rasch model. Psychometrika, 1973, 28...technique for estimating latent trait mental test parameters. Educational and Psychological Measurement, 1976, 36, 705-715. Lindley, D. V. The...Lord, F. M. An analysis of verbal Scholastic Aptitude Test using Birnbaum’s three-parameter logistic model. Educational and Psychological

  10. An Investigation of the Fit of Linear Regression Models to Data from an SAT[R] Validity Study. Research Report 2011-3

    ERIC Educational Resources Information Center

    Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael

    2011-01-01

    This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…

  11. Mr-Moose: An advanced SED-fitting tool for heterogeneous multi-wavelength datasets

    NASA Astrophysics Data System (ADS)

    Drouart, G.; Falkendal, T.

    2018-04-01

    We present the public release of Mr-Moose, a fitting procedure that is able to perform multi-wavelength and multi-object spectral energy distribution (SED) fitting in a Bayesian framework. This procedure is able to handle a large variety of cases, from an isolated source to blended multi-component sources from an heterogeneous dataset (i.e. a range of observation sensitivities and spectral/spatial resolutions). Furthermore, Mr-Moose handles upper-limits during the fitting process in a continuous way allowing models to be gradually less probable as upper limits are approached. The aim is to propose a simple-to-use, yet highly-versatile fitting tool fro handling increasing source complexity when combining multi-wavelength datasets with fully customisable filter/model databases. The complete control of the user is one advantage, which avoids the traditional problems related to the "black box" effect, where parameter or model tunings are impossible and can lead to overfitting and/or over-interpretation of the results. Also, while a basic knowledge of Python and statistics is required, the code aims to be sufficiently user-friendly for non-experts. We demonstrate the procedure on three cases: two artificially-generated datasets and a previous result from the literature. In particular, the most complex case (inspired by a real source, combining Herschel, ALMA and VLA data) in the context of extragalactic SED fitting, makes Mr-Moose a particularly-attractive SED fitting tool when dealing with partially blended sources, without the need for data deconvolution.

  12. MR-MOOSE: an advanced SED-fitting tool for heterogeneous multi-wavelength data sets

    NASA Astrophysics Data System (ADS)

    Drouart, G.; Falkendal, T.

    2018-07-01

    We present the public release of MR-MOOSE, a fitting procedure that is able to perform multi-wavelength and multi-object spectral energy distribution (SED) fitting in a Bayesian framework. This procedure is able to handle a large variety of cases, from an isolated source to blended multi-component sources from a heterogeneous data set (i.e. a range of observation sensitivities and spectral/spatial resolutions). Furthermore, MR-MOOSE handles upper limits during the fitting process in a continuous way allowing models to be gradually less probable as upper limits are approached. The aim is to propose a simple-to-use, yet highly versatile fitting tool for handling increasing source complexity when combining multi-wavelength data sets with fully customisable filter/model data bases. The complete control of the user is one advantage, which avoids the traditional problems related to the `black box' effect, where parameter or model tunings are impossible and can lead to overfitting and/or over-interpretation of the results. Also, while a basic knowledge of PYTHON and statistics is required, the code aims to be sufficiently user-friendly for non-experts. We demonstrate the procedure on three cases: two artificially generated data sets and a previous result from the literature. In particular, the most complex case (inspired by a real source, combining Herschel, ALMA, and VLA data) in the context of extragalactic SED fitting makes MR-MOOSE a particularly attractive SED fitting tool when dealing with partially blended sources, without the need for data deconvolution.

  13. Dynamical modeling and multi-experiment fitting with PottersWheel

    PubMed Central

    Maiwald, Thomas; Timmer, Jens

    2008-01-01

    Motivation: Modelers in Systems Biology need a flexible framework that allows them to easily create new dynamic models, investigate their properties and fit several experimental datasets simultaneously. Multi-experiment-fitting is a powerful approach to estimate parameter values, to check the validity of a given model, and to discriminate competing model hypotheses. It requires high-performance integration of ordinary differential equations and robust optimization. Results: We here present the comprehensive modeling framework Potters-Wheel (PW) including novel functionalities to satisfy these requirements with strong emphasis on the inverse problem, i.e. data-based modeling of partially observed and noisy systems like signal transduction pathways and metabolic networks. PW is designed as a MATLAB toolbox and includes numerous user interfaces. Deterministic and stochastic optimization routines are combined by fitting in logarithmic parameter space allowing for robust parameter calibration. Model investigation includes statistical tests for model-data-compliance, model discrimination, identifiability analysis and calculation of Hessian- and Monte-Carlo-based parameter confidence limits. A rich application programming interface is available for customization within own MATLAB code. Within an extensive performance analysis, we identified and significantly improved an integrator–optimizer pair which decreases the fitting duration for a realistic benchmark model by a factor over 3000 compared to MATLAB with optimization toolbox. Availability: PottersWheel is freely available for academic usage at http://www.PottersWheel.de/. The website contains a detailed documentation and introductory videos. The program has been intensively used since 2005 on Windows, Linux and Macintosh computers and does not require special MATLAB toolboxes. Contact: maiwald@fdm.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:18614583

  14. POOLMS: A computer program for fitting and model selection for two level factorial replication-free experiments

    NASA Technical Reports Server (NTRS)

    Amling, G. E.; Holms, A. G.

    1973-01-01

    A computer program is described that performs a statistical multiple-decision procedure called chain pooling. It uses a number of mean squares assigned to error variance that is conditioned on the relative magnitudes of the mean squares. The model selection is done according to user-specified levels of type 1 or type 2 error probabilities.

  15. Using a Model of Analysts' Judgments to Augment an Item Calibration Process

    ERIC Educational Resources Information Center

    Hauser, Carl; Thum, Yeow Meng; He, Wei; Ma, Lingling

    2015-01-01

    When conducting item reviews, analysts evaluate an array of statistical and graphical information to assess the fit of a field test (FT) item to an item response theory model. The process can be tedious, particularly when the number of human reviews (HR) to be completed is large. Furthermore, such a process leads to decisions that are susceptible…

  16. Fully Bayesian Estimation of Data from Single Case Designs

    ERIC Educational Resources Information Center

    Rindskopf, David

    2013-01-01

    Single case designs (SCDs) generally consist of a small number of short time series in two or more phases. The analysis of SCDs statistically fits in the framework of a multilevel model, or hierarchical model. The usual analysis does not take into account the uncertainty in the estimation of the random effects. This not only has an effect on the…

  17. Constraints on non-Standard Model Higgs boson interactions in an effective Lagrangian using differential cross sections measured in the H→γγ decay channel at \\(\\sqrt{s} = 8\\) TeV with the ATLAS detector

    DOE PAGES

    Aad, G.

    2015-12-02

    The strength and tensor structure of the Higgs boson's interactions are investigated using an effective Lagrangian, which introduces additional CP-even and CP-odd interactions that lead to changes in the kinematic properties of the Higgs boson and associated jet spectra with respect to the Standard Model. We found that the parameters of the effective Lagrangian are probed using a fit to five differential cross sections previously measured by the ATLAS experiment in the H→γγ decay channel with an integrated luminosity of 20.3 fb -1 at \\(\\sqrt{s} = 8\\) TeV. In order to perform a simultaneous fit to the five distributions, themore » statistical correlations between them are determined by re-analysing the H→γγ candidate events in the proton–proton collision data. No significant deviations from the Standard Model predictions are observed and limits on the effective Lagrangian parameters are derived. These statistical correlations are made publicly available to allow for future analysis of theories with non-Standard Model interactions.« less

  18. Systematic Search for Rings around Kepler Planet Candidates: Constraints on Ring Size and Occurrence Rate

    NASA Astrophysics Data System (ADS)

    Aizawa, Masataka; Masuda, Kento; Kawahara, Hajime; Suto, Yasushi

    2018-05-01

    We perform a systematic search for rings around 168 Kepler planet candidates with sufficient signal-to-noise ratios that are selected from all of the short-cadence data. We fit ringed and ringless models to their light curves and compare the fitting results to search for the signatures of planetary rings. First, we identify 29 tentative systems, for which the ringed models exhibit statistically significant improvement over the ringless models. The light curves of those systems are individually examined, but we are not able to identify any candidate that indicates evidence for rings. In turn, we find several mechanisms of false positives that would produce ringlike signals, and the null detection enables us to place upper limits on the size of the rings. Furthermore, assuming the tidal alignment between axes of the planetary rings and orbits, we conclude that the occurrence rate of rings larger than twice the planetary radius is less than 15%. Even though the majority of our targets are short-period planets, our null detection provides statistical and quantitative constraints on largely uncertain theoretical models of the origin, formation, and evolution of planetary rings.

  19. Constraints on non-Standard Model Higgs boson interactions in an effective Lagrangian using differential cross sections measured in the H→γγ decay channel at \\(\\sqrt{s} = 8\\) TeV with the ATLAS detector

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aad, G.

    The strength and tensor structure of the Higgs boson's interactions are investigated using an effective Lagrangian, which introduces additional CP-even and CP-odd interactions that lead to changes in the kinematic properties of the Higgs boson and associated jet spectra with respect to the Standard Model. We found that the parameters of the effective Lagrangian are probed using a fit to five differential cross sections previously measured by the ATLAS experiment in the H→γγ decay channel with an integrated luminosity of 20.3 fb -1 at \\(\\sqrt{s} = 8\\) TeV. In order to perform a simultaneous fit to the five distributions, themore » statistical correlations between them are determined by re-analysing the H→γγ candidate events in the proton–proton collision data. No significant deviations from the Standard Model predictions are observed and limits on the effective Lagrangian parameters are derived. These statistical correlations are made publicly available to allow for future analysis of theories with non-Standard Model interactions.« less

  20. Fitting multidimensional splines using statistical variable selection techniques

    NASA Technical Reports Server (NTRS)

    Smith, P. L.

    1982-01-01

    This report demonstrates the successful application of statistical variable selection techniques to fit splines. Major emphasis is given to knot selection, but order determination is also discussed. Two FORTRAN backward elimination programs using the B-spline basis were developed, and the one for knot elimination is compared in detail with two other spline-fitting methods and several statistical software packages. An example is also given for the two-variable case using a tensor product basis, with a theoretical discussion of the difficulties of their use.

  1. Sustainable Sizing.

    PubMed

    Robinette, Kathleen M; Veitch, Daisy

    2016-08-01

    To provide a review of sustainable sizing practices that reduce waste, increase sales, and simultaneously produce safer, better fitting, accommodating products. Sustainable sizing involves a set of methods good for both the environment (sustainable environment) and business (sustainable business). Sustainable sizing methods reduce (1) materials used, (2) the number of sizes or adjustments, and (3) the amount of product unsold or marked down for sale. This reduces waste and cost. The methods can also increase sales by fitting more people in the target market and produce happier, loyal customers with better fitting products. This is a mini-review of methods that result in more sustainable sizing practices. It also reviews and contrasts current statistical and modeling practices that lead to poor fit and sizing. Fit-mapping and the use of cases are two excellent methods suited for creating sustainable sizing, when real people (vs. virtual people) are used. These methods are described and reviewed. Evidence presented supports the view that virtual fitting with simulated people and products is not yet effective. Fit-mapping and cases with real people and actual products result in good design and products that are fit for person, fit for purpose, with good accommodation and comfortable, optimized sizing. While virtual models have been shown to be ineffective for predicting or representing fit, there is an opportunity to improve them by adding fit-mapping data to the models. This will require saving fit data, product data, anthropometry, and demographics in a standardized manner. For this success to extend to the wider design community, the development of a standardized method of data collection for fit-mapping with a globally shared fit-map database is needed. It will enable the world community to build knowledge of fit and accommodation and generate effective virtual fitting for the future. A standardized method of data collection that tests products' fit methodically and quantitatively will increase our predictive power to determine fit and accommodation, thereby facilitating improved, effective design. These methods apply to all products people wear, use, or occupy. © 2016, Human Factors and Ergonomics Society.

  2. Framework based on stochastic L-Systems for modeling IP traffic with multifractal behavior

    NASA Astrophysics Data System (ADS)

    Salvador, Paulo S.; Nogueira, Antonio; Valadas, Rui

    2003-08-01

    In a previous work we have introduced a multifractal traffic model based on so-called stochastic L-Systems, which were introduced by biologist A. Lindenmayer as a method to model plant growth. L-Systems are string rewriting techniques, characterized by an alphabet, an axiom (initial string) and a set of production rules. In this paper, we propose a novel traffic model, and an associated parameter fitting procedure, which describes jointly the packet arrival and the packet size processes. The packet arrival process is modeled through a L-System, where the alphabet elements are packet arrival rates. The packet size process is modeled through a set of discrete distributions (of packet sizes), one for each arrival rate. In this way the model is able to capture correlations between arrivals and sizes. We applied the model to measured traffic data: the well-known pOct Bellcore, a trace of aggregate WAN traffic and two traces of specific applications (Kazaa and Operation Flashing Point). We assess the multifractality of these traces using Linear Multiscale Diagrams. The suitability of the traffic model is evaluated by comparing the empirical and fitted probability mass and autocovariance functions; we also compare the packet loss ratio and average packet delay obtained with the measured traces and with traces generated from the fitted model. Our results show that our L-System based traffic model can achieve very good fitting performance in terms of first and second order statistics and queuing behavior.

  3. A watershed model of individual differences in fluid intelligence.

    PubMed

    Kievit, Rogier A; Davis, Simon W; Griffiths, John; Correia, Marta M; Cam-Can; Henson, Richard N

    2016-10-01

    Fluid intelligence is a crucial cognitive ability that predicts key life outcomes across the lifespan. Strong empirical links exist between fluid intelligence and processing speed on the one hand, and white matter integrity and processing speed on the other. We propose a watershed model that integrates these three explanatory levels in a principled manner in a single statistical model, with processing speed and white matter figuring as intermediate endophenotypes. We fit this model in a large (N=555) adult lifespan cohort from the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) using multiple measures of processing speed, white matter health and fluid intelligence. The model fit the data well, outperforming competing models and providing evidence for a many-to-one mapping between white matter integrity, processing speed and fluid intelligence. The model can be naturally extended to integrate other cognitive domains, endophenotypes and genotypes. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.

  4. Bayesian inference of physiologically meaningful parameters from body sway measurements.

    PubMed

    Tietäväinen, A; Gutmann, M U; Keski-Vakkuri, E; Corander, J; Hæggström, E

    2017-06-19

    The control of the human body sway by the central nervous system, muscles, and conscious brain is of interest since body sway carries information about the physiological status of a person. Several models have been proposed to describe body sway in an upright standing position, however, due to the statistical intractability of the more realistic models, no formal parameter inference has previously been conducted and the expressive power of such models for real human subjects remains unknown. Using the latest advances in Bayesian statistical inference for intractable models, we fitted a nonlinear control model to posturographic measurements, and we showed that it can accurately predict the sway characteristics of both simulated and real subjects. Our method provides a full statistical characterization of the uncertainty related to all model parameters as quantified by posterior probability density functions, which is useful for comparisons across subjects and test settings. The ability to infer intractable control models from sensor data opens new possibilities for monitoring and predicting body status in health applications.

  5. FIREFLY (Fitting IteRativEly For Likelihood analYsis): a full spectral fitting code

    NASA Astrophysics Data System (ADS)

    Wilkinson, David M.; Maraston, Claudia; Goddard, Daniel; Thomas, Daniel; Parikh, Taniya

    2017-12-01

    We present a new spectral fitting code, FIREFLY, for deriving the stellar population properties of stellar systems. FIREFLY is a chi-squared minimization fitting code that fits combinations of single-burst stellar population models to spectroscopic data, following an iterative best-fitting process controlled by the Bayesian information criterion. No priors are applied, rather all solutions within a statistical cut are retained with their weight. Moreover, no additive or multiplicative polynomials are employed to adjust the spectral shape. This fitting freedom is envisaged in order to map out the effect of intrinsic spectral energy distribution degeneracies, such as age, metallicity, dust reddening on galaxy properties, and to quantify the effect of varying input model components on such properties. Dust attenuation is included using a new procedure, which was tested on Integral Field Spectroscopic data in a previous paper. The fitting method is extensively tested with a comprehensive suite of mock galaxies, real galaxies from the Sloan Digital Sky Survey and Milky Way globular clusters. We also assess the robustness of the derived properties as a function of signal-to-noise ratio (S/N) and adopted wavelength range. We show that FIREFLY is able to recover age, metallicity, stellar mass, and even the star formation history remarkably well down to an S/N ∼ 5, for moderately dusty systems. Code and results are publicly available.1

  6. Fitting mechanistic epidemic models to data: A comparison of simple Markov chain Monte Carlo approaches.

    PubMed

    Li, Michael; Dushoff, Jonathan; Bolker, Benjamin M

    2018-07-01

    Simple mechanistic epidemic models are widely used for forecasting and parameter estimation of infectious diseases based on noisy case reporting data. Despite the widespread application of models to emerging infectious diseases, we know little about the comparative performance of standard computational-statistical frameworks in these contexts. Here we build a simple stochastic, discrete-time, discrete-state epidemic model with both process and observation error and use it to characterize the effectiveness of different flavours of Bayesian Markov chain Monte Carlo (MCMC) techniques. We use fits to simulated data, where parameters (and future behaviour) are known, to explore the limitations of different platforms and quantify parameter estimation accuracy, forecasting accuracy, and computational efficiency across combinations of modeling decisions (e.g. discrete vs. continuous latent states, levels of stochasticity) and computational platforms (JAGS, NIMBLE, Stan).

  7. COMPARING THE IMPAIRMENT PROFILES OF OLDER DRIVERS AND NON-DRIVERS: TOWARD THE DEVELOPMENT OF A FITNESS-TO-DRIVE MODEL

    PubMed Central

    Antin, Jonathan F.; Stanley, Laura M.; Guo, Feng

    2011-01-01

    The purpose of this research effort was to compare older driver and non-driver functional impairment profiles across some 60 assessment metrics in an initial effort to contribute to the development of fitness-to-drive assessment models. Of the metrics evaluated, 21 showed statistically significant differences, almost all favoring the drivers. Also, it was shown that a logistic regression model comprised of five of the assessment scores could completely and accurately separate the two groups. The results of this study imply that older drivers are far less functionally impaired than non-drivers of similar ages, and that a parsimonious model can accurately assign individuals to either group. With such models, any driver classified or diagnosed as a non-driver would be a strong candidate for further investigation and intervention. PMID:22058607

  8. A Parametric Model of Shoulder Articulation for Virtual Assessment of Space Suit Fit

    NASA Technical Reports Server (NTRS)

    Kim, K. Han; Young, Karen S.; Bernal, Yaritza; Boppana, Abhishektha; Vu, Linh Q.; Benson, Elizabeth A.; Jarvis, Sarah; Rajulu, Sudhakar L.

    2016-01-01

    Shoulder injury is one of the most severe risks that have the potential to impair crewmembers' performance and health in long duration space flight. Overall, 64% of crewmembers experience shoulder pain after extra-vehicular training in a space suit, and 14% of symptomatic crewmembers require surgical repair (Williams & Johnson, 2003). Suboptimal suit fit, in particular at the shoulder region, has been identified as one of the predominant risk factors. However, traditional suit fit assessments and laser scans represent only a single person's data, and thus may not be generalized across wide variations of body shapes and poses. The aim of this work is to develop a software tool based on a statistical analysis of a large dataset of crewmember body shapes. This tool can accurately predict the skin deformation and shape variations for any body size and shoulder pose for a target population, from which the geometry can be exported and evaluated against suit models in commercial CAD software. A preliminary software tool was developed by statistically analyzing 150 body shapes matched with body dimension ranges specified in the Human-Systems Integration Requirements of NASA ("baseline model"). Further, the baseline model was incorporated with shoulder joint articulation ("articulation model"), using additional subjects scanned in a variety of shoulder poses across a pre-specified range of motion. Scan data was cleaned and aligned using body landmarks. The skin deformation patterns were dimensionally reduced and the co-variation with shoulder angles was analyzed. A software tool is currently in development and will be presented in the final proceeding. This tool would allow suit engineers to parametrically generate body shapes in strategically targeted anthropometry dimensions and shoulder poses. This would also enable virtual fit assessments, with which the contact volume and clearance between the suit and body surface can be predictively quantified at reduced time and cost.

  9. Development and evaluation of social cognitive measures related to adolescent physical activity.

    PubMed

    Dewar, Deborah L; Lubans, David Revalds; Morgan, Philip James; Plotnikoff, Ronald C

    2013-05-01

    This study aimed to develop and evaluate the construct validity and reliability of modernized social cognitive measures relating to physical activity behaviors in adolescents. An instrument was developed based on constructs from Bandura's Social Cognitive Theory and included the following scales: self-efficacy, situation (perceived physical environment), social support, behavioral strategies, and outcome expectations and expectancies. The questionnaire was administered in a sample of 171 adolescents (age = 13.6 ± 1.2 years, females = 61%). Confirmatory factor analysis was employed to examine model-fit for each scale using multiple indices, including chi-square index, comparative-fit index (CFI), goodness-of-fit index (GFI), and the root mean square error of approximation (RMSEA). Reliability properties were also examined (ICC and Cronbach's alpha). Each scale represented a statistically sound measure: fit indices indicated each model to be an adequate-to-exact fit to the data; internal consistency was acceptable to good (α = 0.63-0.79); rank order repeatability was strong (ICC = 0.82-0.91). Results support the validity and reliability of social cognitive scales relating to physical activity among adolescents. As such, the developed scales have utility for the identification of potential social cognitive correlates of youth physical activity, mediators of physical activity behavior changes and the testing of theoretical models based on Social Cognitive Theory.

  10. Quantifying the impact of between-study heterogeneity in multivariate meta-analyses

    PubMed Central

    Jackson, Dan; White, Ian R; Riley, Richard D

    2012-01-01

    Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call . We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950

  11. "Simulated molecular evolution" or computer-generated artifacts?

    PubMed

    Darius, F; Rojas, R

    1994-11-01

    1. The authors define a function with value 1 for the positive examples and 0 for the negative ones. They fit a continuous function but do not deal at all with the error margin of the fit, which is almost as large as the function values they compute. 2. The term "quality" for the value of the fitted function gives the impression that some biological significance is associated with values of the fitted function strictly between 0 and 1, but there is no justification for this kind of interpretation and finding the point where the fit achieves its maximum does not make sense. 3. By neglecting the error margin the authors try to optimize the fitted function using differences in the second, third, fourth, and even fifth decimal place which have no statistical significance. 4. Even if such a fit could profit from more data points, the authors should first prove that the region of interest has some kind of smoothness, that is, that a continuous fit makes any sense at all. 5. "Simulated molecular evolution" is a misnomer. We are dealing here with random search. Since the margin of error is so large, the fitted function does not provide statistically significant information about the points in search space where strings with cleavage sites could be found. This implies that the method is a highly unreliable stochastic search in the space of strings, even if the neural network is capable of learning some simple correlations. 6. Classical statistical methods are for these kind of problems with so few data points clearly superior to the neural networks used as a "black box" by the authors, which in the way they are structured provide a model with an error margin as large as the numbers being computed.7. And finally, even if someone would provide us with a function which separates strings with cleavage sites from strings without them perfectly, so-called simulated molecular evolution would not be better than random selection.Since a perfect fit would only produce exactly ones or zeros,starting a search in a region of space where all strings in the neighborhood get the value zero would not provide any kind of directional information for new iterations. We would just skip from one point to the other in a typical random walk manner.

  12. A comparison of modelling techniques used to characterise oxygen uptake kinetics during the on-transient of exercise.

    PubMed

    Bell, C; Paterson, D H; Kowalchuk, J M; Padilla, J; Cunningham, D A

    2001-09-01

    We compared estimates for the phase 2 time constant (tau) of oxygen uptake (VO2) during moderate- and heavy-intensity exercise, and the slow component of VO2 during heavy-intensity exercise using previously published exponential models. Estimates for tau and the slow component were different (P < 0.05) among models. For moderate-intensity exercise, a two-component exponential model, or a mono-exponential model fitted from 20 s to 3 min were best. For heavy-intensity exercise, a three-component model fitted throughout the entire 6 min bout of exercise, or a two-component model fitted from 20 s were best. When the time delays for the two- and three-component models were equal the best statistical fit was obtained; however, this model produced an inappropriately low DeltaVO2/DeltaWR (WR, work rate) for the projected phase 2 steady state, and the estimate of phase 2 tau was shortened compared with other models. The slow component was quantified as the difference between VO2 at end-exercise (6 min) and at 3 min (DeltaVO2 (6-3 min)); 259 ml x min(-1)), and also using the phase 3 amplitude terms (truncated to end-exercise) from exponential fits (409-833 ml x min(-1)). Onset of the slow component was identified by the phase 3 time delay parameter as being of delayed onset approximately 2 min (vs. arbitrary 3 min). Using this delay DeltaVO2 (6-2 min) was approximately 400 ml x min(-1). Use of valid consistent methods to estimate tau and the slow component in exercise are needed to advance physiological understanding.

  13. The Chandra Source Catalog 2.0: Spectral Properties

    NASA Astrophysics Data System (ADS)

    McCollough, Michael L.; Siemiginowska, Aneta; Burke, Douglas; Nowak, Michael A.; Primini, Francis Anthony; Laurino, Omar; Nguyen, Dan T.; Allen, Christopher E.; Anderson, Craig S.; Budynkiewicz, Jamie A.; Chen, Judy C.; Civano, Francesca Maria; D'Abrusco, Raffaele; Doe, Stephen M.; Evans, Ian N.; Evans, Janet D.; Fabbiano, Giuseppina; Gibbs, Danny G., II; Glotfelty, Kenny J.; Graessle, Dale E.; Grier, John D.; Hain, Roger; Hall, Diane M.; Harbo, Peter N.; Houck, John C.; Lauer, Jennifer L.; Lee, Nicholas P.; Martínez-Galarza, Juan Rafael; McDowell, Jonathan C.; Miller, Joseph; McLaughlin, Warren; Morgan, Douglas L.; Mossman, Amy E.; Nichols, Joy S.; Paxson, Charles; Plummer, David A.; Rots, Arnold H.; Sundheim, Beth A.; Tibbetts, Michael; Van Stone, David W.; Zografou, Panagoula; Chandra Source Catalog Team

    2018-01-01

    The second release of the Chandra Source Catalog (CSC) contains all sources identified from sixteen years' worth of publicly accessible observations. The vast majority of these sources have been observed with the ACIS detector and have spectral information in 0.5-7 keV energy range. Here we describe the methods used to automatically derive spectral properties for each source detected by the standard processing pipeline and included in the final CSC. The sources with high signal to noise ratio (exceeding 150 net counts) were fit in Sherpa (the modeling and fitting application from the Chandra Interactive Analysis of Observations package) using wstat as a fit statistic and Bayesian draws method to determine errors. Three models were fit to each source: an absorbed power-law, blackbody, and Bremsstrahlung emission. The fitted parameter values for the power-law, blackbody, and Bremsstrahlung models were included in the catalog with the calculated flux for each model. The CSC also provides the source energy fluxes computed from the normalizations of predefined absorbed power-law, black-body, Bremsstrahlung, and APEC models needed to match the observed net X-ray counts. For sources that have been observed multiple times we performed a Bayesian Blocks analysis will have been performed (see the Primini et al. poster) and the most significant block will have a joint fit performed for the mentioned spectral models. In addition, we provide access to data products for each source: a file with source spectrum, the background spectrum, and the spectral response of the detector. Hardness ratios were calculated for each source between pairs of energy bands (soft, medium and hard). This work has been supported by NASA under contract NAS 8-03060 to the Smithsonian Astrophysical Observatory for operation of the Chandra X-ray Center.

  14. Syndromes of Self-Reported Psychopathology for Ages 18-59 in 29 Societies.

    PubMed

    Ivanova, Masha Y; Achenbach, Thomas M; Rescorla, Leslie A; Tumer, Lori V; Ahmeti-Pronaj, Adelina; Au, Alma; Maese, Carmen Avila; Bellina, Monica; Caldas, J Carlos; Chen, Yi-Chuen; Csemy, Ladislav; da Rocha, Marina M; Decoster, Jeroen; Dobrean, Anca; Ezpeleta, Lourdes; Fontaine, Johnny R J; Funabiki, Yasuko; Guðmundsson, Halldór S; Harder, Valerie S; de la Cabada, Marie Leiner; Leung, Patrick; Liu, Jianghong; Mahr, Safia; Malykh, Sergey; Maras, Jelena Srdanovic; Markovic, Jasminka; Ndetei, David M; Oh, Kyung Ja; Petot, Jean-Michel; Riad, Geylan; Sakarya, Direnc; Samaniego, Virginia C; Sebre, Sandra; Shahini, Mimoza; Silvares, Edwiges; Simulioniene, Roma; Sokoli, Elvisa; Talcott, Joel B; Vazquez, Natalia; Zasepa, Ewa

    2015-06-01

    This study tested the multi-society generalizability of an eight-syndrome assessment model derived from factor analyses of American adults' self-ratings of 120 behavioral, emotional, and social problems. The Adult Self-Report (ASR; Achenbach and Rescorla 2003) was completed by 17,152 18-59-year-olds in 29 societies. Confirmatory factor analyses tested the fit of self-ratings in each sample to the eight-syndrome model. The primary model fit index (Root Mean Square Error of Approximation) showed good model fit for all samples, while secondary indices showed acceptable to good fit. Only 5 (0.06%) of the 8,598 estimated parameters were outside the admissible parameter space. Confidence intervals indicated that sampling fluctuations could account for the deviant parameters. Results thus supported the tested model in societies differing widely in social, political, and economic systems, languages, ethnicities, religions, and geographical regions. Although other items, societies, and analytic methods might yield different results, the findings indicate that adults in very diverse societies were willing and able to rate themselves on the same standardized set of 120 problem items. Moreover, their self-ratings fit an eight-syndrome model previously derived from self-ratings by American adults. The support for the statistically derived syndrome model is consistent with previous findings for parent, teacher, and self-ratings of 1½-18-year-olds in many societies. The ASR and its parallel collateral-report instrument, the Adult Behavior Checklist (ABCL), may offer mental health professionals practical tools for the multi-informant assessment of clinical constructs of adult psychopathology that appear to be meaningful across diverse societies.

  15. Development of a statistical model for cervical cancer cell death with irreversible electroporation in vitro.

    PubMed

    Yang, Yongji; Moser, Michael A J; Zhang, Edwin; Zhang, Wenjun; Zhang, Bing

    2018-01-01

    The aim of this study was to develop a statistical model for cell death by irreversible electroporation (IRE) and to show that the statistic model is more accurate than the electric field threshold model in the literature using cervical cancer cells in vitro. HeLa cell line was cultured and treated with different IRE protocols in order to obtain data for modeling the statistical relationship between the cell death and pulse-setting parameters. In total, 340 in vitro experiments were performed with a commercial IRE pulse system, including a pulse generator and an electric cuvette. Trypan blue staining technique was used to evaluate cell death after 4 hours of incubation following IRE treatment. Peleg-Fermi model was used in the study to build the statistical relationship using the cell viability data obtained from the in vitro experiments. A finite element model of IRE for the electric field distribution was also built. Comparison of ablation zones between the statistical model and electric threshold model (drawn from the finite element model) was used to show the accuracy of the proposed statistical model in the description of the ablation zone and its applicability in different pulse-setting parameters. The statistical models describing the relationships between HeLa cell death and pulse length and the number of pulses, respectively, were built. The values of the curve fitting parameters were obtained using the Peleg-Fermi model for the treatment of cervical cancer with IRE. The difference in the ablation zone between the statistical model and the electric threshold model was also illustrated to show the accuracy of the proposed statistical model in the representation of ablation zone in IRE. This study concluded that: (1) the proposed statistical model accurately described the ablation zone of IRE with cervical cancer cells, and was more accurate compared with the electric field model; (2) the proposed statistical model was able to estimate the value of electric field threshold for the computer simulation of IRE in the treatment of cervical cancer; and (3) the proposed statistical model was able to express the change in ablation zone with the change in pulse-setting parameters.

  16. Forecasting incidence of hemorrhagic fever with renal syndrome in China using ARIMA model

    PubMed Central

    2011-01-01

    Background China is a country that is most seriously affected by hemorrhagic fever with renal syndrome (HFRS) with 90% of HFRS cases reported globally. At present, HFRS is getting worse with increasing cases and natural foci in China. Therefore, there is an urgent need for monitoring and predicting HFRS incidence to make the control of HFRS more effective. In this study, we applied a stochastic autoregressive integrated moving average (ARIMA) model with the objective of monitoring and short-term forecasting HFRS incidence in China. Methods Chinese HFRS data from 1975 to 2008 were used to fit ARIMA model. Akaike Information Criterion (AIC) and Ljung-Box test were used to evaluate the constructed models. Subsequently, the fitted ARIMA model was applied to obtain the fitted HFRS incidence from 1978 to 2008 and contrast with corresponding observed values. To assess the validity of the proposed model, the mean absolute percentage error (MAPE) between the observed and fitted HFRS incidence (1978-2008) was calculated. Finally, the fitted ARIMA model was used to forecast the incidence of HFRS of the years 2009 to 2011. All analyses were performed using SAS9.1 with a significant level of p < 0.05. Results The goodness-of-fit test of the optimum ARIMA (0,3,1) model showed non-significant autocorrelations in the residuals of the model (Ljung-Box Q statistic = 5.95,P = 0.3113). The fitted values made by ARIMA (0,3,1) model for years 1978-2008 closely followed the observed values for the same years, with a mean absolute percentage error (MAPE) of 12.20%. The forecast values from 2009 to 2011 were 0.69, 0.86, and 1.21per 100,000 population, respectively. Conclusion ARIMA models applied to historical HFRS incidence data are an important tool for HFRS surveillance in China. This study shows that accurate forecasting of the HFRS incidence is possible using an ARIMA model. If predicted values from this study are accurate, China can expect a rise in HFRS incidence. PMID:21838933

  17. Applying a Hypoxia-Incorporating TCP Model to Experimental Data on Rat Sarcoma

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruggieri, Ruggero, E-mail: ruggieri.ruggero@gmail.com; Stavreva, Nadejda; Naccarato, Stefania

    2012-08-01

    Purpose: To verify whether a tumor control probability (TCP) model which mechanistically incorporates acute and chronic hypoxia is able to describe animal in vivo dose-response data, exhibiting tumor reoxygenation. Methods and Materials: The investigated TCP model accounts for tumor repopulation, reoxygenation of chronic hypoxia, and fluctuating oxygenation of acute hypoxia. Using the maximum likelihood method, the model is fitted to Fischer-Moulder data on Wag/Rij rats, inoculated with rat rhabdomyosarcoma BA1112, and irradiated in vivo using different fractionation schemes. This data set is chosen because two of the experimental dose-response curves exhibit an inverse dose behavior, which is interpreted as duemore » to reoxygenation. The tested TCP model is complex, and therefore, in vivo cell survival data on the same BA1112 cell line from Reinhold were added to Fischer-Moulder data and fitted simultaneously with a corresponding cell survival function. Results: The obtained fit to the combined Fischer-Moulder-Reinhold data was statistically acceptable. The best-fit values of the model parameters for which information exists were in the range of published values. The cell survival curves of well-oxygenated and hypoxic cells, computed using the best-fit values of the radiosensitivities and the initial number of clonogens, were in good agreement with the corresponding in vitro and in situ experiments of Reinhold. The best-fit values of most of the hypoxia-related parameters were used to recompute the TCP for non-small cell lung cancer patients as a function of the number of fractions, TCP(n). Conclusions: The investigated TCP model adequately describes animal in vivo data exhibiting tumor reoxygenation. The TCP(n) curve computed for non-small cell lung cancer patients with the best-fit values of most of the hypoxia-related parameters confirms previously obtained abrupt reduction in TCP for n < 10, thus warning against the adoption of severely hypofractionated schedules.« less

  18. Bayesian inference in an item response theory model with a generalized student t link function

    NASA Astrophysics Data System (ADS)

    Azevedo, Caio L. N.; Migon, Helio S.

    2012-10-01

    In this paper we introduce a new item response theory (IRT) model with a generalized Student t-link function with unknown degrees of freedom (df), named generalized t-link (GtL) IRT model. In this model we consider only the difficulty parameter in the item response function. GtL is an alternative to the two parameter logit and probit models, since the degrees of freedom (df) play a similar role to the discrimination parameter. However, the behavior of the curves of the GtL is different from those of the two parameter models and the usual Student t link, since in GtL the curve obtained from different df's can cross the probit curves in more than one latent trait level. The GtL model has similar proprieties to the generalized linear mixed models, such as the existence of sufficient statistics and easy parameter interpretation. Also, many techniques of parameter estimation, model fit assessment and residual analysis developed for that models can be used for the GtL model. We develop fully Bayesian estimation and model fit assessment tools through a Metropolis-Hastings step within Gibbs sampling algorithm. We consider a prior sensitivity choice concerning the degrees of freedom. The simulation study indicates that the algorithm recovers all parameters properly. In addition, some Bayesian model fit assessment tools are considered. Finally, a real data set is analyzed using our approach and other usual models. The results indicate that our model fits the data better than the two parameter models.

  19. Structure-Specific Statistical Mapping of White Matter Tracts

    PubMed Central

    Yushkevich, Paul A.; Zhang, Hui; Simon, Tony; Gee, James C.

    2008-01-01

    We present a new model-based framework for the statistical analysis of diffusion imaging data associated with specific white matter tracts. The framework takes advantage of the fact that several of the major white matter tracts are thin sheet-like structures that can be effectively modeled by medial representations. The approach involves segmenting major tracts and fitting them with deformable geometric medial models. The medial representation makes it possible to average and combine tensor-based features along directions locally perpendicular to the tracts, thus reducing data dimensionality and accounting for errors in normalization. The framework enables the analysis of individual white matter structures, and provides a range of possibilities for computing statistics and visualizing differences between cohorts. The framework is demonstrated in a study of white matter differences in pediatric chromosome 22q11.2 deletion syndrome. PMID:18407524

  20. How to infer relative fitness from a sample of genomic sequences.

    PubMed

    Dayarian, Adel; Shraiman, Boris I

    2014-07-01

    Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks. Copyright © 2014 by the Genetics Society of America.

  1. Powerlaw: a Python package for analysis of heavy-tailed distributions.

    PubMed

    Alstott, Jeff; Bullmore, Ed; Plenz, Dietmar

    2014-01-01

    Power laws are theoretically interesting probability distributions that are also frequently used to describe empirical data. In recent years, effective statistical methods for fitting power laws have been developed, but appropriate use of these techniques requires significant programming and statistical insight. In order to greatly decrease the barriers to using good statistical methods for fitting power law distributions, we developed the powerlaw Python package. This software package provides easy commands for basic fitting and statistical analysis of distributions. Notably, it also seeks to support a variety of user needs by being exhaustive in the options available to the user. The source code is publicly available and easily extensible.

  2. How Well Can We Detect Lineage-Specific Diversification-Rate Shifts? A Simulation Study of Sequential AIC Methods.

    PubMed

    May, Michael R; Moore, Brian R

    2016-11-01

    Evolutionary biologists have long been fascinated by the extreme differences in species numbers across branches of the Tree of Life. This has motivated the development of statistical methods for detecting shifts in the rate of lineage diversification across the branches of phylogenic trees. One of the most frequently used methods, MEDUSA, explores a set of diversification-rate models, where each model assigns branches of the phylogeny to a set of diversification-rate categories. Each model is first fit to the data, and the Akaike information criterion (AIC) is then used to identify the optimal diversification model. Surprisingly, the statistical behavior of this popular method is uncharacterized, which is a concern in light of: (1) the poor performance of the AIC as a means of choosing among models in other phylogenetic contexts; (2) the ad hoc algorithm used to visit diversification models, and; (3) errors that we reveal in the likelihood function used to fit diversification models to the phylogenetic data. Here, we perform an extensive simulation study demonstrating that MEDUSA (1) has a high false-discovery rate (on average, spurious diversification-rate shifts are identified [Formula: see text] of the time), and (2) provides biased estimates of diversification-rate parameters. Understanding the statistical behavior of MEDUSA is critical both to empirical researchers-in order to clarify whether these methods can make reliable inferences from empirical datasets-and to theoretical biologists-in order to clarify the specific problems that need to be solved in order to develop more reliable approaches for detecting shifts in the rate of lineage diversification. [Akaike information criterion; extinction; lineage-specific diversification rates; phylogenetic model selection; speciation.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  3. How Well Can We Detect Lineage-Specific Diversification-Rate Shifts? A Simulation Study of Sequential AIC Methods

    PubMed Central

    May, Michael R.; Moore, Brian R.

    2016-01-01

    Evolutionary biologists have long been fascinated by the extreme differences in species numbers across branches of the Tree of Life. This has motivated the development of statistical methods for detecting shifts in the rate of lineage diversification across the branches of phylogenic trees. One of the most frequently used methods, MEDUSA, explores a set of diversification-rate models, where each model assigns branches of the phylogeny to a set of diversification-rate categories. Each model is first fit to the data, and the Akaike information criterion (AIC) is then used to identify the optimal diversification model. Surprisingly, the statistical behavior of this popular method is uncharacterized, which is a concern in light of: (1) the poor performance of the AIC as a means of choosing among models in other phylogenetic contexts; (2) the ad hoc algorithm used to visit diversification models, and; (3) errors that we reveal in the likelihood function used to fit diversification models to the phylogenetic data. Here, we perform an extensive simulation study demonstrating that MEDUSA (1) has a high false-discovery rate (on average, spurious diversification-rate shifts are identified ≈30% of the time), and (2) provides biased estimates of diversification-rate parameters. Understanding the statistical behavior of MEDUSA is critical both to empirical researchers—in order to clarify whether these methods can make reliable inferences from empirical datasets—and to theoretical biologists—in order to clarify the specific problems that need to be solved in order to develop more reliable approaches for detecting shifts in the rate of lineage diversification. [Akaike information criterion; extinction; lineage-specific diversification rates; phylogenetic model selection; speciation.] PMID:27037081

  4. Factor structure of a conceptual model of oral health tested among 65-year olds in Norway and Sweden.

    PubMed

    Astrøm, Anne Nordrehaug; Ekbäck, Gunnar; Ordell, Sven

    2010-04-01

    No studies have tested oral health-related quality of life models in dentate older adults across different populations. To test the factor structure of oral health outcomes within Gilbert's conceptual model among 65-year olds in Sweden and Norway. It was hypothesized that responses to 14 observed indicators could be explained by three correlated factors, symptom status, functional limitations and oral disadvantages, that each observed oral health indicator would associate more strongly with the factor it is supposed to measure than with competing factors and that the proposed 3-factor structure would possess satisfactory cross-national stability with 65-year olds in Norway and Sweden. In 2007, 6078 Swedish- and 4062 Norwegian adults borne in 1942 completed mailed questionnaires including oral symptoms, functional limitations and the eight item Oral Impacts on Daily Performances inventory. Model generation analysis was restricted to the Norwegian study group and the model achieved was tested without modifications in Swedish 65-year olds. A modified 3-factor solution with cross-loadings, improved the fit to the data compared with a 2-factor- and the initially proposed 3-factor model among the Norwegian [comparative fit index (CFI) = 0.97] and Swedish (CFI = 0.98) participants. All factor loadings for the modified 3-factor model were in the expected direction and were statistically significant at CR > 1. Multiple group confirmatory factor analyses, with Norwegian and Swedish data simultaneously revealed acceptable fit for the unconstrained model (CFI = 0.97), whereas unconstrained and constrained models were statistically significant different in nested model comparison. Within construct validity of Gilbert's model was supported with Norwegian and Swedish 65-year olds, indicating that the 14-item questionnaire reflected three constructs; symptom status, functional limitation and oral disadvantage. Measurement invariance was confirmed at the level of factor structure, suggesting that the 3-factor model is comparable to some extent across 65-year olds in Norway and Sweden.

  5. Bayesian analysis of CCDM models

    NASA Astrophysics Data System (ADS)

    Jesus, J. F.; Valentim, R.; Andrade-Oliveira, F.

    2017-09-01

    Creation of Cold Dark Matter (CCDM), in the context of Einstein Field Equations, produces a negative pressure term which can be used to explain the accelerated expansion of the Universe. In this work we tested six different spatially flat models for matter creation using statistical criteria, in light of SNe Ia data: Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and Bayesian Evidence (BE). These criteria allow to compare models considering goodness of fit and number of free parameters, penalizing excess of complexity. We find that JO model is slightly favoured over LJO/ΛCDM model, however, neither of these, nor Γ = 3αH0 model can be discarded from the current analysis. Three other scenarios are discarded either because poor fitting or because of the excess of free parameters. A method of increasing Bayesian evidence through reparameterization in order to reducing parameter degeneracy is also developed.

  6. Independent and joint effects of sedentary time and cardiorespiratory fitness on all-cause mortality: the Cooper Center Longitudinal Study.

    PubMed

    Shuval, Kerem; Finley, Carrie E; Barlow, Carolyn E; Nguyen, Binh T; Njike, Valentine Y; Pettee Gabriel, Kelley

    2015-11-01

    To examine the independent and joint effects of sedentary time and cardiorespiratory fitness (fitness) on all-cause mortality. A prospective study of 3141 Cooper Center Longitudinal Study participants. Participants provided information on television (TV) viewing and car time in 1982 and completed a maximal exercise test during a 1-year time frame; they were then followed until mortality or through 2010. TV viewing, car time, total sedentary time and fitness were the primary exposures and all-cause mortality was the outcome. The relationship between the exposures and outcome was examined utilising Cox proportional hazard models. A total of 581 deaths occurred over a median follow-up period of 28.7 years (SD=4.4). At baseline, participants' mean age was 45.0 years (SD=9.6), 86.5% were men and their mean body mass index was 24.6 (SD=3.0). Multivariable analyses revealed a significant linear relationship between increased fitness and lower mortality risk, even while adjusting for total sedentary time and covariates (p=0.02). The effects of total sedentary time on increased mortality risk did not quite reach statistical significance once fitness and covariates were adjusted for (p=0.05). When examining this relationship categorically, in comparison to the reference category (≤10 h/week), being sedentary for ≥23 h weekly increased mortality risk by 29% without controlling for fitness (HR=1.29, 95% CI 1.03 to 1.63); however, once fitness and covariates were taken into account this relationship did not reach statistical significance (HR=1.20, 95% CI 0.95 to 1.51). Moreover, spending >10 h in the car weekly significantly increased mortality risk by 27% in the fully adjusted model. The association between TV viewing and mortality was not significant. The relationship between total sedentary time and higher mortality risk is less pronounced when fitness is taken into account. Increased car time, but not TV viewing, is significantly related to higher mortality risk, even when taking fitness into account, in this cohort. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  7. Effects of vegetation canopy on the radar backscattering coefficient

    NASA Technical Reports Server (NTRS)

    Mo, T.; Blanchard, B. J.; Schmugge, T. J.

    1983-01-01

    Airborne L- and C-band scatterometer data, taken over both vegetation-covered and bare fields, were systematically analyzed and theoretically reproduced, using a recently developed model for calculating radar backscattering coefficients of rough soil surfaces. The results show that the model can reproduce the observed angular variations of radar backscattering coefficient quite well via a least-squares fit method. Best fits to the data provide estimates of the statistical properties of the surface roughness, which is characterized by two parameters: the standard deviation of surface height, and the surface correlation length. In addition, the processes of vegetation attenuation and volume scattering require two canopy parameters, the canopy optical thickness and a volume scattering factor. Canopy parameter values for individual vegetation types, including alfalfa, milo and corn, were also determined from the best-fit results. The uncertainties in the scatterometer data were also explored.

  8. Quantification of brain tissue through incorporation of partial volume effects

    NASA Astrophysics Data System (ADS)

    Gage, Howard D.; Santago, Peter, II; Snyder, Wesley E.

    1992-06-01

    This research addresses the problem of automatically quantifying the various types of brain tissue, CSF, white matter, and gray matter, using T1-weighted magnetic resonance images. The method employs a statistical model of the noise and partial volume effect and fits the derived probability density function to that of the data. Following this fit, the optimal decision points can be found for the materials and thus they can be quantified. Emphasis is placed on repeatable results for which a confidence in the solution might be measured. Results are presented assuming a single Gaussian noise source and a uniform distribution of partial volume pixels for both simulated and actual data. Thus far results have been mixed, with no clear advantage being shown in taking into account partial volume effects. Due to the fitting problem being ill-conditioned, it is not yet clear whether these results are due to problems with the model or the method of solution.

  9. THERMUS—A thermal model package for ROOT

    NASA Astrophysics Data System (ADS)

    Wheaton, S.; Cleymans, J.; Hauer, M.

    2009-01-01

    THERMUS is a package of C++ classes and functions allowing statistical-thermal model analyses of particle production in relativistic heavy-ion collisions to be performed within the ROOT framework of analysis. Calculations are possible within three statistical ensembles; a grand-canonical treatment of the conserved charges B, S and Q, a fully canonical treatment of the conserved charges, and a mixed-canonical ensemble combining a canonical treatment of strangeness with a grand-canonical treatment of baryon number and electric charge. THERMUS allows for the assignment of decay chains and detector efficiencies specific to each particle yield, which enables sensible fitting of model parameters to experimental data. Program summaryProgram title: THERMUS, version 2.1 Catalogue identifier: AEBW_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEBW_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 17 152 No. of bytes in distributed program, including test data, etc.: 93 581 Distribution format: tar.gz Programming language: C++ Computer: PC, Pentium 4, 1 GB RAM (not hardware dependent) Operating system: Linux: FEDORA, RedHat, etc. Classification: 17.7 External routines: Numerical Recipes in C [1], ROOT [2] Nature of problem: Statistical-thermal model analyses of heavy-ion collision data require the calculation of both primordial particle densities and contributions from resonance decay. A set of thermal parameters (the number depending on the particular model imposed) and a set of thermalized particles, with their decays specified, is required as input to these models. The output is then a complete set of primordial thermal quantities for each particle, together with the contributions to the final particle yields from resonance decay. In many applications of statistical-thermal models it is required to fit experimental particle multiplicities or particle ratios. In such analyses, the input is a set of experimental yields and ratios, a set of particles comprising the assumed hadron resonance gas formed in the collision and the constraints to be placed on the system. The thermal model parameters consistent with the specified constraints leading to the best-fit to the experimental data are then output. Solution method: THERMUS is a package designed for incorporation into the ROOT [2] framework, used extensively by the heavy-ion community. As such, it utilizes a great deal of ROOT's functionality in its operation. ROOT features used in THERMUS include its containers, the wrapper TMinuit implementing the MINUIT fitting package, and the TMath class of mathematical functions and routines. Arguably the most useful feature is the utilization of CINT as the control language, which allows interactive access to the THERMUS objects. Three distinct statistical ensembles are included in THERMUS, while additional options to include quantum statistics, resonance width and excluded volume corrections are also available. THERMUS provides a default particle list including all mesons (up to the K4∗ (2045)) and baryons (up to the Ω) listed in the July 2002 Particle Physics Booklet [3]. For each typically unstable particle in this list, THERMUS includes a text-file listing its decays. With thermal parameters specified, THERMUS calculates primordial thermal densities either by performing numerical integrations or else, in the case of the Boltzmann approximation without resonance width in the grand-canonical ensemble, by evaluating Bessel functions. Particle decay chains are then used to evaluate experimental observables (i.e. particle yields following resonance decay). Additional detector efficiency factors allow fine-tuning of the model predictions to a specific detector arrangement. When parameters are required to be constrained, use is made of the 'Numerical Recipes in C' [1] function which applies the Broyden globally convergent secant method of solving nonlinear systems of equations. Since the NRC software is not freely-available, it has to be purchased by the user. THERMUS provides the means of imposing a large number of constraints on the chosen model (amongst others, THERMUS can fix the baryon-to-charge ratio of the system, the strangeness density of the system and the primordial energy per hadron). Fits to experimental data are accomplished in THERMUS by using the ROOT TMinuit class. In its default operation, the standard χ function is minimized, yielding the set of best-fit thermal parameters. THERMUS allows the assignment of separate decay chains to each experimental input. In this way, the model is able to match the specific feed-down corrections of a particular data set. Running time: Depending on the analysis required, run-times vary from seconds (for the evaluation of particle multiplicities given a set of parameters) to several minutes (for fits to experimental data subject to constraints). References:W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, Cambridge, 2002. R. Brun, F. Rademakers, Nucl. Inst. Meth. Phys. Res. A 389 (1997) 81. See also http://root.cern.ch/. K. Hagiwara et al., Phys. Rev. D 66 (2002) 010001.

  10. [Acute responses on lipid profile by practicing cycling].

    PubMed

    Díaz-Ríos, Lillian Karina; Rivera-Cisneros, Antonio Eugenio; Macías-Cervantes, Maciste Habacuc; Sánchez-González, Jorge Manuel; Guerrero-Martínez, Francisco Javier

    2008-01-01

    it has been demonstrated an association between the increase in physical activity and improvements in the lipid profile. to evaluate changes in the serum lipids caused by spinning practice. nine men and twelve women were studied, they underwent to an initial evaluation that included a treadmill effort test, in order to establish the physical fitness level. With the purpose of determine the lipids change, a blood sample was obtained before and after a typical spinning session. The design was prospective, experimental, longitudinal and comparative study. Student's t-test and regression model were used to determine the changes in the lipids concentrations, and its relation with the physical fitness level. A p value < or = 0.05 was required for statistical significance. lipids increase concentrations were observed (p < 0.05), except at triglycerides in men, in which it had a decrease. It was statistically significant relation between the physical fitness level and the percentage of high-density lipoproteins variation (r = 0.44, p = 0.046). the percentage of high-density lipoproteins variation was greater when the values of VO(2)max were higher. At greater level of medical fitness greater positive answer in this lipoproteins. In the case of the rest serum lipids, it was not observed relation between the level of medical fitness and the percentage of variation due to the execution of the spinning session.

  11. Influence of environmental statistics on inhibition of saccadic return

    PubMed Central

    Farrell, Simon; Ludwig, Casimir J. H.; Ellis, Lucy A.; Gilchrist, Iain D.

    2009-01-01

    Initiating an eye movement is slowed if the saccade is directed to a location that has been fixated in the recent past. We show that this inhibitory effect is modulated by the temporal statistics of the environment: If a return location is likely to become behaviorally relevant, inhibition of return is absent. By fitting an accumulator model of saccadic decision-making, we show that the inhibitory effect and the sensitivity to local statistics can be dissociated in their effects on the rate of accumulation of evidence, and the threshold controlling the amount of evidence needed to generate a saccade. PMID:20080778

  12. Psychometric properties and Confirmatory structure of the Strengths and difficulties questionnaire in a sample of adolescents in Nigeria.

    PubMed

    Akpa, Onoja M; Afolabi, Rotimi F; Fowobaje, Kayode R

    Though the SDQ has been used in selected studies in Nigeria, its theoretical structure has not been fully and appropriately investigated in the setting. The present study employs Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) to investigate the theoretical structure of the self-reported version of the SDQ in a sample of adolescents in Benue state, Nigeria. A total of 1,244 adolescents from different categories of secondary schools in Makurdi and Vandekya Local government areas of Benue state participated in the study. Preliminary data analyses were performed using descriptive statistics while the theoretical structure of the SDQ was assessed using EFA and CFA. Model fits were assessed using Chi-square test and other fit indices at 5% significance level. Participants were 14.19±2.45 (Vandekya) and 14.19±2.45 (Makurdi) years old. Results of the EFA and CFA revealed a 3-factor oblique model as the best model for the sample of adolescents studied ( χ 2 / df =2.20, p<0.001) with all fit indices yielding better results. A correlated 3-factor model fits the present data better than the 5-factor theoretical model of the SDQ. The use of the original 5-factor model of the SDQ in the present setting should be interpreted with caution.

  13. A Weibull statistics-based lignocellulose saccharification model and a built-in parameter accurately predict lignocellulose hydrolysis performance.

    PubMed

    Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu

    2015-09-01

    Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to Weibull distribution, we discovered Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior and we can use the λ parameter to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment were discussed. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Statistical Models for the Analysis and Design of Digital Polymerase Chain Reaction (dPCR) Experiments.

    PubMed

    Dorazio, Robert M; Hunter, Margaret E

    2015-11-03

    Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary, log-log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model's parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.

  15. Patient choice modelling: how do patients choose their hospitals?

    PubMed

    Smith, Honora; Currie, Christine; Chaiwuttisak, Pornpimol; Kyprianou, Andreas

    2018-06-01

    As an aid to predicting future hospital admissions, we compare use of the Multinomial Logit and the Utility Maximising Nested Logit models to describe how patients choose their hospitals. The models are fitted to real data from Derbyshire, United Kingdom, which lists the postcodes of more than 200,000 admissions to six different local hospitals. Both elective and emergency admissions are analysed for this mixed urban/rural area. For characteristics that may affect a patient's choice of hospital, we consider the distance of the patient from the hospital, the number of beds at the hospital and the number of car parking spaces available at the hospital, as well as several statistics publicly available on National Health Service (NHS) websites: an average waiting time, the patient survey score for ward cleanliness, the patient safety score and the inpatient survey score for overall care. The Multinomial Logit model is successfully fitted to the data. Results obtained with the Utility Maximising Nested Logit model show that nesting according to city or town may be invalid for these data; in other words, the choice of hospital does not appear to be preceded by choice of city. In all of the analysis carried out, distance appears to be one of the main influences on a patient's choice of hospital rather than statistics available on the Internet.

  16. Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.

    PubMed

    Mørk, Søren; Holmes, Ian

    2012-03-01

    Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures are best performing in terms of statistical information criteria or prediction performances, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.

  17. More Sophisticated Fits of the Oribts of Haumea's Interacting Moons

    NASA Astrophysics Data System (ADS)

    Oldroyd, William Jared; Ragozzine, Darin; Porter, Simon

    2018-04-01

    Since the discovery of Haumea's moons, it has been a challenge to model the orbits of its moons, Hi’iaka and Namaka. With many precision HST observations, Ragozzine & Brown 2009 succeeded in calculating a three-point mass model which was essential because Keplerian orbits were not a statistically acceptable fit. New data obtained in 2010 could be fit by adding a J2 and spin pole to Haumea, but new data from 2015 was far from the predicted locations, even after an extensive exploration using Bayesian Markov Chain Monte Carlo methods (using emcee). Here we report on continued investigations as to why our model cannot fit the full 10-year baseline of data. We note that by ignoring Haumea and instead examining the relative motion of the two moons in the Hi’iaka centered frame leads to adequate fits for the data. This suggests there are additional parameters connected to Haumea that will be required in a full model. These parameters are potentially related to photocenter-barycenter shifts which could be significant enough to affect the fitting process; these are unlikely to be caused by the newly discovered ring (Ortiz et al. 2017) or by unknown satellites (Burkhart et al. 2016). Additionally, we have developed a new SPIN+N-bodY integrator called SPINNY that self-consistently calculates the interactions between n-quadrupoles and is designed to test the importance of other possible effects (Haumea C22, satellite torques on the spin-pole, Sun, etc.) on our astrometric fits. By correctly determining the orbit of Haumea’s satellites we develop a better understanding of the physical properties of each of the objects with implications for the formation of Haumea, its moons, and its collisional family.

  18. A phase transition induces chaos in a predator-prey ecosystem with a dynamic fitness landscape.

    PubMed

    Gilpin, William; Feldman, Marcus W

    2017-07-01

    In many ecosystems, natural selection can occur quickly enough to influence the population dynamics and thus future selection. This suggests the importance of extending classical population dynamics models to include such eco-evolutionary processes. Here, we describe a predator-prey model in which the prey population growth depends on a prey density-dependent fitness landscape. We show that this two-species ecosystem is capable of exhibiting chaos even in the absence of external environmental variation or noise, and that the onset of chaotic dynamics is the result of the fitness landscape reversibly alternating between epochs of stabilizing and disruptive selection. We draw an analogy between the fitness function and the free energy in statistical mechanics, allowing us to use the physical theory of first-order phase transitions to understand the onset of rapid cycling in the chaotic predator-prey dynamics. We use quantitative techniques to study the relevance of our model to observational studies of complex ecosystems, finding that the evolution-driven chaotic dynamics confer community stability at the "edge of chaos" while creating a wide distribution of opportunities for speciation during epochs of disruptive selection-a potential observable signature of chaotic eco-evolutionary dynamics in experimental studies.

  19. Integration of Advanced Statistical Analysis Tools and Geophysical Modeling

    DTIC Science & Technology

    2010-12-01

    Carin Duke University Douglas Oldenburg University of British Columbia Stephen Billings, Leonard Pasion Laurens Beran Sky Research...means and covariances estimated for each class [5]. For this study, dipole polarizabilities were fit with a Pasion -Oldenburg parameterization of 8 −1...model for unexploded ordnance classification with EMI data,” IEEE Geosci. Remote Sensing Letters, vol. 4, pp. 629–633, 2007. [4] L. R. Pasion

  20. Social Inequality and Labor Force Participation.

    ERIC Educational Resources Information Center

    King, Jonathan

    The labor force participation rates of whites, blacks, and Spanish-Americans, grouped by sex, are explained in a linear regression model fitted with 1970 U. S. Census data on Standard Metropolitan Statistical Area (SMSA). The explanatory variables are: average age, average years of education, vocational training rate, disabled rate, unemployment…

  1. Multi-Parameter Linear Least-Squares Fitting to Poisson Data One Count at a Time

    NASA Technical Reports Server (NTRS)

    Wheaton, W.; Dunklee, A.; Jacobson, A.; Ling, J.; Mahoney, W.; Radocinski, R.

    1993-01-01

    A standard problem in gamma-ray astronomy data analysis is the decomposition of a set of observed counts, described by Poisson statistics, according to a given multi-component linear model, with underlying physical count rates or fluxes which are to be estimated from the data.

  2. Spillover in the Academy: Marriage Stability and Faculty Evaluations.

    ERIC Educational Resources Information Center

    Ludlow, Larry H.; Alvarez-Salvat, Rose M.

    2001-01-01

    Studied the spillover between family and work by examining the link between marital status and work performance across marriage, divorce, and remarriage. A polynomial regression model was fit to the data from 78 evaluations of an individual professor, and a cubic curve through the 3 periods was statistically significant. (SLD)

  3. Differential Item Functioning Analysis Using Rasch Item Information Functions

    ERIC Educational Resources Information Center

    Wyse, Adam E.; Mapuranga, Raymond

    2009-01-01

    Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when data fits the Rasch model. Through simulations and an international…

  4. Constructing the Exact Significance Level for a Person-Fit Statistic.

    ERIC Educational Resources Information Center

    Liou, Michelle; Chang, Chih-Hsin

    1992-01-01

    An extension is proposed for the network algorithm introduced by C.R. Mehta and N.R. Patel to construct exact tail probabilities for testing the general hypothesis that item responses are distributed according to the Rasch model. A simulation study indicates the efficiency of the algorithm. (SLD)

  5. Innovation Rather than Improvement: A Solvable High-Dimensional Model Highlights the Limitations of Scalar Fitness

    NASA Astrophysics Data System (ADS)

    Tikhonov, Mikhail; Monasson, Remi

    2018-01-01

    Much of our understanding of ecological and evolutionary mechanisms derives from analysis of low-dimensional models: with few interacting species, or few axes defining "fitness". It is not always clear to what extent the intuition derived from low-dimensional models applies to the complex, high-dimensional reality. For instance, most naturally occurring microbial communities are strikingly diverse, harboring a large number of coexisting species, each of which contributes to shaping the environment of others. Understanding the eco-evolutionary interplay in these systems is an important challenge, and an exciting new domain for statistical physics. Recent work identified a promising new platform for investigating highly diverse ecosystems, based on the classic resource competition model of MacArthur. Here, we describe how the same analytical framework can be used to study evolutionary questions. Our analysis illustrates how, at high dimension, the intuition promoted by a one-dimensional (scalar) notion of fitness can become misleading. Specifically, while the low-dimensional picture emphasizes organism cost or efficiency, we exhibit a regime where cost becomes irrelevant for survival, and link this observation to generic properties of high-dimensional geometry.

  6. CalFitter: a web server for analysis of protein thermal denaturation data.

    PubMed

    Mazurenko, Stanislav; Stourac, Jan; Kunka, Antonin; Nedeljkovic, Sava; Bednar, David; Prokop, Zbynek; Damborsky, Jiri

    2018-05-14

    Despite significant advances in the understanding of protein structure-function relationships, revealing protein folding pathways still poses a challenge due to a limited number of relevant experimental tools. Widely-used experimental techniques, such as calorimetry or spectroscopy, critically depend on a proper data analysis. Currently, there are only separate data analysis tools available for each type of experiment with a limited model selection. To address this problem, we have developed the CalFitter web server to be a unified platform for comprehensive data fitting and analysis of protein thermal denaturation data. The server allows simultaneous global data fitting using any combination of input data types and offers 12 protein unfolding pathway models for selection, including irreversible transitions often missing from other tools. The data fitting produces optimal parameter values, their confidence intervals, and statistical information to define unfolding pathways. The server provides an interactive and easy-to-use interface that allows users to directly analyse input datasets and simulate modelled output based on the model parameters. CalFitter web server is available free at https://loschmidt.chemi.muni.cz/calfitter/.

  7. Psychometric properties of the pain stages of change questionnaire as evaluated by rasch analysis in patients with chronic musculoskeletal pain

    PubMed Central

    2014-01-01

    Background Our objective was to evaluate the measurement properties of the Pain Stages of Change Questionnaire (PSOCQ) and its four subscales Precontemplation, Contemplation, Action and Maintenance. Methods A total of 231 patients, median age 42 years, with chronic musculoskeletal pain responded to the 30 items in PSOCQ. Thresholds for item scores, and unidimensionality and invariance of the PSOCQ and its four subscales were evaluated by Rasch analysis, partial credit model. Results The items had disordered threshold and needed to be rescored. The 30 items in the PSOCQ did not fit the Rasch model Chi- square item trait statistics. All subscales fitted the Rasch models. The associations to pain (11 point numeric rating scale), emotional distress (Hopkins symptom check list v 25) and self-efficacy (Arthritis Self-Efficacy Scale) were highest for the Precontemplation subscale. Conclusion The present analysis revealed that all four subscales in PSOCQ fitted the Rasch model. No common construct for all subscales were identified, but the Action and Maintenance subscales were closely related. PMID:24646065

  8. Scale construction utilising the Rasch unidimensional measurement model: A measurement of adolescent attitudes towards abortion.

    PubMed

    Hendriks, Jacqueline; Fyfe, Sue; Styles, Irene; Skinner, S Rachel; Merriman, Gareth

    2012-01-01

    Measurement scales seeking to quantify latent traits like attitudes, are often developed using traditional psychometric approaches. Application of the Rasch unidimensional measurement model may complement or replace these techniques, as the model can be used to construct scales and check their psychometric properties. If data fit the model, then a scale with invariant measurement properties, including interval-level scores, will have been developed. This paper highlights the unique properties of the Rasch model. Items developed to measure adolescent attitudes towards abortion are used to exemplify the process. Ten attitude and intention items relating to abortion were answered by 406 adolescents aged 12 to 19 years, as part of the "Teen Relationships Study". The sampling framework captured a range of sexual and pregnancy experiences. Items were assessed for fit to the Rasch model including checks for Differential Item Functioning (DIF) by gender, sexual experience or pregnancy experience. Rasch analysis of the original dataset initially demonstrated that some items did not fit the model. Rescoring of one item (B5) and removal of another (L31) resulted in fit, as shown by a non-significant item-trait interaction total chi-square and a mean log residual fit statistic for items of -0.05 (SD=1.43). No DIF existed for the revised scale. However, items did not distinguish as well amongst persons with the most intense attitudes as they did for other persons. A person separation index of 0.82 indicated good reliability. Application of the Rasch model produced a valid and reliable scale measuring adolescent attitudes towards abortion, with stable measurement properties. The Rasch process provided an extensive range of diagnostic information concerning item and person fit, enabling changes to be made to scale items. This example shows the value of the Rasch model in developing scales for both social science and health disciplines.

  9. Assessment of Mars Atmospheric Temperature Retrievals from the Thermal Emission Spectrometer Radiances

    NASA Technical Reports Server (NTRS)

    Hoffman, Matthew J.; Eluszkiewicz, Janusz; Weisenstein, Deborah; Uymin, Gennady; Moncet, Jean-Luc

    2012-01-01

    Motivated by the needs of Mars data assimilation. particularly quantification of measurement errors and generation of averaging kernels. we have evaluated atmospheric temperature retrievals from Mars Global Surveyor (MGS) Thermal Emission Spectrometer (TES) radiances. Multiple sets of retrievals have been considered in this study; (1) retrievals available from the Planetary Data System (PDS), (2) retrievals based on variants of the retrieval algorithm used to generate the PDS retrievals, and (3) retrievals produced using the Mars 1-Dimensional Retrieval (M1R) algorithm based on the Optimal Spectral Sampling (OSS ) forward model. The retrieved temperature profiles are compared to the MGS Radio Science (RS) temperature profiles. For the samples tested, the M1R temperature profiles can be made to agree within 2 K with the RS temperature profiles, but only after tuning the prior and error statistics. Use of a global prior that does not take into account the seasonal dependence leads errors of up 6 K. In polar samples. errors relative to the RS temperature profiles are even larger. In these samples, the PDS temperature profiles also exhibit a poor fit with RS temperatures. This fit is worse than reported in previous studies, indicating that the lack of fit is due to a bias correction to TES radiances implemented after 2004. To explain the differences between the PDS and Ml R temperatures, the algorithms are compared directly, with the OSS forward model inserted into the PDS algorithm. Factors such as the filtering parameter, the use of linear versus nonlinear constrained inversion, and the choice of the forward model, are found to contribute heavily to the differences in the temperature profiles retrieved in the polar regions, resulting in uncertainties of up to 6 K. Even outside the poles, changes in the a priori statistics result in different profile shapes which all fit the radiances within the specified error. The importance of the a priori statistics prevents reliable global retrievals based a single a priori and strongly implies that a robust science analysis must instead rely on retrievals employing localized a priori information, for example from an ensemble based data assimilation system such as the Local Ensemble Transform Kalman Filter (LETKF).

  10. Experimental study of water desorption isotherms and thin-layer convective drying kinetics of bay laurel leaves

    NASA Astrophysics Data System (ADS)

    Ghnimi, Thouraya; Hassini, Lamine; Bagane, Mohamed

    2016-12-01

    The aim of this work is to determine the desorption isotherms and the drying kinetics of bay laurel leaves ( Laurus Nobilis L.). The desorption isotherms were performed at three temperature levels: 50, 60 and 70 °C and at water activity ranging from 0.057 to 0.88 using the statistic gravimetric method. Five sorption models were used to fit desorption experimental isotherm data. It was found that Kuhn model offers the best fitting of experimental moisture isotherms in the mentioned investigated ranges of temperature and water activity. The Net isosteric heat of water desorption was evaluated using The Clausius-Clapeyron equation and was then best correlated to equilibrium moisture content by the empirical Tsami's equation. Thin layer convective drying curves of bay laurel leaves were obtained for temperatures of 45, 50, 60 and 70 °C, relative humidity of 5, 15, 30 and 45 % and air velocities of 1, 1.5 and 2 m/s. A non linear regression procedure of Levenberg-Marquardt was used to fit drying curves with five semi empirical mathematical models available in the literature, The R2 and χ2 were used to evaluate the goodness of fit of models to data. Based on the experimental drying curves the drying characteristic curve (DCC) has been established and fitted with a third degree polynomial function. It was found that the Midilli Kucuk model was the best semi-empirical model describing thin layer drying kinetics of bay laurel leaves. The bay laurel leaves effective moisture diffusivity and activation energy were also identified.

  11. The Impact of Information Culture on Patient Safety Outcomes. Development of a Structural Equation Model.

    PubMed

    Jylhä, Virpi; Mikkonen, Santtu; Saranto, Kaija; Bates, David W

    2017-03-08

    An organization's information culture and information management practices create conditions for processing patient information in hospitals. Information management incidents are failures that could lead to adverse events for the patient if they are not detected. To test a theoretical model that links information culture in acute care hospitals to information management incidents and patient safety outcomes. Reason's model for the stages of development of organizational accidents was applied. Study data were collected from a cross-sectional survey of 909 RNs who work in medical or surgical units at 32 acute care hospitals in Finland. Structural equation modeling was used to assess how well the hypothesized model fit the study data. Fit indices indicated a good fit for the model. In total, 18 of the 32 paths tested were statistically significant. Documentation errors had the strongest total effect on patient safety outcomes. Organizational guidance positively affected information availability and utilization of electronic patient records, whereas the latter had the strongest total effect on the reduction of information delays. Patient safety outcomes are associated with information management incidents and information culture. Further, the dimensions of the information culture create work conditions that generate errors in hospitals.

  12. A Comparison between Discrimination Indices and Item-Response Theory Using the Rasch Model in a Clinical Course Written Examination of a Medical School.

    PubMed

    Park, Jong Cook; Kim, Kwang Sig

    2012-03-01

    The reliability of test is determined by each items' characteristics. Item analysis is achieved by classical test theory and item response theory. The purpose of the study was to compare the discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. Point biserial correlation coefficient (C(pbs)) was compared to method of extreme group (D), biserial correlation coefficient (C(bs)), item-total correlation coefficient (C(it)), and corrected item-total correlation coeffcient (C(cit)). Rasch model was applied to estimate item difficulty and examinee's ability and to calculate item fit statistics using joint maximum likelihood. Explanatory power (r2) of Cpbs is decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). The ranges of difficulty logit and standard error and ability logit and standard error were -0.82 to 0.80 and 0.37 to 0.76, -3.69 to 3.19 and 0.45 to 1.03, respectively. Item 9 and 23 have outfit > or =1.3. Student 1, 5, 7, 18, 26, 30, and 32 have fit > or =1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. Rasch model can estimate item difficulty parameter and examinee's ability parameter with standard error. The fit statistics can identify bad items and unpredictable examinee's responses.

  13. A MODEL TO EVALUATE PAST EXPOSURE TO 2,3,7,8 ...

    EPA Pesticide Factsheets

    Data from several studies suggest that concentrations of dioxins rose in the environment from the 1930s to about the 1960s/70s and have been declining over the last decade or two. The most direct evidence of this trend comes from lake core sediments, which can be used to estimate past atmospheric depositions of dioxins. The primary source of human exposure to dioxins is through the food supply. The pathway relating atmospheric depositions to concentrations in food is quite complex, and accordingly, it is not known to what extent the trend in human exposure mirrors the trend in atmospheric depositions. This paper describes an attempt to statistically reconstruct the pattern of past human exposure to the most toxic dioxin congener, 2,3,7,8-TCDD (abbreviated TCDD), through use of a simple pharmacokinetic (PK) model which included a time-varying TCDD exposure dose. This PK model was fit to TCDD body burden data (i.e., TCDD concentrations in lipid) from five U.S. studies dating from 1972 to 1987 and covering a wide age range. A Bayesian statistical approach was used to fit TCDD exposure; model parameters other than exposure were all previously known or estimated from other data sources. The primary results of the analysis are as follows: 1.) use of a time-varying exposure dose provided a far better fit to the TCDD body burden data than did using a dose that was constant over time; this is strong evidence that exposure to TCDD has, in fact, varied during the

  14. A comparison of fit of CNC-milled titanium and zirconia frameworks to implants.

    PubMed

    Abduo, Jaafar; Lyons, Karl; Waddell, Neil; Bennani, Vincent; Swain, Michael

    2012-05-01

    Computer numeric controlled (CNC) milling was proven to be predictable method to fabricate accurately fitting implant titanium frameworks. However, no data are available regarding the fit of CNC-milled implant zirconia frameworks. To compare the precision of fit of implant frameworks milled from titanium and zirconia and relate it to peri-implant strain development after framework fixation. A partially edentulous epoxy resin models received two Branemark implants in the areas of the lower left second premolar and second molar. From this model, 10 identical frameworks were fabricated by mean of CNC milling. Half of them were made from titanium and the other half from zirconia. Strain gauges were mounted close to the implants to qualitatively and quantitatively assess strain development as a result of framework fitting. In addition, the fit of the framework implant interface was measured using an optical microscope, when only one screw was tightened (passive fit) and when all screws were tightened (vertical fit). The data was statistically analyzed using the Mann-Whitney test. All frameworks produced measurable amounts of peri-implant strain. The zirconia frameworks produced significantly less strain than titanium. Combining the qualitative and quantitative information indicates that the implants were under vertical displacement rather than horizontal. The vertical fit was similar for zirconia (3.7 µm) and titanium (3.6 µm) frameworks; however, the zirconia frameworks exhibited a significantly finer passive fit (5.5 µm) than titanium frameworks (13.6 µm). CNC milling produced zirconia and titanium frameworks with high accuracy. The difference between the two materials in terms of fit is expected to be of minimal clinical significance. The strain developed around the implants was more related to the framework fit rather than framework material. © 2011 Wiley Periodicals, Inc.

  15. Modeling Count Outcomes from HIV Risk Reduction Interventions: A Comparison of Competing Statistical Models for Count Responses

    PubMed Central

    Xia, Yinglin; Morrison-Beedy, Dianne; Ma, Jingming; Feng, Changyong; Cross, Wendi; Tu, Xin

    2012-01-01

    Modeling count data from sexual behavioral outcomes involves many challenges, especially when the data exhibit a preponderance of zeros and overdispersion. In particular, the popular Poisson log-linear model is not appropriate for modeling such outcomes. Although alternatives exist for addressing both issues, they are not widely and effectively used in sex health research, especially in HIV prevention intervention and related studies. In this paper, we discuss how to analyze count outcomes distributed with excess of zeros and overdispersion and introduce appropriate model-fit indices for comparing the performance of competing models, using data from a real study on HIV prevention intervention. The in-depth look at these common issues arising from studies involving behavioral outcomes will promote sound statistical analyses and facilitate research in this and other related areas. PMID:22536496

  16. Models for Estimating Genetic Parameters of Milk Production Traits Using Random Regression Models in Korean Holstein Cattle

    PubMed Central

    Cho, C. I.; Alam, M.; Choi, T. J.; Choy, Y. H.; Choi, J. G.; Lee, S. S.; Cho, K. H.

    2016-01-01

    The objectives of the study were to estimate genetic parameters for milk production traits of Holstein cattle using random regression models (RRMs), and to compare the goodness of fit of various RRMs with homogeneous and heterogeneous residual variances. A total of 126,980 test-day milk production records of the first parity Holstein cows between 2007 and 2014 from the Dairy Cattle Improvement Center of National Agricultural Cooperative Federation in South Korea were used. These records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat yield (SNF). The statistical models included random effects of genetic and permanent environments using Legendre polynomials (LP) of the third to fifth order (L3–L5), fixed effects of herd-test day, year-season at calving, and a fixed regression for the test-day record (third to fifth order). The residual variances in the models were either homogeneous (HOM) or heterogeneous (15 classes, HET15; 60 classes, HET60). A total of nine models (3 orders of polynomials×3 types of residual variance) including L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60 were compared using Akaike information criteria (AIC) and/or Schwarz Bayesian information criteria (BIC) statistics to identify the model(s) of best fit for their respective traits. The lowest BIC value was observed for the models L5-HET15 (MILK; PROT; SNF) and L4-HET15 (FAT), which fit the best. In general, the BIC values of HET15 models for a particular polynomial order was lower than that of the HET60 model in most cases. This implies that the orders of LP and types of residual variances affect the goodness of models. Also, the heterogeneity of residual variances should be considered for the test-day analysis. The heritability estimates of from the best fitted models ranged from 0.08 to 0.15 for MILK, 0.06 to 0.14 for FAT, 0.08 to 0.12 for PROT, and 0.07 to 0.13 for SNF according to days in milk of first lactation. Genetic variances for studied traits tended to decrease during the earlier stages of lactation, which were followed by increases in the middle and decreases further at the end of lactation. With regards to the fitness of the models and the differential genetic parameters across the lactation stages, we could estimate genetic parameters more accurately from RRMs than from lactation models. Therefore, we suggest using RRMs in place of lactation models to make national dairy cattle genetic evaluations for milk production traits in Korea. PMID:26954184

  17. Models for Estimating Genetic Parameters of Milk Production Traits Using Random Regression Models in Korean Holstein Cattle.

    PubMed

    Cho, C I; Alam, M; Choi, T J; Choy, Y H; Choi, J G; Lee, S S; Cho, K H

    2016-05-01

    The objectives of the study were to estimate genetic parameters for milk production traits of Holstein cattle using random regression models (RRMs), and to compare the goodness of fit of various RRMs with homogeneous and heterogeneous residual variances. A total of 126,980 test-day milk production records of the first parity Holstein cows between 2007 and 2014 from the Dairy Cattle Improvement Center of National Agricultural Cooperative Federation in South Korea were used. These records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat yield (SNF). The statistical models included random effects of genetic and permanent environments using Legendre polynomials (LP) of the third to fifth order (L3-L5), fixed effects of herd-test day, year-season at calving, and a fixed regression for the test-day record (third to fifth order). The residual variances in the models were either homogeneous (HOM) or heterogeneous (15 classes, HET15; 60 classes, HET60). A total of nine models (3 orders of polynomials×3 types of residual variance) including L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60 were compared using Akaike information criteria (AIC) and/or Schwarz Bayesian information criteria (BIC) statistics to identify the model(s) of best fit for their respective traits. The lowest BIC value was observed for the models L5-HET15 (MILK; PROT; SNF) and L4-HET15 (FAT), which fit the best. In general, the BIC values of HET15 models for a particular polynomial order was lower than that of the HET60 model in most cases. This implies that the orders of LP and types of residual variances affect the goodness of models. Also, the heterogeneity of residual variances should be considered for the test-day analysis. The heritability estimates of from the best fitted models ranged from 0.08 to 0.15 for MILK, 0.06 to 0.14 for FAT, 0.08 to 0.12 for PROT, and 0.07 to 0.13 for SNF according to days in milk of first lactation. Genetic variances for studied traits tended to decrease during the earlier stages of lactation, which were followed by increases in the middle and decreases further at the end of lactation. With regards to the fitness of the models and the differential genetic parameters across the lactation stages, we could estimate genetic parameters more accurately from RRMs than from lactation models. Therefore, we suggest using RRMs in place of lactation models to make national dairy cattle genetic evaluations for milk production traits in Korea.

  18. Multimodel predictive system for carbon dioxide solubility in saline formation waters.

    PubMed

    Wang, Zan; Small, Mitchell J; Karamalidis, Athanasios K

    2013-02-05

    The prediction of carbon dioxide solubility in brine at conditions relevant to carbon sequestration (i.e., high temperature, pressure, and salt concentration (T-P-X)) is crucial when this technology is applied. Eleven mathematical models for predicting CO(2) solubility in brine are compared and considered for inclusion in a multimodel predictive system. Model goodness of fit is evaluated over the temperature range 304-433 K, pressure range 74-500 bar, and salt concentration range 0-7 m (NaCl equivalent), using 173 published CO(2) solubility measurements, particularly selected for those conditions. The performance of each model is assessed using various statistical methods, including the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Different models emerge as best fits for different subranges of the input conditions. A classification tree is generated using machine learning methods to predict the best-performing model under different T-P-X subranges, allowing development of a multimodel predictive system (MMoPS) that selects and applies the model expected to yield the most accurate CO(2) solubility prediction. Statistical analysis of the MMoPS predictions, including a stratified 5-fold cross validation, shows that MMoPS outperforms each individual model and increases the overall accuracy of CO(2) solubility prediction across the range of T-P-X conditions likely to be encountered in carbon sequestration applications.

  19. Chi-squared and C statistic minimization for low count per bin data

    NASA Astrophysics Data System (ADS)

    Nousek, John A.; Shue, David R.

    1989-07-01

    Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.

  20. Chi-squared and C statistic minimization for low count per bin data. [sampling in X ray astronomy

    NASA Technical Reports Server (NTRS)

    Nousek, John A.; Shue, David R.

    1989-01-01

    Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.

  1. Local air temperature tolerance: a sensible basis for estimating climate variability

    NASA Astrophysics Data System (ADS)

    Kärner, Olavi; Post, Piia

    2016-11-01

    The customary representation of climate using sample moments is generally biased due to the noticeably nonstationary behaviour of many climate series. In this study, we introduce a moment-free climate representation based on a statistical model fitted to a long-term daily air temperature anomaly series. This model allows us to separate the climate and weather scale variability in the series. As a result, the climate scale can be characterized using the mean annual cycle of series and local air temperature tolerance, where the latter is computed using the fitted model. The representation of weather scale variability is specified using the frequency and the range of outliers based on the tolerance. The scheme is illustrated using five long-term air temperature records observed by different European meteorological stations.

  2. Evaluation of internal fit of interim crown fabricated with CAD/CAM milling and 3D printing system.

    PubMed

    Lee, Wan-Sun; Lee, Du-Hyeong; Lee, Kyu-Bok

    2017-08-01

    This study is to evaluate the internal fit of the crown manufactured by CAD/CAM milling method and 3D printing method. The master model was fabricated with stainless steel by using CNC machine and the work model was created from the vinyl-polysiloxane impression. After scanning the working model, the design software is used to design the crown. The saved STL file is used on the CAD/CAM milling method and two types of 3D printing method to produce 10 interim crowns per group. Internal discrepancy measurement uses the silicon replica method and the measured data are analyzed with One-way ANOVA to verify the statistic significance. The discrepancy means (standard deviation) of the 3 groups are 171.6 (97.4) µm for the crown manufactured by the milling system and 149.1 (65.9) and 91.1 (36.4) µm, respectively, for the crowns manufactured with the two types of 3D printing system. There was a statistically significant difference and the 3D printing system group showed more outstanding value than the milling system group. The marginal and internal fit of the interim restoration has more outstanding 3D printing method than the CAD/CAM milling method. Therefore, the 3D printing method is considered as applicable for not only the interim restoration production, but also in the dental prosthesis production with a higher level of completion.

  3. An integrated analysis of phenotypic selection on insect body size and development time.

    PubMed

    Eck, Daniel J; Shaw, Ruth G; Geyer, Charles J; Kingsolver, Joel G

    2015-09-01

    Most studies of phenotypic selection do not estimate selection or fitness surfaces for multiple components of fitness within a unified statistical framework. This makes it difficult or impossible to assess how selection operates on traits through variation in multiple components of fitness. We describe a new generation of aster models that can evaluate phenotypic selection by accounting for timing of life-history transitions and their effect on population growth rate, in addition to survival and reproductive output. We use this approach to estimate selection on body size and development time for a field population of the herbivorous insect, Manduca sexta (Lepidoptera: Sphingidae). Estimated fitness surfaces revealed strong and significant directional selection favoring both larger adult size (via effects on egg counts) and more rapid rates of early larval development (via effects on larval survival). Incorporating the timing of reproduction and its influence on population growth rate into the analysis resulted in larger values for size in early larval development at which fitness is maximized, and weaker selection on size in early larval development. These results illustrate how the interplay of different components of fitness can influence selection on size and development time. This integrated modeling framework can be readily applied to studies of phenotypic selection via multiple fitness components in other systems. © 2015 The Author(s). Evolution © 2015 The Society for the Study of Evolution.

  4. Developing GIS-based eastern equine encephalitis vector-host models in Tuskegee, Alabama.

    PubMed

    Jacob, Benjamin G; Burkett-Cadena, Nathan D; Luvall, Jeffrey C; Parcak, Sarah H; McClure, Christopher J W; Estep, Laura K; Hill, Geoffrey E; Cupp, Eddie W; Novak, Robert J; Unnasch, Thomas R

    2010-02-24

    A site near Tuskegee, Alabama was examined for vector-host activities of eastern equine encephalomyelitis virus (EEEV). Land cover maps of the study site were created in ArcInfo 9.2 from QuickBird data encompassing visible and near-infrared (NIR) band information (0.45 to 0.72 microm) acquired July 15, 2008. Georeferenced mosquito and bird sampling sites, and their associated land cover attributes from the study site, were overlaid onto the satellite data. SAS 9.1.4 was used to explore univariate statistics and to generate regression models using the field and remote-sampled mosquito and bird data. Regression models indicated that Culex erracticus and Northern Cardinals were the most abundant mosquito and bird species, respectively. Spatial linear prediction models were then generated in Geostatistical Analyst Extension of ArcGIS 9.2. Additionally, a model of the study site was generated, based on a Digital Elevation Model (DEM), using ArcScene extension of ArcGIS 9.2. For total mosquito count data, a first-order trend ordinary kriging process was fitted to the semivariogram at a partial sill of 5.041 km, nugget of 6.325 km, lag size of 7.076 km, and range of 31.43 km, using 12 lags. For total adult Cx. erracticus count, a first-order trend ordinary kriging process was fitted to the semivariogram at a partial sill of 5.764 km, nugget of 6.114 km, lag size of 7.472 km, and range of 32.62 km, using 12 lags. For the total bird count data, a first-order trend ordinary kriging process was fitted to the semivariogram at a partial sill of 4.998 km, nugget of 5.413 km, lag size of 7.549 km and range of 35.27 km, using 12 lags. For the Northern Cardinal count data, a first-order trend ordinary kriging process was fitted to the semivariogram at a partial sill of 6.387 km, nugget of 5.935 km, lag size of 8.549 km and a range of 41.38 km, using 12 lags. Results of the DEM analyses indicated a statistically significant inverse linear relationship between total sampled mosquito data and elevation (R2 = -.4262; p < .0001), with a standard deviation (SD) of 10.46, and total sampled bird data and elevation (R2 = -.5111; p < .0001), with a SD of 22.97. DEM statistics also indicated a significant inverse linear relationship between total sampled Cx. erracticus data and elevation (R2 = -.4711; p < .0001), with a SD of 11.16, and the total sampled Northern Cardinal data and elevation (R2 = -.5831; p < .0001), SD of 11.42. These data demonstrate that GIS/remote sensing models and spatial statistics can capture space-varying functional relationships between field-sampled mosquito and bird parameters for determining risk for EEEV transmission.

  5. Statistical Tools for Fitting Models of the Population Consequences of Acoustic Disturbance to Data from Marine Mammal Populations (PCAD Tools 2)

    DTIC Science & Technology

    2013-09-30

    proceedings of a recent conference on The Effects of Noise on Aquatic Life (Schick et al. 2014). In addition to this work, Schick has been working...Lisbon, Portugal (April), the UK National Centre for Statistical Ecology annual workshop (June), and the Effects of Aquatic Noise conference (August). We...A. N. Popper and A. Hawkins, editors. Effects of Noise on Aquatic Life II. Springer. [in press] Fleishman, E., M. Burgman, M. C. Runge, R. S

  6. Bayesian structural equation modeling in sport and exercise psychology.

    PubMed

    Stenling, Andreas; Ivarsson, Andreas; Johnson, Urban; Lindwall, Magnus

    2015-08-01

    Bayesian statistics is on the rise in mainstream psychology, but applications in sport and exercise psychology research are scarce. In this article, the foundations of Bayesian analysis are introduced, and we will illustrate how to apply Bayesian structural equation modeling in a sport and exercise psychology setting. More specifically, we contrasted a confirmatory factor analysis on the Sport Motivation Scale II estimated with the most commonly used estimator, maximum likelihood, and a Bayesian approach with weakly informative priors for cross-loadings and correlated residuals. The results indicated that the model with Bayesian estimation and weakly informative priors provided a good fit to the data, whereas the model estimated with a maximum likelihood estimator did not produce a well-fitting model. The reasons for this discrepancy between maximum likelihood and Bayesian estimation are discussed as well as potential advantages and caveats with the Bayesian approach.

  7. Disproportionation of rosin on an industrial Pd/C catalyst: reaction pathway and kinetic model discrimination.

    PubMed

    Souto, Juan Carlos; Yustos, Pedro; Ladero, Miguel; Garcia-Ochoa, Felix

    2011-02-01

    In this work, a phenomenological study of the isomerisation and disproportionation of rosin acids using an industrial 5% Pd on charcoal catalyst from 200 to 240°C is carried out. Medium composition is determined by elemental microanalysis, GC-MS and GC-FID. Dehydrogenated and hydrogenated acid species molar amounts in the final product show that dehydrogenation is the main reaction. Moreover, both hydrogen and non-hydrogen concentration considering kinetic models are fitted to experimental data using a multivariable non-linear technique. Statistical discrimination among the proposed kinetic models lead to the conclusion hydrogen considering models fit much better to experimental results. The final kinetic model involves first-order isomerisation reactions of neoabietic and palustric acids to abietic acid, first-order dehydrogenation and hydrogenation of this latter acid, and hydrogenation of pimaric acids. Hydrogenation reactions are partial first-order regarding the acid and hydrogen. Copyright © 2010 Elsevier Ltd. All rights reserved.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jesus, J.F.; Valentim, R.; Andrade-Oliveira, F., E-mail: jfjesus@itapeva.unesp.br, E-mail: valentim.rodolfo@unifesp.br, E-mail: felipe.oliveira@port.ac.uk

    Creation of Cold Dark Matter (CCDM), in the context of Einstein Field Equations, produces a negative pressure term which can be used to explain the accelerated expansion of the Universe. In this work we tested six different spatially flat models for matter creation using statistical criteria, in light of SNe Ia data: Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and Bayesian Evidence (BE). These criteria allow to compare models considering goodness of fit and number of free parameters, penalizing excess of complexity. We find that JO model is slightly favoured over LJO/ΛCDM model, however, neither of these, nor Γmore » = 3α H {sub 0} model can be discarded from the current analysis. Three other scenarios are discarded either because poor fitting or because of the excess of free parameters. A method of increasing Bayesian evidence through reparameterization in order to reducing parameter degeneracy is also developed.« less

  9. Extracting Models in Single Molecule Experiments

    NASA Astrophysics Data System (ADS)

    Presse, Steve

    2013-03-01

    Single molecule experiments can now monitor the journey of a protein from its assembly near a ribosome to its proteolytic demise. Ideally all single molecule data should be self-explanatory. However data originating from single molecule experiments is particularly challenging to interpret on account of fluctuations and noise at such small scales. Realistically, basic understanding comes from models carefully extracted from the noisy data. Statistical mechanics, and maximum entropy in particular, provide a powerful framework for accomplishing this task in a principled fashion. Here I will discuss our work in extracting conformational memory from single molecule force spectroscopy experiments on large biomolecules. One clear advantage of this method is that we let the data tend towards the correct model, we do not fit the data. I will show that the dynamical model of the single molecule dynamics which emerges from this analysis is often more textured and complex than could otherwise come from fitting the data to a pre-conceived model.

  10. Uncertainty quantification for optical model parameters

    DOE PAGES

    Lovell, A. E.; Nunes, F. M.; Sarich, J.; ...

    2017-02-21

    Although uncertainty quantification has been making its way into nuclear theory, these methods have yet to be explored in the context of reaction theory. For example, it is well known that different parameterizations of the optical potential can result in different cross sections, but these differences have not been systematically studied and quantified. The purpose of our work is to investigate the uncertainties in nuclear reactions that result from fitting a given model to elastic-scattering data, as well as to study how these uncertainties propagate to the inelastic and transfer channels. We use statistical methods to determine a best fitmore » and create corresponding 95% confidence bands. A simple model of the process is fit to elastic-scattering data and used to predict either inelastic or transfer cross sections. In this initial work, we assume that our model is correct, and the only uncertainties come from the variation of the fit parameters. Here, we study a number of reactions involving neutron and deuteron projectiles with energies in the range of 5–25 MeV/u, on targets with mass A=12–208. We investigate the correlations between the parameters in the fit. The case of deuterons on 12C is discussed in detail: the elastic-scattering fit and the prediction of 12C(d,p) 13C transfer angular distributions, using both uncorrelated and correlated χ 2 minimization functions. The general features for all cases are compiled in a systematic manner to identify trends. This work shows that, in many cases, the correlated χ 2 functions (in comparison to the uncorrelated χ 2 functions) provide a more natural parameterization of the process. These correlated functions do, however, produce broader confidence bands. Further optimization may require improvement in the models themselves and/or more information included in the fit.« less

  11. Diffusion weighted imaging in patients with rectal cancer: Comparison between Gaussian and non-Gaussian models

    PubMed Central

    Marias, Kostas; Lambregts, Doenja M. J.; Nikiforaki, Katerina; van Heeswijk, Miriam M.; Bakers, Frans C. H.; Beets-Tan, Regina G. H.

    2017-01-01

    Purpose The purpose of this study was to compare the performance of four diffusion models, including mono and bi-exponential both Gaussian and non-Gaussian models, in diffusion weighted imaging of rectal cancer. Material and methods Nineteen patients with rectal adenocarcinoma underwent MRI examination of the rectum before chemoradiation therapy including a 7 b-value diffusion sequence (0, 25, 50, 100, 500, 1000 and 2000 s/mm2) at a 1.5T scanner. Four different diffusion models including mono- and bi-exponential Gaussian (MG and BG) and non-Gaussian (MNG and BNG) were applied on whole tumor volumes of interest. Two different statistical criteria were recruited to assess their fitting performance, including the adjusted-R2 and Root Mean Square Error (RMSE). To decide which model better characterizes rectal cancer, model selection was relied on Akaike Information Criteria (AIC) and F-ratio. Results All candidate models achieved a good fitting performance with the two most complex models, the BG and the BNG, exhibiting the best fitting performance. However, both criteria for model selection indicated that the MG model performed better than any other model. In particular, using AIC Weights and F-ratio, the pixel-based analysis demonstrated that tumor areas better described by the simplest MG model in an average area of 53% and 33%, respectively. Non-Gaussian behavior was illustrated in an average area of 37% according to the F-ratio, and 7% using AIC Weights. However, the distributions of the pixels best fitted by each of the four models suggest that MG failed to perform better than any other model in all patients, and the overall tumor area. Conclusion No single diffusion model evaluated herein could accurately describe rectal tumours. These findings probably can be explained on the basis of increased tumour heterogeneity, where areas with high vascularity could be fitted better with bi-exponential models, and areas with necrosis would mostly follow mono-exponential behavior. PMID:28863161

  12. Diffusion weighted imaging in patients with rectal cancer: Comparison between Gaussian and non-Gaussian models.

    PubMed

    Manikis, Georgios C; Marias, Kostas; Lambregts, Doenja M J; Nikiforaki, Katerina; van Heeswijk, Miriam M; Bakers, Frans C H; Beets-Tan, Regina G H; Papanikolaou, Nikolaos

    2017-01-01

    The purpose of this study was to compare the performance of four diffusion models, including mono and bi-exponential both Gaussian and non-Gaussian models, in diffusion weighted imaging of rectal cancer. Nineteen patients with rectal adenocarcinoma underwent MRI examination of the rectum before chemoradiation therapy including a 7 b-value diffusion sequence (0, 25, 50, 100, 500, 1000 and 2000 s/mm2) at a 1.5T scanner. Four different diffusion models including mono- and bi-exponential Gaussian (MG and BG) and non-Gaussian (MNG and BNG) were applied on whole tumor volumes of interest. Two different statistical criteria were recruited to assess their fitting performance, including the adjusted-R2 and Root Mean Square Error (RMSE). To decide which model better characterizes rectal cancer, model selection was relied on Akaike Information Criteria (AIC) and F-ratio. All candidate models achieved a good fitting performance with the two most complex models, the BG and the BNG, exhibiting the best fitting performance. However, both criteria for model selection indicated that the MG model performed better than any other model. In particular, using AIC Weights and F-ratio, the pixel-based analysis demonstrated that tumor areas better described by the simplest MG model in an average area of 53% and 33%, respectively. Non-Gaussian behavior was illustrated in an average area of 37% according to the F-ratio, and 7% using AIC Weights. However, the distributions of the pixels best fitted by each of the four models suggest that MG failed to perform better than any other model in all patients, and the overall tumor area. No single diffusion model evaluated herein could accurately describe rectal tumours. These findings probably can be explained on the basis of increased tumour heterogeneity, where areas with high vascularity could be fitted better with bi-exponential models, and areas with necrosis would mostly follow mono-exponential behavior.

  13. Quantifying (dis)agreement between direct detection experiments in a halo-independent way

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feldstein, Brian; Kahlhoefer, Felix, E-mail: brian.feldstein@physics.ox.ac.uk, E-mail: felix.kahlhoefer@physics.ox.ac.uk

    We propose an improved method to study recent and near-future dark matter direct detection experiments with small numbers of observed events. Our method determines in a quantitative and halo-independent way whether the experiments point towards a consistent dark matter signal and identifies the best-fit dark matter parameters. To achieve true halo independence, we apply a recently developed method based on finding the velocity distribution that best describes a given set of data. For a quantitative global analysis we construct a likelihood function suitable for small numbers of events, which allows us to determine the best-fit particle physics properties of darkmore » matter considering all experiments simultaneously. Based on this likelihood function we propose a new test statistic that quantifies how well the proposed model fits the data and how large the tension between different direct detection experiments is. We perform Monte Carlo simulations in order to determine the probability distribution function of this test statistic and to calculate the p-value for both the dark matter hypothesis and the background-only hypothesis.« less

  14. Statistical models for the analysis and design of digital polymerase chain (dPCR) experiments

    USGS Publications Warehouse

    Dorazio, Robert; Hunter, Margaret

    2015-01-01

    Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary, log–log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model’s parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.

  15. Comparison of Artificial Neural Networks and ARIMA statistical models in simulations of target wind time series

    NASA Astrophysics Data System (ADS)

    Kolokythas, Kostantinos; Vasileios, Salamalikis; Athanassios, Argiriou; Kazantzidis, Andreas

    2015-04-01

    The wind is a result of complex interactions of numerous mechanisms taking place in small or large scales, so, the better knowledge of its behavior is essential in a variety of applications, especially in the field of power production coming from wind turbines. In the literature there is a considerable number of models, either physical or statistical ones, dealing with the problem of simulation and prediction of wind speed. Among others, Artificial Neural Networks (ANNs) are widely used for the purpose of wind forecasting and, in the great majority of cases, outperform other conventional statistical models. In this study, a number of ANNs with different architectures, which have been created and applied in a dataset of wind time series, are compared to Auto Regressive Integrated Moving Average (ARIMA) statistical models. The data consist of mean hourly wind speeds coming from a wind farm on a hilly Greek region and cover a period of one year (2013). The main goal is to evaluate the models ability to simulate successfully the wind speed at a significant point (target). Goodness-of-fit statistics are performed for the comparison of the different methods. In general, the ANN showed the best performance in the estimation of wind speed prevailing over the ARIMA models.

  16. Assessing risk factors for dental caries: a statistical modeling approach.

    PubMed

    Trottini, Mario; Bossù, Maurizio; Corridore, Denise; Ierardo, Gaetano; Luzzi, Valeria; Saccucci, Matteo; Polimeni, Antonella

    2015-01-01

    The problem of identifying potential determinants and predictors of dental caries is of key importance in caries research and it has received considerable attention in the scientific literature. From the methodological side, a broad range of statistical models is currently available to analyze dental caries indices (DMFT, dmfs, etc.). These models have been applied in several studies to investigate the impact of different risk factors on the cumulative severity of dental caries experience. However, in most of the cases (i) these studies focus on a very specific subset of risk factors; and (ii) in the statistical modeling only few candidate models are considered and model selection is at best only marginally addressed. As a result, our understanding of the robustness of the statistical inferences with respect to the choice of the model is very limited; the richness of the set of statistical models available for analysis in only marginally exploited; and inferences could be biased due the omission of potentially important confounding variables in the model's specification. In this paper we argue that these limitations can be overcome considering a general class of candidate models and carefully exploring the model space using standard model selection criteria and measures of global fit and predictive performance of the candidate models. Strengths and limitations of the proposed approach are illustrated with a real data set. In our illustration the model space contains more than 2.6 million models, which require inferences to be adjusted for 'optimism'.

  17. Parallel algorithm for solving Kepler’s equation on Graphics Processing Units: Application to analysis of Doppler exoplanet searches

    NASA Astrophysics Data System (ADS)

    Ford, Eric B.

    2009-05-01

    We present the results of a highly parallel Kepler equation solver using the Graphics Processing Unit (GPU) on a commercial nVidia GeForce 280GTX and the "Compute Unified Device Architecture" (CUDA) programming environment. We apply this to evaluate a goodness-of-fit statistic (e.g., χ2) for Doppler observations of stars potentially harboring multiple planetary companions (assuming negligible planet-planet interactions). Given the high-dimensionality of the model parameter space (at least five dimensions per planet), a global search is extremely computationally demanding. We expect that the underlying Kepler solver and model evaluator will be combined with a wide variety of more sophisticated algorithms to provide efficient global search, parameter estimation, model comparison, and adaptive experimental design for radial velocity and/or astrometric planet searches. We tested multiple implementations using single precision, double precision, pairs of single precision, and mixed precision arithmetic. We find that the vast majority of computations can be performed using single precision arithmetic, with selective use of compensated summation for increased precision. However, standard single precision is not adequate for calculating the mean anomaly from the time of observation and orbital period when evaluating the goodness-of-fit for real planetary systems and observational data sets. Using all double precision, our GPU code outperforms a similar code using a modern CPU by a factor of over 60. Using mixed precision, our GPU code provides a speed-up factor of over 600, when evaluating nsys > 1024 models planetary systems each containing npl = 4 planets and assuming nobs = 256 observations of each system. We conclude that modern GPUs also offer a powerful tool for repeatedly evaluating Kepler's equation and a goodness-of-fit statistic for orbital models when presented with a large parameter space.

  18. Weighted regularized statistical shape space projection for breast 3D model reconstruction.

    PubMed

    Ruiz, Guillermo; Ramon, Eduard; García, Jaime; Sukno, Federico M; Ballester, Miguel A González

    2018-07-01

    The use of 3D imaging has increased as a practical and useful tool for plastic and aesthetic surgery planning. Specifically, the possibility of representing the patient breast anatomy in a 3D shape and simulate aesthetic or plastic procedures is a great tool for communication between surgeon and patient during surgery planning. For the purpose of obtaining the specific 3D model of the breast of a patient, model-based reconstruction methods can be used. In particular, 3D morphable models (3DMM) are a robust and widely used method to perform 3D reconstruction. However, if additional prior information (i.e., known landmarks) is combined with the 3DMM statistical model, shape constraints can be imposed to improve the 3DMM fitting accuracy. In this paper, we present a framework to fit a 3DMM of the breast to two possible inputs: 2D photos and 3D point clouds (scans). Our method consists in a Weighted Regularized (WR) projection into the shape space. The contribution of each point in the 3DMM shape is weighted allowing to assign more relevance to those points that we want to impose as constraints. Our method is applied at multiple stages of the 3D reconstruction process. Firstly, it can be used to obtain a 3DMM initialization from a sparse set of 3D points. Additionally, we embed our method in the 3DMM fitting process in which more reliable or already known 3D points or regions of points, can be weighted in order to preserve their shape information. The proposed method has been tested in two different input settings: scans and 2D pictures assessing both reconstruction frameworks with very positive results. Copyright © 2018 Elsevier B.V. All rights reserved.

  19. Statistical framework for evaluation of climate model simulations by use of climate proxy data from the last millennium - Part 1: Theory

    NASA Astrophysics Data System (ADS)

    Sundberg, R.; Moberg, A.; Hind, A.

    2012-08-01

    A statistical framework for comparing the output of ensemble simulations from global climate models with networks of climate proxy and instrumental records has been developed, focusing on near-surface temperatures for the last millennium. This framework includes the formulation of a joint statistical model for proxy data, instrumental data and simulation data, which is used to optimize a quadratic distance measure for ranking climate model simulations. An essential underlying assumption is that the simulations and the proxy/instrumental series have a shared component of variability that is due to temporal changes in external forcing, such as volcanic aerosol load, solar irradiance or greenhouse gas concentrations. Two statistical tests have been formulated. Firstly, a preliminary test establishes whether a significant temporal correlation exists between instrumental/proxy and simulation data. Secondly, the distance measure is expressed in the form of a test statistic of whether a forced simulation is closer to the instrumental/proxy series than unforced simulations. The proposed framework allows any number of proxy locations to be used jointly, with different seasons, record lengths and statistical precision. The goal is to objectively rank several competing climate model simulations (e.g. with alternative model parameterizations or alternative forcing histories) by means of their goodness of fit to the unobservable true past climate variations, as estimated from noisy proxy data and instrumental observations.

  20. Protein Dynamics from NMR: The Slowly Relaxing Local Structure Analysis Compared with Model-Free Analysis

    PubMed Central

    Meirovitch, Eva; Shapiro, Yury E.; Polimeno, Antonino; Freed, Jack H.

    2009-01-01

    15N-1H spin relaxation is a powerful method for deriving information on protein dynamics. The traditional method of data analysis is model-free (MF), where the global and local N-H motions are independent and the local geometry is simplified. The common MF analysis consists of fitting single-field data. The results are typically field-dependent, and multi-field data cannot be fit with standard fitting schemes. Cases where known functional dynamics has not been detected by MF were identified by us and others. Recently we applied to spin relaxation in proteins the Slowly Relaxing Local Structure (SRLS) approach which accounts rigorously for mode-mixing and general features of local geometry. SRLS was shown to yield MF in appropriate asymptotic limits. We found that the experimental spectral density corresponds quite well to the SRLS spectral density. The MF formulae are often used outside of their validity ranges, allowing small data sets to be force-fitted with good statistics but inaccurate best-fit parameters. This paper focuses on the mechanism of force-fitting and its implications. It is shown that MF force-fits the experimental data because mode-mixing, the rhombic symmetry of the local ordering and general features of local geometry are not accounted for. Combined multi-field multi-temperature data analyzed by MF may lead to the detection of incorrect phenomena, while conformational entropy derived from MF order parameters may be highly inaccurate. On the other hand, fitting to more appropriate models can yield consistent physically insightful information. This requires that the complexity of the theoretical spectral densities matches the integrity of the experimental data. As shown herein, the SRLS densities comply with this requirement. PMID:16821820

Top