Sample records for generalized statistical models

  1. Modified Likelihood-Based Item Fit Statistics for the Generalized Graded Unfolding Model

    ERIC Educational Resources Information Center

    Roberts, James S.

    2008-01-01

    Orlando and Thissen (2000) developed an item fit statistic for binary item response theory (IRT) models known as S-X². This article generalizes their statistic to polytomous unfolding models. Four alternative formulations of S-X² are developed for the generalized graded unfolding model (GGUM). The GGUM is a…

  2. Online Statistical Modeling (Regression Analysis) for Independent Responses

    NASA Astrophysics Data System (ADS)

    Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

    2017-06-01

    Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Nowadays, statistical models have been developed in various directions to model various types of data and complex relationships. Rich varieties of advanced and recent statistical models are available in open source software (one of them being R). However, these advanced statistical models are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and shiny) so that the most recent and advanced statistical models are readily available, accessible and applicable on the web. We have previously made interfaces in the form of e-tutorials for several modern and advanced statistical models in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including models using computer-intensive statistics (bootstrap and Markov chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes statistical modeling easier to apply and makes it easier to compare models in order to find the most appropriate model for the data.
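
    The abstract describes a menu-driven front end for fitting and comparing LM, GLM, GAM and GAMLSS models in R. As a rough illustration of the kind of comparison such an interface automates, the sketch below fits a linear model and a Poisson GLM to the same simulated (hypothetical) data in Python with statsmodels and compares them by AIC; it is not the authors' R/shiny implementation.

    ```python
    # Sketch only: Python/statsmodels analogue of the LM vs GLM comparison the
    # web interface described above automates. Data are simulated and hypothetical.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    df = pd.DataFrame({"x": rng.uniform(0, 2, 200)})
    df["y"] = rng.poisson(np.exp(0.5 + 1.2 * df["x"]))   # count response

    lm  = smf.ols("y ~ x", data=df).fit()                               # linear model
    glm = smf.glm("y ~ x", data=df, family=sm.families.Poisson()).fit() # Poisson GLM

    # Compare the two fits by AIC; lower is better.
    print(f"LM  AIC: {lm.aic:.1f}")
    print(f"GLM AIC: {glm.aic:.1f}")
    ```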

  3. Multiple commodities in statistical microeconomics: Model and market

    NASA Astrophysics Data System (ADS)

    Baaquie, Belal E.; Yu, Miao; Du, Xin

    2016-11-01

    A statistical generalization of microeconomics has been made in Baaquie (2013). In Baaquie et al. (2015), the market behavior of single commodities was analyzed and it was shown that market data provides strong support for the statistical microeconomic description of commodity prices. Here, the case of multiple commodities is studied and a parsimonious generalization of the single-commodity model is made. Market data shows that the generalization can accurately model the simultaneous correlation functions of up to four commodities. To accurately model five or more commodities, further terms have to be included in the model. This study shows that the statistical microeconomics approach is a comprehensive and complete formulation of microeconomics, and one that is independent of the mainstream formulation of microeconomics.

  4. An Analysis of the Navy’s Voluntary Education Program

    DTIC Science & Technology

    2007-03-01

    Table of contents excerpt: Naval Analysis VOLED Study (Data; Statistical Models); Employer Financed General Training (Data; Statistical Model); a further section (Data; Statistical Model; Findings).

  5. A consistent framework for Horton regression statistics that leads to a modified Hack's law

    USGS Publications Warehouse

    Furey, P.R.; Troutman, B.M.

    2008-01-01

    A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for the Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ω. Data show that ω plays a statistically significant role in the modified Hack's law expression. © 2008 Elsevier B.V.
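
    As a sketch of the relationship involved: classical Hack's law is a power law between mainstream length and drainage area, and the modified form described above adds Strahler order as a second predictor. The log-linear form below is an assumed parameterization for illustration, not necessarily the exact expression derived by the authors.

    ```latex
    % Classical Hack's law: mainstream length L as a power law in drainage area A
    L = c\,A^{h}
    % Assumed log-linear regression sketch of the modified Hack's law, with
    % Strahler order \omega entering as an additional predictor:
    \ln L = \alpha + \beta \ln A + \gamma\,\omega + \varepsilon
    ```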

  6. Evaluation of airborne lidar data to predict vegetation Presence/Absence

    USGS Publications Warehouse

    Palaseanu-Lovejoy, M.; Nayegandhi, A.; Brock, J.; Woodman, R.; Wright, C.W.

    2009-01-01

    This study evaluates the capabilities of the Experimental Advanced Airborne Research Lidar (EAARL) in delineating vegetation assemblages in Jean Lafitte National Park, Louisiana. Five-meter-resolution grids of bare earth, canopy height, canopy-reflection ratio, and height of median energy were derived from EAARL data acquired in September 2006. Ground-truth data were collected along transects to assess species composition, canopy cover, and ground cover. To decide which model is more accurate, comparisons of general linear models and generalized additive models were conducted using conventional evaluation methods (i.e., sensitivity, specificity, Kappa statistics, and area under the curve) and two new indexes, net reclassification improvement and integrated discrimination improvement. Generalized additive models were superior to general linear models in modeling presence/absence in training vegetation categories, but no statistically significant differences between the two models were achieved in determining the classification accuracy at validation locations using conventional evaluation methods, although statistically significant improvements in net reclassifications were observed. © 2009 Coastal Education and Research Foundation.

  7. Analyzing longitudinal data with the linear mixed models procedure in SPSS.

    PubMed

    West, Brady T

    2009-09-01

    Many applied researchers analyzing longitudinal data share a common misconception: that specialized statistical software is necessary to fit hierarchical linear models (also known as linear mixed models [LMMs], or multilevel models) to longitudinal data sets. Although several specialized statistical software programs of high quality are available that allow researchers to fit these models to longitudinal data sets (e.g., HLM), rapid advances in general purpose statistical software packages have recently enabled analysts to fit these same models when using preferred packages that also enable other more common analyses. One of these general purpose statistical packages is SPSS, which includes a very flexible and powerful procedure for fitting LMMs to longitudinal data sets with continuous outcomes. This article aims to present readers with a practical discussion of how to analyze longitudinal data using the LMMs procedure in the SPSS statistical software package.
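
    As a rough analogue of the SPSS MIXED procedure discussed above, the sketch below fits a random-intercept-and-slope linear mixed model to hypothetical long-format longitudinal data using Python's statsmodels; the file name and column names are assumptions, and the article itself works in SPSS rather than Python.

    ```python
    # Sketch only: a linear mixed model for longitudinal data, analogous to the
    # SPSS MIXED procedure described above. File and column names are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    # long-format data: one row per subject per measurement occasion
    df = pd.read_csv("longitudinal.csv")   # assumed columns: subject, time, y

    model = smf.mixedlm("y ~ time", data=df,
                        groups=df["subject"],   # random effects grouped by subject
                        re_formula="~time")     # random intercept and slope for time
    result = model.fit()
    print(result.summary())
    ```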

  8. Generalized linear and generalized additive models in studies of species distributions: Setting the scene

    USGS Publications Warehouse

    Guisan, Antoine; Edwards, T.C.; Hastie, T.

    2002-01-01

    An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling. © 2002 Elsevier Science B.V. All rights reserved.

  9. Numerical and Qualitative Contrasts of Two Statistical Models for Water Quality Change in Tidal Waters

    EPA Science Inventory

    Two statistical approaches, weighted regression on time, discharge, and season and generalized additive models, have recently been used to evaluate water quality trends in estuaries. Both models have been used in similar contexts despite differences in statistical foundations and...

  10. Central Limit Theorem for Exponentially Quasi-local Statistics of Spin Models on Cayley Graphs

    NASA Astrophysics Data System (ADS)

    Reddy, Tulasi Ram; Vadlamani, Sreekar; Yogeshwaran, D.

    2018-04-01

    Central limit theorems for linear statistics of lattice random fields (including spin models) are usually proven under suitable mixing conditions or quasi-associativity. Many interesting examples of spin models do not satisfy mixing conditions, and on the other hand, it does not seem easy to show a central limit theorem for local statistics via quasi-associativity. In this work, we prove general central limit theorems for local statistics and exponentially quasi-local statistics of spin models on discrete Cayley graphs with polynomial growth. Further, we supplement these results by proving similar central limit theorems for random fields on discrete Cayley graphs taking values in a countable space, but under the stronger assumptions of α-mixing (for local statistics) and exponential α-mixing (for exponentially quasi-local statistics). All our central limit theorems assume a suitable variance lower bound like many others in the literature. We illustrate our general central limit theorem with specific examples of lattice spin models and statistics arising in computational topology, statistical physics and random networks. Examples of clustering spin models include quasi-associated spin models with fast decaying covariances like the off-critical Ising model, level sets of Gaussian random fields with fast decaying covariances like the massive Gaussian free field and determinantal point processes with fast decaying kernels. Examples of local statistics include intrinsic volumes, face counts, component counts of random cubical complexes, while exponentially quasi-local statistics include nearest neighbour distances in spin models and Betti numbers of sub-critical random cubical complexes.

  11. Relevance of the c-statistic when evaluating risk-adjustment models in surgery.

    PubMed

    Merkow, Ryan P; Hall, Bruce L; Cohen, Mark E; Dimick, Justin B; Wang, Edward; Chow, Warren B; Ko, Clifford Y; Bilimoria, Karl Y

    2012-05-01

    The measurement of hospital quality based on outcomes requires risk adjustment. The c-statistic is a popular tool used to judge model performance, but can be limited, particularly when evaluating specific operations in focused populations. Our objectives were to examine the interpretation and relevance of the c-statistic when used in models with increasingly similar case mix and to consider an alternative perspective on model calibration based on a graphical depiction of model fit. From the American College of Surgeons National Surgical Quality Improvement Program (2008-2009), patients were identified who underwent a general surgery procedure, and procedure groups were increasingly restricted: colorectal-all, colorectal-elective cases only, and colorectal-elective cancer cases only. Mortality and serious morbidity outcomes were evaluated using logistic regression-based risk adjustment, and model c-statistics and calibration curves were used to compare model performance. During the study period, 323,427 general, 47,605 colorectal-all, 39,860 colorectal-elective, and 21,680 colorectal cancer patients were studied. Mortality ranged from 1.0% in general surgery to 4.1% in the colorectal-all group, and serious morbidity ranged from 3.9% in general surgery to 12.4% in the colorectal-all procedural group. As case mix was restricted, c-statistics progressively declined from the general to the colorectal cancer surgery cohorts for both mortality and serious morbidity (mortality: 0.949 to 0.866; serious morbidity: 0.861 to 0.668). Calibration was evaluated graphically by examining predicted vs observed number of events over risk deciles. For both mortality and serious morbidity, there was no qualitative difference in calibration identified between the procedure groups. In the present study, we demonstrate how the c-statistic can become less informative and, in certain circumstances, can lead to incorrect model-based conclusions, as case mix is restricted and patients become more homogeneous. Although it remains an important tool, caution is advised when the c-statistic is advanced as the sole measure of model performance. Copyright © 2012 American College of Surgeons. All rights reserved.
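
    A minimal sketch of the two diagnostics contrasted above, computed on simulated data rather than the NSQIP cohorts: the c-statistic is the area under the ROC curve of the fitted risk model, and calibration is checked by comparing observed with predicted events across deciles of predicted risk.

    ```python
    # Sketch only: c-statistic (ROC AUC) and decile-based calibration check for a
    # risk model. Data are simulated; this is not the NSQIP analysis in the article.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(5000, 3))
    p_true = 1.0 / (1.0 + np.exp(-(-3.0 + X @ np.array([1.0, 0.7, -0.5]))))
    y = rng.binomial(1, p_true)

    p_hat = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
    print("c-statistic:", round(roc_auc_score(y, p_hat), 3))

    # Calibration: observed vs predicted events within deciles of predicted risk.
    cal = pd.DataFrame({"y": y, "p": p_hat})
    cal["decile"] = pd.qcut(cal["p"], 10, labels=False, duplicates="drop")
    print(cal.groupby("decile").agg(observed=("y", "sum"), predicted=("p", "sum")))
    ```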

  12. A General Model for Estimating and Correcting the Effects of Nonindependence in Meta-Analysis.

    ERIC Educational Resources Information Center

    Strube, Michael J.

    A general model is described which can be used to represent the four common types of meta-analysis: (1) estimation of effect size by combining study outcomes; (2) estimation of effect size by contrasting study outcomes; (3) estimation of statistical significance by combining study outcomes; and (4) estimation of statistical significance by…

  13. The crossing statistic: dealing with unknown errors in the dispersion of Type Ia supernovae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shafieloo, Arman; Clifton, Timothy; Ferreira, Pedro

    2011-08-01

    We propose a new statistic that has been designed to be used in situations where the intrinsic dispersion of a data set is not well known: The Crossing Statistic. This statistic is in general less sensitive than χ² to the intrinsic dispersion of the data, and hence allows us to make progress in distinguishing between different models using goodness of fit to the data even when the errors involved are poorly understood. The proposed statistic makes use of the shape and trends of a model's predictions in a quantifiable manner. It is applicable to a variety of circumstances, although we consider it to be especially well suited to the task of distinguishing between different cosmological models using type Ia supernovae. We show that this statistic can easily distinguish between different models in cases where the χ² statistic fails. We also show that the last mode of the Crossing Statistic is identical to χ², so that it can be considered as a generalization of χ².

  14. Summary goodness-of-fit statistics for binary generalized linear models with noncanonical link functions.

    PubMed

    Canary, Jana D; Blizzard, Leigh; Barry, Ronald P; Hosmer, David W; Quinn, Stephen J

    2016-05-01

    Generalized linear models (GLM) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for logistic GLM. Their properties make the development of GOF statistics relatively straightforward, but it can be more difficult under noncanonical links. Although GOF tests for logistic GLM with continuous covariates (GLMCC) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of link function chosen. We generalize the Tsiatis GOF statistic (TG), originally developed for logistic GLMCCs, so that it can be applied under any link function. Further, we show that the algebraically related Hosmer-Lemeshow (HL) and Pigeon-Heyse (J²) statistics can be applied directly. In a simulation study, TG, HL, and J² were used to evaluate the fit of probit, log-log, complementary log-log, and log models, all calculated with a common grouping method. The TG statistic consistently maintained Type I error rates, while those of HL and J² were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC. In this case, TG had more power than HL or J². © 2015 John Wiley & Sons Ltd/London School of Economics.
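
    For orientation, the sketch below implements the ordinary Hosmer-Lemeshow grouping idea that the HL, J², and TG statistics build on; it is not the noncanonical-link TG generalization developed in the paper, and the grouping into deciles of predicted probability is simply the usual convention.

    ```python
    # Sketch only: ordinary Hosmer-Lemeshow goodness-of-fit statistic for a fitted
    # binary-response model. Illustrates the grouping idea; NOT the TG statistic.
    import numpy as np
    import pandas as pd
    from scipy import stats

    def hosmer_lemeshow(y, p_hat, n_groups=10):
        """Return (chi-square, p-value) of the HL statistic over probability groups."""
        d = pd.DataFrame({"y": np.asarray(y), "p": np.asarray(p_hat)})
        d["g"] = pd.qcut(d["p"], n_groups, labels=False, duplicates="drop")
        obs = d.groupby("g")["y"].sum()      # observed events per group
        exp = d.groupby("g")["p"].sum()      # expected events per group
        n = d.groupby("g")["y"].count()      # group sizes
        chi2 = (((obs - exp) ** 2) / (exp * (1.0 - exp / n))).sum()
        dof = obs.size - 2
        return chi2, stats.chi2.sf(chi2, dof)
    ```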

  15. A generalized statistical model for the size distribution of wealth

    NASA Astrophysics Data System (ADS)

    Clementi, F.; Gallegati, M.; Kaniadakis, G.

    2012-12-01

    In a recent paper in this journal (Clementi et al 2009 J. Stat. Mech. P02037), we proposed a new, physically motivated, distribution function for modeling individual incomes, having its roots in the framework of the κ-generalized statistical mechanics. The performance of the κ-generalized distribution was checked against real data on personal income for the United States in 2003. In this paper we extend our previous model so as to be able to account for the distribution of wealth. Probabilistic functions and inequality measures of this generalized model for wealth distribution are obtained in closed form. In order to check the validity of the proposed model, we analyze the US household wealth distributions from 1984 to 2009 and find excellent agreement with the data, superior to that of any other model already known in the literature.
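
    For reference, a sketch of the κ-generalized family referred to above, written in terms of the κ-exponential of Kaniadakis statistics; the wealth-distribution extension in the paper adds further structure, so the survival function below should be read as an indicative form rather than the authors' final model.

    ```latex
    % \kappa-exponential of Kaniadakis statistics:
    \exp_{\kappa}(x) = \left(\sqrt{1+\kappa^{2}x^{2}} + \kappa x\right)^{1/\kappa},
    \qquad 0 \le \kappa < 1 .
    % Indicative \kappa-generalized survival function for income/wealth:
    \Pr(X > x) = \exp_{\kappa}\!\left(-\beta x^{\alpha}\right), \qquad x>0,\ \alpha,\beta>0 .
    ```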

  16. Development and evaluation of statistical shape modeling for principal inner organs on torso CT images.

    PubMed

    Zhou, Xiangrong; Xu, Rui; Hara, Takeshi; Hirano, Yasushi; Yokoyama, Ryujiro; Kanematsu, Masayuki; Hoshi, Hiroaki; Kido, Shoji; Fujita, Hiroshi

    2014-07-01

    The shapes of the inner organs are important information for medical image analysis. Statistical shape modeling provides a way of quantifying and measuring shape variations of the inner organs in different patients. In this study, we developed a universal scheme that can be used for building the statistical shape models for different inner organs efficiently. This scheme combines the traditional point distribution modeling with a group-wise optimization method based on a measure called minimum description length to provide a practical means for 3D organ shape modeling. In experiments, the proposed scheme was applied to the building of five statistical shape models for hearts, livers, spleens, and right and left kidneys by use of 50 cases of 3D torso CT images. The performance of these models was evaluated by three measures: model compactness, model generalization, and model specificity. The experimental results showed that the constructed shape models have good "compactness" and satisfactory "generalization" performance for different organ shape representations; however, the "specificity" of these models should be improved in the future.
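
    Once landmark correspondence has been established (the minimum-description-length group-wise optimization above), the point distribution model itself is essentially a principal component analysis of the aligned landmark coordinates. The sketch below shows that step only, with assumed array shapes; the organ-specific pipeline and the evaluation measures are not reproduced.

    ```python
    # Sketch only: PCA-based point distribution model over corresponded, aligned landmarks.
    # `shapes` is assumed to have shape (n_cases, n_landmarks * 3).
    import numpy as np

    def build_pdm(shapes, var_kept=0.98):
        mean_shape = shapes.mean(axis=0)
        X = shapes - mean_shape                        # center the training shapes
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        var = s ** 2 / (shapes.shape[0] - 1)           # variance explained by each mode
        k = np.searchsorted(np.cumsum(var) / var.sum(), var_kept) + 1
        return mean_shape, Vt[:k], var[:k]             # mean shape, modes, mode variances

    def synthesize(mean_shape, modes, b):
        """New shape = mean + sum_i b_i * mode_i, with b_i typically within +/- 3*sqrt(var_i)."""
        return mean_shape + b @ modes
    ```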

  17. Zubarev's Nonequilibrium Statistical Operator Method in the Generalized Statistics of Multiparticle Systems

    NASA Astrophysics Data System (ADS)

    Glushak, P. A.; Markiv, B. B.; Tokarchuk, M. V.

    2018-01-01

    We present a generalization of Zubarev's nonequilibrium statistical operator method based on the principle of maximum Renyi entropy. In the framework of this approach, we obtain transport equations for the basic set of parameters of the reduced description of nonequilibrium processes in a classical system of interacting particles using Liouville equations with fractional derivatives. For a classical system of particles in a medium with a fractal structure, we obtain a non-Markovian diffusion equation with fractional spatial derivatives. For a concrete model of the frequency dependence of a memory function, we obtain a generalized Cattaneo-type diffusion equation with the spatial and temporal fractality taken into account. We present a generalization of nonequilibrium thermofield dynamics in Zubarev's nonequilibrium statistical operator method in the framework of Renyi statistics.

  18. Control Theory and Statistical Generalizations.

    ERIC Educational Resources Information Center

    Powers, William T.

    1990-01-01

    Contrasts modeling methods in control theory to the methods of statistical generalizations in empirical studies of human or animal behavior. Presents a computer simulation that predicts behavior based on variables (effort and rewards) determined by the invariable (desired reward). Argues that control theory methods better reflect relationships to…

  19. The Development of Web-based Graphical User Interface for Unified Modeling Data with Multi (Correlated) Responses

    NASA Astrophysics Data System (ADS)

    Made Tirta, I.; Anggraeni, Dian

    2018-04-01

    Statistical models have been developed rapidly in various directions to accommodate various types of data. Data collected as longitudinal, repeated-measures, or clustered data (whether continuous, binary, count, or ordinal) are more likely to be correlated. Therefore statistical models for independent responses, such as Generalized Linear Models (GLM) and Generalized Additive Models (GAM), are not appropriate. There are several models available for correlated responses, including GEEs (Generalized Estimating Equations) for marginal models and various mixed-effect models such as GLMM (Generalized Linear Mixed Models) and HGLM (Hierarchical Generalized Linear Models) for subject-specific models. These models are available in the free open source software R, but they can only be accessed through a command line interface (using scripts). On the other hand, most practical researchers rely heavily on menu-based Graphical User Interfaces (GUI). We develop, using the Shiny framework, a standard pull-down-menu Web-GUI that unifies most models for correlated responses. The Web-GUI accommodates almost all needed features. It enables users to carry out and compare various models for repeated-measures data (GEE, GLMM, HGLM, GEE for nominal responses) much more easily through online menus. This paper discusses the features of the Web-GUI and illustrates their use. In general we find that GEE, GLMM, and HGLM gave very similar results.

  20. Prediction of the presence of insulin resistance using general health checkup data in Japanese employees with metabolic risk factors.

    PubMed

    Takahara, Mitsuyoshi; Katakami, Naoto; Kaneto, Hideaki; Noguchi, Midori; Shimomura, Iichiro

    2014-01-01

    The aim of the current study was to develop a predictive model of insulin resistance using general health checkup data in Japanese employees with one or more metabolic risk factors. We used a database of 846 Japanese employees with one or more metabolic risk factors who underwent general health checkup and a 75-g oral glucose tolerance test (OGTT). Logistic regression models were developed to predict existing insulin resistance evaluated using the Matsuda index. The predictive performance of these models was assessed using the C statistic. The C statistics of body mass index (BMI), waist circumference and their combined use were 0.743, 0.732 and 0.749, with no significant differences. The multivariate backward selection model, in which BMI, the levels of plasma glucose, high-density lipoprotein (HDL) cholesterol, log-transformed triglycerides and log-transformed alanine aminotransferase and hypertension under treatment remained, had a C statistic of 0.816, with a significant difference compared to the combined use of BMI and waist circumference (p<0.01). The C statistic was not significantly reduced when the levels of log-transformed triglycerides and log-transformed alanine aminotransferase and hypertension under treatment were simultaneously excluded from the multivariate model (p=0.14). On the other hand, further exclusion of any of the remaining three variables significantly reduced the C statistic (all p<0.01). When predicting the presence of insulin resistance using general health checkup data in Japanese employees with metabolic risk factors, it is important to take into consideration the BMI and fasting plasma glucose and HDL cholesterol levels.

  1. Statistical inference for template aging

    NASA Astrophysics Data System (ADS)

    Schuckers, Michael E.

    2006-04-01

    A change in classification error rates for a biometric device is often referred to as template aging. Here we offer two methods for determining whether the effect of time is statistically significant. The first of these is the use of a generalized linear model to determine if these error rates change linearly over time. This approach generalizes previous work assessing the impact of covariates using generalized linear models. The second approach uses likelihood ratio test methodology. The focus here is on statistical methods for estimation, not the underlying cause of the change in error rates over time. These methodologies are applied to data from the National Institute of Standards and Technology Biometric Score Set Release 1. The results of these applications are discussed.

  2. Assessment of corneal properties based on statistical modeling of OCT speckle.

    PubMed

    Jesus, Danilo A; Iskander, D Robert

    2017-01-01

    A new approach to assess the properties of the corneal micro-structure in vivo based on the statistical modeling of speckle obtained from Optical Coherence Tomography (OCT) is presented. A number of statistical models were proposed to fit the corneal speckle data obtained from the raw OCT image. Short-term changes in corneal properties were studied by inducing corneal swelling, whereas age-related changes were observed by analyzing data from sixty-five subjects aged between twenty-four and seventy-three years. The Generalized Gamma distribution has been shown to be the best model, in terms of Akaike's Information Criterion, to fit the OCT corneal speckle. Its parameters have shown statistically significant differences (Kruskal-Wallis, p < 0.001) for short-term and age-related corneal changes. In addition, it was observed that age-related changes influence the corneal biomechanical behaviour when corneal swelling is induced. This study shows that the Generalized Gamma distribution can be utilized to model corneal speckle in OCT in vivo, providing complementary quantified information where the micro-structure of corneal tissue is of essence.
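
    A sketch of the model-selection step described above using SciPy: several candidate distributions are fitted to (hypothetical) speckle amplitude samples by maximum likelihood and ranked by AIC, with the generalized gamma among the candidates. The OCT preprocessing itself is not shown, and the data file is an assumption.

    ```python
    # Sketch only: fit candidate distributions to hypothetical OCT speckle amplitudes
    # and rank them by AIC, mirroring the comparison described above.
    import numpy as np
    from scipy import stats

    speckle = np.loadtxt("speckle_samples.txt")   # assumed positive amplitude samples

    candidates = {
        "generalized gamma": stats.gengamma,
        "gamma": stats.gamma,
        "Weibull": stats.weibull_min,
        "lognormal": stats.lognorm,
    }

    for name, dist in candidates.items():
        params = dist.fit(speckle, floc=0)          # ML fit with location fixed at 0
        loglik = np.sum(dist.logpdf(speckle, *params))
        aic = 2 * len(params) - 2 * loglik          # lower AIC indicates a better fit
        print(f"{name:18s} AIC = {aic:.1f}")
    ```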

  3. Statistical study of air pollutant concentrations via generalized gamma distribution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marani, A.; Lavagnini, I.; Buttazzoni, C.

    1986-11-01

    This paper deals with modeling observed frequency distributions of air quality data measured in the area of Venice, Italy. The paper discusses the application of the generalized gamma distribution (ggd) which has not been commonly applied to air quality data notwithstanding the fact that it embodies most distribution models used for air quality analyses. The approach yields important simplifications for statistical analyses. A comparison among the ggd and other relevant models (standard gamma, Weibull, lognormal), carried out on daily sulfur dioxide concentrations in the area of Venice underlines the efficiency of ggd models in portraying experimental data.

  4. Beam wandering statistics of twin thin laser beam propagation under generalized atmospheric conditions.

    PubMed

    Pérez, Darío G; Funes, Gustavo

    2012-12-03

    Under the geometrical optics approximation it is possible to estimate the covariance between the displacements of two thin beams after they have propagated through a turbulent medium. Previous works have concentrated on long propagation distances to provide models for the wandering statistics. These models are useful when the separation between beams is smaller than the propagation path, regardless of the characteristic scales of the turbulence. In this work we give a complete model for the behavior of these covariances, introducing absolute limits to the validity of former approximations. Moreover, these generalizations are established for non-Kolmogorov atmospheric models.

  5. Noninformative prior in the quantum statistical model of pure states

    NASA Astrophysics Data System (ADS)

    Tanaka, Fuyuhiko

    2012-06-01

    In the present paper, we consider a suitable definition of a noninformative prior on the quantum statistical model of pure states. While the full pure-states model is invariant under unitary rotation and admits the Haar measure, restricted models, which we often see in quantum channel estimation and quantum process tomography, have less symmetry and no compelling rationale for any choice. We adopt a game-theoretic approach that is applicable to classical Bayesian statistics and yields a noninformative prior for a general class of probability distributions. We define the quantum detection game and show that there exist noninformative priors for a general class of a pure-states model. Theoretically, it gives one of the ways that we represent ignorance on the given quantum system with partial information. Practically, our method proposes a default distribution on the model in order to use the Bayesian technique in the quantum-state tomography with a small sample.

  6. Statistically Modeling I-V Characteristics of CNT-FET with LASSO

    NASA Astrophysics Data System (ADS)

    Ma, Dongsheng; Ye, Zuochang; Wang, Yan

    2017-08-01

    With the advent of the Internet of Things (IoT), the need to study new materials and devices for various applications is increasing. Traditionally we build compact models for transistors on the basis of physics. But physical models are expensive and need a very long time to adjust for non-ideal effects. As the vision for the application of many novel devices is not certain or the manufacturing process is not mature, deriving generalized, accurate physical models for such devices is very strenuous, whereas statistical modeling is becoming a potential method because of its data-oriented nature and fast implementation. In this paper, one classical statistical regression method, LASSO, is used to model the I-V characteristics of a CNT-FET, and a pseudo-PMOS inverter simulation based on the trained model is implemented in Cadence. The normalized relative mean square prediction error of the trained model versus experimental sample data and the simulation results show that the model is acceptable for digital circuit static simulation. Such a modeling methodology can extend to general devices.
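
    A rough sketch of the data-oriented regression idea: LASSO over polynomial features of the terminal voltages to predict drain current. The column names, polynomial degree, and data file are assumptions for illustration; the CNT-FET measurements and the Cadence simulation flow from the paper are not reproduced.

    ```python
    # Sketch only: LASSO regression of I-V data on polynomial voltage features,
    # illustrating the statistical modeling approach described above.
    import pandas as pd
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler
    from sklearn.linear_model import LassoCV

    iv = pd.read_csv("cntfet_iv.csv")          # assumed columns: vgs, vds, ids
    X, y = iv[["vgs", "vds"]], iv["ids"]

    model = make_pipeline(
        PolynomialFeatures(degree=5, include_bias=False),  # expand to polynomial terms
        StandardScaler(),
        LassoCV(cv=5),                                     # L1 penalty chosen by cross-validation
    )
    model.fit(X, y)
    print("nonzero terms:", int((model.named_steps["lassocv"].coef_ != 0).sum()))
    ```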

  7. Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model

    NASA Astrophysics Data System (ADS)

    Al Sobhi, Mashail M.

    2015-02-01

    Bayesian estimation for the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Also, Bayesian prediction bounds for future DGOS from exponentiated Weibull model are obtained. The symmetric and asymmetric loss functions are considered for Bayesian computations. The Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results have been specialized to the lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.

  8. Assessment of the scale effect on statistical downscaling quality at a station scale using a weather generator-based model

    USDA-ARS?s Scientific Manuscript database

    The resolution of General Circulation Models (GCMs) is too coarse to assess the fine scale or site-specific impacts of climate change. Downscaling approaches including dynamical and statistical downscaling have been developed to meet this requirement. As the resolution of climate model increases, it...

  9. Statistical label fusion with hierarchical performance models

    PubMed Central

    Asman, Andrew J.; Dagley, Alexander S.; Landman, Bennett A.

    2014-01-01

    Label fusion is a critical step in many image segmentation frameworks (e.g., multi-atlas segmentation) as it provides a mechanism for generalizing a collection of labeled examples into a single estimate of the underlying segmentation. In the multi-label case, typical label fusion algorithms treat all labels equally – fully neglecting the known, yet complex, anatomical relationships exhibited in the data. To address this problem, we propose a generalized statistical fusion framework using hierarchical models of rater performance. Building on the seminal work in statistical fusion, we reformulate the traditional rater performance model from a multi-tiered hierarchical perspective. This new approach provides a natural framework for leveraging known anatomical relationships and accurately modeling the types of errors that raters (or atlases) make within a hierarchically consistent formulation. Herein, we describe several contributions. First, we derive a theoretical advancement to the statistical fusion framework that enables the simultaneous estimation of multiple (hierarchical) performance models within the statistical fusion context. Second, we demonstrate that the proposed hierarchical formulation is highly amenable to the state-of-the-art advancements that have been made to the statistical fusion framework. Lastly, in an empirical whole-brain segmentation task we demonstrate substantial qualitative and significant quantitative improvement in overall segmentation accuracy. PMID:24817809

  10. Statistical Models for the Analysis and Design of Digital Polymerase Chain Reaction (dPCR) Experiments.

    PubMed

    Dorazio, Robert M; Hunter, Margaret E

    2015-11-03

    Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary log-log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model's parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.
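
    A minimal sketch of the model class described above in Python/statsmodels: a binomial GLM of positive-partition counts with a complementary log-log link and a log(partition volume) offset. The column names, dilution covariate, and data file are assumptions; the paper's own worked examples are not reproduced here.

    ```python
    # Sketch only: binomial GLM with complementary log-log link and log(partition
    # volume) offset, the model class described above for dPCR data. Columns are
    # hypothetical: positives, partitions, volume_ul, dilution.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    d = pd.read_csv("dpcr_runs.csv")

    # successes/failures per reaction: positive vs negative partitions
    endog = np.column_stack([d["positives"], d["partitions"] - d["positives"]])
    exog = sm.add_constant(np.log(d["dilution"]))      # example covariate
    offset = np.log(d["volume_ul"])                    # partition volume enters as an offset

    fam = sm.families.Binomial(link=sm.families.links.CLogLog())
    fit = sm.GLM(endog, exog, family=fam, offset=offset).fit()
    print(fit.summary())   # exp(intercept) is interpretable as a concentration per unit volume
    ```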

  11. Strategies for Reduced-Order Models in Uncertainty Quantification of Complex Turbulent Dynamical Systems

    NASA Astrophysics Data System (ADS)

    Qi, Di

    Turbulent dynamical systems are ubiquitous in science and engineering. Uncertainty quantification (UQ) in turbulent dynamical systems is a grand challenge where the goal is to obtain statistical estimates for key physical quantities. In the development of a proper UQ scheme for systems characterized by both a high-dimensional phase space and a large number of instabilities, significant model errors compared with the true natural signal are always unavoidable due to both the imperfect understanding of the underlying physical processes and the limited computational resources available. One central issue in contemporary research is the development of a systematic methodology for reduced order models that can recover the crucial features both with model fidelity in statistical equilibrium and with model sensitivity in response to perturbations. In the first part, we discuss a general mathematical framework to construct statistically accurate reduced-order models that have skill in capturing the statistical variability in the principal directions of a general class of complex systems with quadratic nonlinearity. A systematic hierarchy of simple statistical closure schemes, which are built through new global statistical energy conservation principles combined with statistical equilibrium fidelity, are designed and tested for UQ of these problems. Second, the capacity of imperfect low-order stochastic approximations to model extreme events in a passive scalar field advected by turbulent flows is investigated. The effects in complicated flow systems are considered including strong nonlinear and non-Gaussian interactions, and much simpler and cheaper imperfect models with model error are constructed to capture the crucial statistical features in the stationary tracer field. Several mathematical ideas are introduced to improve the prediction skill of the imperfect reduced-order models. Most importantly, empirical information theory and statistical linear response theory are applied in the training phase for calibrating model errors to achieve optimal imperfect model parameters; and total statistical energy dynamics are introduced to improve the model sensitivity in the prediction phase especially when strong external perturbations are exerted. The validity of reduced-order models for predicting statistical responses and intermittency is demonstrated on a series of instructive models with increasing complexity, including the stochastic triad model, the Lorenz '96 model, and models for barotropic and baroclinic turbulence. The skillful low-order modeling methods developed here should also be useful for other applications such as efficient algorithms for data assimilation.

  12. A Model Fit Statistic for Generalized Partial Credit Model

    ERIC Educational Resources Information Center

    Liang, Tie; Wells, Craig S.

    2009-01-01

    Investigating the fit of a parametric model is an important part of the measurement process when implementing item response theory (IRT), but research examining it is limited. A general nonparametric approach for detecting model misfit, introduced by J. Douglas and A. S. Cohen (2001), has exhibited promising results for the two-parameter logistic…

  13. Assessment of corneal properties based on statistical modeling of OCT speckle

    PubMed Central

    Jesus, Danilo A.; Iskander, D. Robert

    2016-01-01

    A new approach to assess the properties of the corneal micro-structure in vivo based on the statistical modeling of speckle obtained from Optical Coherence Tomography (OCT) is presented. A number of statistical models were proposed to fit the corneal speckle data obtained from the raw OCT image. Short-term changes in corneal properties were studied by inducing corneal swelling, whereas age-related changes were observed by analyzing data from sixty-five subjects aged between twenty-four and seventy-three years. The Generalized Gamma distribution has been shown to be the best model, in terms of Akaike’s Information Criterion, to fit the OCT corneal speckle. Its parameters have shown statistically significant differences (Kruskal-Wallis, p < 0.001) for short-term and age-related corneal changes. In addition, it was observed that age-related changes influence the corneal biomechanical behaviour when corneal swelling is induced. This study shows that the Generalized Gamma distribution can be utilized to model corneal speckle in OCT in vivo, providing complementary quantified information where the micro-structure of corneal tissue is of essence. PMID:28101409

  14. Statistical models for the analysis and design of digital polymerase chain (dPCR) experiments

    USGS Publications Warehouse

    Dorazio, Robert; Hunter, Margaret

    2015-01-01

    Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary log–log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model’s parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.

  15. Statistical Mechanics of Node-perturbation Learning with Noisy Baseline

    NASA Astrophysics Data System (ADS)

    Hara, Kazuyuki; Katahira, Kentaro; Okada, Masato

    2017-02-01

    Node-perturbation learning is a type of statistical gradient descent algorithm that can be applied to problems where the objective function is not explicitly formulated, including reinforcement learning. It estimates the gradient of an objective function by using the change in the objective function in response to the perturbation. The value of the objective function for an unperturbed output is called a baseline. Cho et al. proposed node-perturbation learning with a noisy baseline. In this paper, we report on building the statistical mechanics of Cho's model and on deriving coupled differential equations of order parameters that depict learning dynamics. We also show how to derive the generalization error by solving the differential equations of order parameters. On the basis of the results, we show that Cho's results also apply in general cases and present some general performance properties of Cho's model.
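
    As a sketch of the mechanism being analyzed (the notation here is an assumption, not necessarily Cho's): node-perturbation estimates the gradient from the change in the objective under a random perturbation, measured against a baseline that may itself be noisy.

    ```latex
    % Node-perturbation update (indicative form): perturb by \sigma\boldsymbol{\xi},
    % compare the perturbed objective E to a (possibly noisy) baseline b, and move
    % the weights along the perturbation direction:
    \Delta \mathbf{w} \;\propto\; -\,\frac{E(\mathbf{w};\,\sigma\boldsymbol{\xi}) - b}{\sigma}\,\boldsymbol{\xi}
    ```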

  16. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  17. Forecasting volatility with neural regression: a contribution to model adequacy.

    PubMed

    Refenes, A N; Holt, W T

    2001-01-01

    Neural nets' usefulness for forecasting is limited by problems of overfitting and the lack of rigorous procedures for model identification, selection and adequacy testing. This paper describes a methodology for neural model misspecification testing. We introduce a generalization of the Durbin-Watson statistic for neural regression and discuss the general issues of misspecification testing using residual analysis. We derive a generalized influence matrix for neural estimators which enables us to evaluate the distribution of the statistic. We deploy Monte Carlo simulation to compare the power of the test for neural and linear regressors. While residual testing is not a sufficient condition for model adequacy, it is nevertheless a necessary condition to demonstrate that the model is a good approximation to the data generating process, particularly as neural-network estimation procedures are susceptible to partial convergence. The work is also an important step toward developing rigorous procedures for neural model identification, selection and adequacy testing which have started to appear in the literature. We demonstrate its applicability in the nontrivial problem of forecasting implied volatility innovations using high-frequency stock index options. Each step of the model building process is validated using statistical tests to verify variable significance and model adequacy with the results confirming the presence of nonlinear relationships in implied volatility innovations.
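
    For reference, the classical Durbin-Watson statistic on regression residuals, which the paper generalizes to neural regression; the generalized influence-matrix version developed by the authors is not reproduced here.

    ```latex
    % Classical Durbin-Watson statistic on residuals e_t of a fitted regression;
    % values near 2 indicate little first-order autocorrelation in the residuals.
    d \;=\; \frac{\sum_{t=2}^{T}\left(e_{t}-e_{t-1}\right)^{2}}{\sum_{t=1}^{T} e_{t}^{2}}
    ```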

  18. Climate Change Implications for Tropical Islands: Interpolating and Interpreting Statistically Downscaled GCM Projections for Management and Planning

    Treesearch

    Azad Henareh Khalyani; William A. Gould; Eric Harmsen; Adam Terando; Maya Quinones; Jaime A. Collazo

    2016-01-01

  19. Challenging Conventional Wisdom for Multivariate Statistical Models with Small Samples

    ERIC Educational Resources Information Center

    McNeish, Daniel

    2017-01-01

    In education research, small samples are common because of financial limitations, logistical challenges, or exploratory studies. With small samples, statistical principles on which researchers rely do not hold, leading to trust issues with model estimates and possible replication issues when scaling up. Researchers are generally aware of such…

  20. The lz(p)* Person-Fit Statistic in an Unfolding Model Context.

    PubMed

    Tendeiro, Jorge N

    2017-01-01

    Although person-fit analysis has a long-standing tradition within item response theory, it has been applied in combination with dominance response models almost exclusively. In this article, a popular log likelihood-based parametric person-fit statistic under the framework of the generalized graded unfolding model is used. Results from a simulation study indicate that the person-fit statistic performed relatively well in detecting midpoint response style patterns and not so well in detecting extreme response style patterns.

  1. Comparison of statistical models for writer verification

    NASA Astrophysics Data System (ADS)

    Srihari, Sargur; Ball, Gregory R.

    2009-01-01

    A novel statistical model for determining whether a pair of documents, a known and a questioned, were written by the same individual is proposed. The goal of this formulation is to learn the specific uniqueness of style in a particular author's writing, given the known document. Since there are often insufficient samples to extrapolate a generalized model of a writer's handwriting based solely on the document, we instead generalize over the differences between the author and a large population of known different writers. This is in contrast to an earlier proposed model whereby probability distributions were set a priori, without learning. We show the performance of the model along with a comparison to the older, non-learning model, which demonstrates significant improvement.

  2. No-reference image quality assessment based on natural scene statistics and gradient magnitude similarity

    NASA Astrophysics Data System (ADS)

    Jia, Huizhen; Sun, Quansen; Ji, Zexuan; Wang, Tonghan; Chen, Qiang

    2014-11-01

    The goal of no-reference/blind image quality assessment (NR-IQA) is to devise a perceptual model that can accurately predict the quality of a distorted image in line with human opinions, in which feature extraction is an important issue. However, the features used in the state-of-the-art "general purpose" NR-IQA algorithms are usually natural scene statistics (NSS) based or are perceptually relevant; therefore, the performance of these models is limited. To further improve the performance of NR-IQA, we propose a general purpose NR-IQA algorithm which combines NSS-based features with perceptually relevant features. The new method extracts features in both the spatial and gradient domains. In the spatial domain, we extract the point-wise statistics for single pixel values which are characterized by a generalized Gaussian distribution model to form the underlying features. In the gradient domain, statistical features based on neighboring gradient magnitude similarity are extracted. Then a mapping is learned to predict quality scores using a support vector regression. The experimental results on the benchmark image databases demonstrate that the proposed algorithm correlates highly with human judgments of quality and leads to significant performance improvements over state-of-the-art methods.

  3. A model for indexing medical documents combining statistical and symbolic knowledge.

    PubMed

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-10-11

    To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. The use of several terminologies leads to more precise indexing. The improvement achieved in the model's performance as a result of using semantic relationships is encouraging.

  4. Statistical thermodynamics of a two-dimensional relativistic gas.

    PubMed

    Montakhab, Afshin; Ghodrat, Malihe; Barati, Mahmood

    2009-03-01

    In this paper we study a fully relativistic model of a two-dimensional hard-disk gas. This model avoids the general problems associated with relativistic particle collisions and is therefore an ideal system to study relativistic effects in statistical thermodynamics. We study this model using molecular-dynamics simulation, concentrating on the velocity distribution functions. We obtain results for x and y components of velocity in the rest frame (Γ) as well as the moving frame (Γ′). Our results confirm that the Jüttner distribution is the correct generalization of the Maxwell-Boltzmann distribution. We obtain the same "temperature" parameter β for both frames, consistent with a recent study of a limited one-dimensional model. We also address the controversial topic of temperature transformation. We show that while local thermal equilibrium holds in the moving frame, relying on statistical methods such as distribution functions or the equipartition theorem is ultimately inconclusive in deciding on a correct temperature transformation law (if any).
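
    For context, the Jüttner (relativistic Maxwell) distribution referred to above, in its general momentum-space form; the two-dimensional normalization specific to the hard-disk model is omitted.

    ```latex
    % Juttner distribution over momentum p (general form; normalization omitted):
    f(\mathbf{p}) \;\propto\; \exp\!\left(-\beta\, c\sqrt{|\mathbf{p}|^{2} + m^{2}c^{2}}\right)
    ```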

  5. A note about high blood pressure in childhood

    NASA Astrophysics Data System (ADS)

    Teodoro, M. Filomena; Simão, Carla

    2017-06-01

    In the medical, behavioral and social sciences it is usual to get a binary outcome. In the present work, information is collected where some of the outcomes are binary variables (1='yes'/0='no'). In [14] a preliminary study about caregivers' perception of pediatric hypertension was introduced. An experimental questionnaire was designed to be answered by the caregivers of routine pediatric consultation attendees at Santa Maria Hospital (HSM). The collected data were statistically analyzed, with a descriptive analysis and a predictive model performed. Significant relations between some socio-demographic variables and the assessed knowledge were obtained. A statistical data analysis using part of the questionnaire's information can be found in [14]. The present article completes the statistical approach by estimating a model for the relevant remaining questions of the questionnaire using Generalized Linear Models (GLM). Exploring the binary outcome issue, we intend to extend this approach using Generalized Linear Mixed Models (GLMM), but that work is still ongoing.

  6. Probability, statistics, and computational science.

    PubMed

    Beerenwinkel, Niko; Siebourg, Juliane

    2012-01-01

    In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.

  7. Equilibrium statistical-thermal models in high-energy physics

    NASA Astrophysics Data System (ADS)

    Tawfik, Abdel Nasser

    2014-05-01

    We review some recent highlights from the applications of statistical-thermal models to different experimental measurements and lattice QCD thermodynamics that have been made during the last decade. We start with a short review of the historical milestones on the path of constructing statistical-thermal models for heavy-ion physics. We discovered that Heinz Koppe formulated, in 1948, an almost complete recipe for the statistical-thermal models. In 1950, Enrico Fermi generalized this statistical approach, in which he started with a general cross-section formula and inserted into it the simplifying assumptions about the matrix element of the interaction process that likely reflects many features of the high-energy reactions dominated by density in the phase space of final states. In 1964, Hagedorn systematically analyzed the high-energy phenomena using all tools of statistical physics and introduced the concept of limiting temperature based on the statistical bootstrap model. It turns out that many-particle systems can quite often be studied with the help of statistical-thermal methods. The analysis of yield multiplicities in high-energy collisions gives overwhelming evidence for chemical equilibrium in the final state. The strange particles might be an exception, as they are suppressed at lower beam energies. However, their relative yields fulfill statistical equilibrium as well. We review the equilibrium statistical-thermal models for particle production, fluctuations and collective flow in heavy-ion experiments. We also review their reproduction of the lattice QCD thermodynamics at vanishing and finite chemical potential. During the last decade, five conditions have been suggested to describe the universal behavior of the chemical freeze-out parameters. The higher order moments of multiplicity have been discussed. They offer deep insights into particle production and critical fluctuations. Therefore, we use them to describe the freeze-out parameters and suggest the location of the QCD critical endpoint. Various extensions have been proposed in order to take into consideration possible deviations from the ideal hadron gas. We highlight various types of interactions, dissipative properties and location dependences (spatial rapidity). Furthermore, we review three models combining hadronic with partonic phases: the quasi-particle model, the linear sigma model with Polyakov potentials and the compressible bag model.

  8. The Effects of Measurement Error on Statistical Models for Analyzing Change. Final Report.

    ERIC Educational Resources Information Center

    Dunivant, Noel

    The results of six major projects are discussed including a comprehensive mathematical and statistical analysis of the problems caused by errors of measurement in linear models for assessing change. In a general matrix representation of the problem, several new analytic results are proved concerning the parameters which affect bias in…

  9. A generalized Benford's law for JPEG coefficients and its applications in image forensics

    NASA Astrophysics Data System (ADS)

    Fu, Dongdong; Shi, Yun Q.; Su, Wei

    2007-02-01

    In this paper, a novel statistical model based on Benford's law for the probability distributions of the first digits of block-DCT and quantized JPEG coefficients is presented. A parametric logarithmic law, i.e., the generalized Benford's law, is formulated. Furthermore, some potential applications of this model in image forensics are discussed, including the detection of JPEG compression for images in bitmap format, the estimation of the JPEG compression Q-factor for a JPEG-compressed bitmap image, and the detection of double-compressed JPEG images. The results of our extensive experiments demonstrate the effectiveness of the proposed statistical model.
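
    For reference, the parametric logarithmic law referred to above is usually written in a form like the following (the notation here is a reconstruction: N is a normalization constant and s, q are model parameters fitted to the first-digit histogram of the block-DCT coefficients; with s = 0, q = 1 and N = 1 it reduces to the classical Benford's law):

        p(x) = N \log_{10}\left(1 + \frac{1}{s + x^{q}}\right), \qquad x = 1, 2, \ldots, 9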

  10. Invariance in the recurrence of large returns and the validation of models of price dynamics

    NASA Astrophysics Data System (ADS)

    Chang, Lo-Bin; Geman, Stuart; Hsieh, Fushing; Hwang, Chii-Ruey

    2013-08-01

    Starting from a robust, nonparametric definition of large returns (“excursions”), we study the statistics of their occurrences, focusing on the recurrence process. The empirical waiting-time distribution between excursions is remarkably invariant to year, stock, and scale (return interval). This invariance is related to self-similarity of the marginal distributions of returns, but the excursion waiting-time distribution is a function of the entire return process and not just its univariate probabilities. Generalized autoregressive conditional heteroskedasticity (GARCH) models, market-time transformations based on volume or trades, and generalized (Lévy) random-walk models all fail to fit the statistical structure of excursions.
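
    As an illustration of the recurrence statistics described above, the sketch below computes empirical waiting times between "excursions" in a return series; the threshold choice (an upper quantile of absolute returns) and the simulated data are assumptions made for illustration, not the paper's exact nonparametric definition.

        import numpy as np

        rng = np.random.default_rng(0)
        returns = rng.standard_t(df=4, size=10_000)        # stand-in for observed returns

        # Flag "excursions": returns whose magnitude exceeds an upper quantile.
        threshold = np.quantile(np.abs(returns), 0.95)
        excursion_idx = np.flatnonzero(np.abs(returns) > threshold)

        # Waiting times (in return intervals) between successive excursions,
        # and their empirical survival function.
        waiting_times = np.diff(excursion_idx)
        wt_sorted = np.sort(waiting_times)
        survival = 1.0 - np.arange(1, wt_sorted.size + 1) / wt_sorted.size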

  11. Development of Composite Materials with High Passive Damping Properties

    DTIC Science & Technology

    2006-05-15

    Sound transmission through sandwich panels was studied using statistical energy analysis (SEA), together with frequency response function analysis and finite element models. Finite element models are generally only efficient for problems at low and middle frequencies.

  12. Climate change and water resources in a tropical island system: propagation of uncertainty from statistically downscaled climate models to hydrologic models

    Treesearch

    Ashley E. Van Beusekom; William A. Gould; Adam J. Terando; Jaime A. Collazo

    2015-01-01

    Many tropical islands have limited water resources with historically increasing demand, all potentially affected by a changing climate. The effects of climate change on island hydrology are difficult to model due to steep local precipitation gradients and sparse data. This work uses 10 statistically downscaled general circulation models (GCMs) under two greenhouse gas...

  13. A Method of Relating General Circulation Model Simulated Climate to the Observed Local Climate. Part I: Seasonal Statistics.

    NASA Astrophysics Data System (ADS)

    Karl, Thomas R.; Wang, Wei-Chyung; Schlesinger, Michael E.; Knight, Richard W.; Portman, David

    1990-10-01

    Important surface observations such as the daily maximum and minimum temperature, daily precipitation, and cloud ceilings often have localized characteristics that are difficult to reproduce with the current resolution and the physical parameterizations in state-of-the-art General Circulation climate Models (GCMs). Many of the difficulties can be partially attributed to mismatches in scale, local topography, regional geography and boundary conditions between models and surface-based observations. Here, we present a method, called climatological projection by model statistics (CPMS), to relate GCM grid-point free-atmosphere statistics, the predictors, to these important local surface observations. The method can be viewed as a generalization of the model output statistics (MOS) and perfect prog (PP) procedures used in numerical weather prediction (NWP) models. It consists of the application of three statistical methods: 1) principal component analysis (PCA), 2) canonical correlation, and 3) inflated regression analysis. The PCA reduces the redundancy of the predictors. The canonical correlation is used to develop simultaneous relationships between linear combinations of the predictors, the canonical variables, and the surface-based observations. Finally, inflated regression is used to relate the important canonical variables to each of the surface-based observed variables. We demonstrate that even an early version of the Oregon State University two-level atmospheric GCM (with prescribed sea surface temperature) produces free-atmosphere statistics that can, when standardized using the model's internal means and variances (the MOS-like version of CPMS), closely approximate the observed local climate. When the model data are standardized by the observed free-atmosphere means and variances (the PP version of CPMS), however, the model does not reproduce the observed surface climate as well. Our results indicate that in the MOS-like version of CPMS the differences between the output of a ten-year GCM control run and the surface-based observations are often smaller than the differences between the observations of two ten-year periods. Such positive results suggest that GCMs may already contain important climatological information that can be used to infer the local climate.
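
    The three-step CPMS pipeline described above can be sketched roughly as follows in Python with scikit-learn; the arrays, dimensions and component counts are placeholders, and "inflated" regression is approximated here by ordinary least squares for brevity.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.cross_decomposition import CCA
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(1)
        X = rng.normal(size=(600, 40))   # GCM grid-point free-atmosphere predictors
        Y = rng.normal(size=(600, 3))    # local surface observations (e.g., Tmax, Tmin, precip)

        # 1) PCA reduces the redundancy of the predictors.
        X_pc = PCA(n_components=10).fit_transform(X)

        # 2) Canonical correlation links linear combinations of the predictors
        #    (the canonical variables) with the surface-based observations.
        cca = CCA(n_components=3).fit(X_pc, Y)
        U, V = cca.transform(X_pc, Y)

        # 3) Regress each surface variable on the leading canonical variables
        #    (plain OLS here; CPMS uses inflated regression to restore variance).
        fits = [LinearRegression().fit(U, Y[:, j]) for j in range(Y.shape[1])]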

  14. Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization

    NASA Astrophysics Data System (ADS)

    Eroglu, Sertac

    2014-10-01

    The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, is termed the statistical mechanical Menzerath-Altmann model. The derived model allows the model parameters to be interpreted in terms of physical concepts. We also propose that many organizations exhibiting Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through a properly defined structure-dependent parameter and the energy-associated states.
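
    For context, the empirical Menzerath-Altmann relation between constituent size y and construct size x is commonly written in the form below (a, b and c are empirical parameters); the paper's statistical mechanical model is a transformed, generalized version of this expression, whose exact parameterization is not reproduced here.

        y(x) = a \, x^{b} \, e^{-c x}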

  15. Quantifying and Generalizing Hydrologic Responses to Dam Regulation using a Statistical Modeling Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McManamay, Ryan A

    2014-01-01

    Despite the ubiquitous existence of dams within riverscapes, much of our knowledge about dams and their environmental effects remains context-specific. Hydrology, more than any other environmental variable, has been studied in great detail with regard to dam regulation. While much progress has been made in generalizing the hydrologic effects of regulation by large dams, many aspects of hydrology show site-specific fidelity to dam operations, small dams (including diversions), and regional hydrologic regimes. A statistical modeling framework is presented to quantify and generalize hydrologic responses to varying degrees of dam regulation. Specifically, the objectives were to 1) compare the effects of local versus cumulative dam regulation, 2) determine the importance of different regional hydrologic regimes in influencing hydrologic responses to dams, and 3) evaluate how different regulation contexts lead to error in predicting hydrologic responses to dams. Overall, model performance was poor in quantifying the magnitude of hydrologic responses, but performance was sufficient in classifying hydrologic responses as negative or positive. Responses of some hydrologic indices to dam regulation were highly dependent upon hydrologic class membership and the purpose of the dam. The opposing coefficients between local and cumulative-dam predictors suggested that hydrologic responses to cumulative dam regulation are complex, and predicting the hydrology downstream of individual dams, as opposed to multiple dams, may be more easily accomplished using statistical approaches. Results also suggested that particular contexts, including multipurpose dams, high cumulative regulation by multiple dams, diversions, close proximity to dams, and certain hydrologic classes are all sources of increased error when predicting hydrologic responses to dams. Statistical models, such as the ones presented herein, show promise in their ability to model the effects of dam regulation at large spatial scales and to generalize the directionality of hydrologic responses.

  16. Statistics of Smoothed Cosmic Fields in Perturbation Theory. I. Formulation and Useful Formulae in Second-Order Perturbation Theory

    NASA Astrophysics Data System (ADS)

    Matsubara, Takahiko

    2003-02-01

    We formulate a general method for perturbative evaluations of statistics of smoothed cosmic fields and provide useful formulae for application of the perturbation theory to various statistics. This formalism is an extensive generalization of the method used by Matsubara, who derived a weakly nonlinear formula of the genus statistic in a three-dimensional density field. After describing the general method, we apply the formalism to a series of statistics, including genus statistics, level-crossing statistics, Minkowski functionals, and a density extrema statistic, regardless of the dimensions in which each statistic is defined. The relation between the Minkowski functionals and other geometrical statistics is clarified. These statistics can be applied to several cosmic fields, including three-dimensional density field, three-dimensional velocity field, two-dimensional projected density field, and so forth. The results are detailed for second-order theory of the formalism. The effect of the bias is discussed. The statistics of smoothed cosmic fields as functions of rescaled threshold by volume fraction are discussed in the framework of second-order perturbation theory. In CDM-like models, their functional deviations from linear predictions plotted against the rescaled threshold are generally much smaller than that plotted against the direct threshold. There is still a slight meatball shift against rescaled threshold, which is characterized by asymmetry in depths of troughs in the genus curve. A theory-motivated asymmetry factor in the genus curve is proposed.

  17. Analysis of the dependence of extreme rainfalls

    NASA Astrophysics Data System (ADS)

    Padoan, Simone; Ancey, Christophe; Parlange, Marc

    2010-05-01

    The aim of spatial analysis is to quantitatively describe the behavior of environmental phenomena such as precipitation levels, wind speed or daily temperatures. A number of generic approaches to spatial modeling have been developed [1], but these are not necessarily ideal for handling extremal aspects given their focus on mean process levels. The areal modelling of the extremes of a natural process observed at points in space is important in environmental statistics; for example, understanding extremal spatial rainfall is crucial in flood protection. In light of recent concerns over climate change, the use of robust mathematical and statistical methods for such analyses has grown in importance. Multivariate extreme value models and the class of max-stable processes [2] have a similar asymptotic motivation to the univariate Generalized Extreme Value (GEV) distribution, but provide a general approach to modeling extreme processes that incorporates temporal or spatial dependence. Statistical methods for max-stable processes and data analyses of practical problems are discussed in [3] and [4]. This work illustrates methods for the statistical modelling of spatial extremes and gives examples of their use by means of an extremal data analysis of Switzerland precipitation levels. [1] Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York. [2] de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer, USA. [3] Padoan, S. A., Ribatet, M. and Sisson, S. A. (2009). Likelihood-Based Inference for Max-Stable Processes. Journal of the American Statistical Association, Theory & Methods. In press. [4] Davison, A. C. and Gholamrezaee, M. (2009). Geostatistics of extremes. Journal of the Royal Statistical Society, Series B. To appear.

  18. A General Approach to Causal Mediation Analysis

    ERIC Educational Resources Information Center

    Imai, Kosuke; Keele, Luke; Tingley, Dustin

    2010-01-01

    Traditionally in the social sciences, causal mediation analysis has been formulated, understood, and implemented within the framework of linear structural equation models. We argue and demonstrate that this is problematic for 3 reasons: the lack of a general definition of causal mediation effects independent of a particular statistical model, the…

  19. Different Manhattan project: automatic statistical model generation

    NASA Astrophysics Data System (ADS)

    Yap, Chee Keng; Biermann, Henning; Hertzmann, Aaron; Li, Chen; Meyer, Jon; Pao, Hsing-Kuo; Paxia, Salvatore

    2002-03-01

    We address the automatic generation of large geometric models. This is important in visualization for several reasons. First, many applications need access to large but interesting data models. Second, we often need such data sets with particular characteristics (e.g., urban models, park and recreation landscape). Thus we need the ability to generate models with different parameters. We propose a new approach for generating such models. It is based on a top-down propagation of statistical parameters. We illustrate the method in the generation of a statistical model of Manhattan. But the method is generally applicable in the generation of models of large geographical regions. Our work is related to the literature on generating complex natural scenes (smoke, forests, etc) based on procedural descriptions. The difference in our approach stems from three characteristics: modeling with statistical parameters, integration of ground truth (actual map data), and a library-based approach for texture mapping.

  20. Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set.

    PubMed

    Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira

    2016-01-01

    Several studies have shown that our visual system may construct a "summary statistical representation" over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions on how summary statistics, such as an average, are computed remain unanswered. This study investigated sampling properties of visual information used by human observers to extract two types of summary statistics of item sets, average and variance. We presented three models of ideal observers to extract the summary statistics: a global sampling model without sampling noise, global sampling model with sampling noise, and limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. Results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not necessarily require the representation of individual objects with focused attention when the sets of items are larger than 4.

  1. On Fitting Generalized Linear Mixed-effects Models for Binary Responses using Different Statistical Packages

    PubMed Central

    Zhang, Hui; Lu, Naiji; Feng, Changyong; Thurston, Sally W.; Xia, Yinglin; Tu, Xin M.

    2011-01-01

    Summary The generalized linear mixed-effects model (GLMM) is a popular paradigm to extend models for cross-sectional data to a longitudinal setting. When applied to modeling binary responses, different software packages and even different procedures within a package may give quite different results. In this report, we describe the statistical approaches that underlie these different procedures and discuss their strengths and weaknesses when applied to fit correlated binary responses. We then illustrate these considerations by applying these procedures implemented in some popular software packages to simulated and real study data. Our simulation results indicate a lack of reliability for most of the procedures considered, which carries significant implications for applying such popular software packages in practice. PMID:21671252
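
    As a rough illustration of the kind of model being fitted (not one of the specific packages or procedures compared in the paper), the sketch below fits a logistic GLMM with a subject-level random intercept to simulated correlated binary data using statsmodels' Bayesian mixed GLM; the class name, formula interface and variational fit method are assumptions about the statsmodels API (available in recent versions), and the data are invented.

        import numpy as np
        import pandas as pd
        from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

        rng = np.random.default_rng(2)
        n_subj, n_rep = 100, 5
        subj = np.repeat(np.arange(n_subj), n_rep)
        x = rng.normal(size=n_subj * n_rep)
        u = rng.normal(scale=0.8, size=n_subj)            # subject random intercepts
        p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x + u[subj])))
        df = pd.DataFrame({"y": rng.binomial(1, p), "x": x, "subj": subj})

        # Logistic GLMM with a random intercept per subject, fit by variational Bayes.
        model = BinomialBayesMixedGLM.from_formula("y ~ x", {"subj": "0 + C(subj)"}, df)
        result = model.fit_vb()
        print(result.summary())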

  2. SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

    PubMed

    Chu, Annie; Cui, Jenny; Dinov, Ivo D

    2009-03-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses, such as linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category, and the Wilcoxon rank sum test, Kruskal-Wallis test, and Friedman's test in the non-parametric category. SOCR Analyses also includes several hypothesis test models, such as contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with an API (Application Programming Interface) have been implemented for statistical summaries, least squares solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is ongoing and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for the most updated information and newly added models.

  3. Empirical comparison study of approximate methods for structure selection in binary graphical models.

    PubMed

    Viallon, Vivian; Banerjee, Onureena; Jougla, Eric; Rey, Grégoire; Coste, Joel

    2014-03-01

    Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine, and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. In the binary case, however, exact inference is generally very slow or even intractable because of the form of the so-called log-partition function. In this paper, we review various approximate methods for structure selection in binary graphical models that have recently been proposed in the literature and compare them through an extensive simulation study. We also propose a modification of one existing method, that is shown to achieve good performance and to be generally very fast. We conclude with an application in which we search for associations among causes of death recorded on French death certificates. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
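
    One widely used family of approximate approaches of the kind reviewed above is l1-penalized logistic "neighborhood selection", in which each binary variable is regressed on all the others and an edge is retained when the corresponding coefficient is nonzero; the sketch below (simulated data, arbitrary penalty level) illustrates that general idea rather than any specific method compared in the paper.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(3)
        X = (rng.random((500, 8)) < 0.5).astype(int)      # stand-in binary data, 500 x 8

        p = X.shape[1]
        adjacency = np.zeros((p, p), dtype=bool)
        for j in range(p):
            others = np.delete(np.arange(p), j)
            lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
            lr.fit(X[:, others], X[:, j])
            adjacency[j, others] = np.abs(lr.coef_[0]) > 1e-6

        # Symmetrize with an OR rule: keep an edge if either regression selects it.
        edges = adjacency | adjacency.T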

  4. A Model for Indexing Medical Documents Combining Statistical and Symbolic Knowledge.

    PubMed Central

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-01-01

    OBJECTIVES: To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. METHODS: We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). RESULTS: The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. CONCLUSIONS: The use of several terminologies leads to more precise indexing. The improvement achieved in the model’s implementation performances as a result of using semantic relationships is encouraging. PMID:18693792

  5. Statistical ecology comes of age.

    PubMed

    Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-12-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.

  6. Statistical ecology comes of age

    PubMed Central

    Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-01-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151

  7. Vortex dynamics and Lagrangian statistics in a model for active turbulence.

    PubMed

    James, Martin; Wilczek, Michael

    2018-02-14

    Cellular suspensions such as dense bacterial flows exhibit a turbulence-like phase under certain conditions. We study this phenomenon of "active turbulence" statistically using numerical tools. Following Wensink et al. (Proc. Natl. Acad. Sci. U.S.A. 109, 14308 (2012)), we model active turbulence by means of a generalized Navier-Stokes equation. Two-point velocity statistics of active turbulence, both in the Eulerian and the Lagrangian frame, are explored. We characterize the scale-dependent features of two-point statistics in this system. Furthermore, we extend this statistical study with measurements of vortex dynamics in this system. Our observations suggest that the large-scale statistics of active turbulence is close to Gaussian with sub-Gaussian tails.

  8. Physics-based statistical learning approach to mesoscopic model selection.

    PubMed

    Taverniers, Søren; Haut, Terry S; Barros, Kipton; Alexander, Francis J; Lookman, Turab

    2015-11-01

    In materials science and many other research areas, models are frequently inferred without considering their generalization to unseen data. We apply statistical learning using cross-validation to obtain an optimally predictive coarse-grained description of a two-dimensional kinetic nearest-neighbor Ising model with Glauber dynamics (GD) based on the stochastic Ginzburg-Landau equation (sGLE). The latter is learned from GD "training" data using a log-likelihood analysis, and its predictive ability for various complexities of the model is tested on GD "test" data independent of the data used to train the model. Using two different error metrics, we perform a detailed analysis of the error between magnetization time trajectories simulated using the learned sGLE coarse-grained description and those obtained using the GD model. We show that both for equilibrium and out-of-equilibrium GD training trajectories, the standard phenomenological description using a quartic free energy does not always yield the most predictive coarse-grained model. Moreover, increasing the amount of training data can shift the optimal model complexity to higher values. Our results are promising in that they pave the way for the use of statistical learning as a general tool for materials modeling and discovery.

  9. Towards a General Turbulence Model for Planetary Boundary Layers Based on Direct Statistical Simulation

    NASA Astrophysics Data System (ADS)

    Skitka, J.; Marston, B.; Fox-Kemper, B.

    2016-02-01

    Sub-grid turbulence models for planetary boundary layers are typically constructed additively, starting with local flow properties and including non-local (KPP) or higher-order (Mellor-Yamada) parameters until a desired level of predictive capacity is achieved or a manageable threshold of complexity is surpassed. Such approaches are necessarily limited in general circumstances, like global circulation models, by their being optimized for particular flow phenomena. By building a model reductively, starting with the infinite hierarchy of turbulence statistics, truncating at a given order, and stripping degrees of freedom from the flow, we offer the prospect of a turbulence model and investigative tool that is equally applicable to all flow types and able to take full advantage of the wealth of nonlocal information in any flow. Direct statistical simulation (DSS) based upon expansion in equal-time cumulants can be used to compute flow statistics of arbitrary order. We investigate the feasibility of a second-order closure (CE2) by performing simulations of the ocean boundary layer in a quasi-linear approximation for which CE2 is exact. As oceanographic examples, wind-driven Langmuir turbulence and thermal convection are studied by comparison of the quasi-linear and fully nonlinear statistics. We also characterize the computational advantages and physical uncertainties of CE2 defined on a reduced basis determined via proper orthogonal decomposition (POD) of the flow fields.

  10. Supervised variational model with statistical inference and its application in medical image segmentation.

    PubMed

    Li, Changyang; Wang, Xiuying; Eberl, Stefan; Fulham, Michael; Yin, Yong; Dagan Feng, David

    2015-01-01

    Automated and general medical image segmentation can be challenging because the foreground and the background may have complicated and overlapping density distributions in medical imaging. Conventional region-based level set algorithms often assume piecewise constant or piecewise smooth for segments, which are implausible for general medical image segmentation. Furthermore, low contrast and noise make identification of the boundaries between foreground and background difficult for edge-based level set algorithms. Thus, to address these problems, we suggest a supervised variational level set segmentation model to harness the statistical region energy functional with a weighted probability approximation. Our approach models the region density distributions by using the mixture-of-mixtures Gaussian model to better approximate real intensity distributions and distinguish statistical intensity differences between foreground and background. The region-based statistical model in our algorithm can intuitively provide better performance on noisy images. We constructed a weighted probability map on graphs to incorporate spatial indications from user input with a contextual constraint based on the minimization of contextual graphs energy functional. We measured the performance of our approach on ten noisy synthetic images and 58 medical datasets with heterogeneous intensities and ill-defined boundaries and compared our technique to the Chan-Vese region-based level set model, the geodesic active contour model with distance regularization, and the random walker model. Our method consistently achieved the highest Dice similarity coefficient when compared to the other methods.

  11. The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded.

    PubMed

    Nakagawa, Shinichi; Johnson, Paul C D; Schielzeth, Holger

    2017-09-01

    The coefficient of determination R 2 quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R 2 for generalized linear mixed models (GLMMs) remains challenging. We have previously introduced a version of R 2 that we called [Formula: see text] for Poisson and binomial GLMMs, but not for other distributional families. Similarly, we earlier discussed how to estimate intra-class correlation coefficients (ICCs) using Poisson and binomial GLMMs. In this paper, we generalize our methods to all other non-Gaussian distributions, in particular to negative binomial and gamma distributions that are commonly used for modelling biological data. While expanding our approach, we highlight two useful concepts for biologists, Jensen's inequality and the delta method, both of which help us in understanding the properties of GLMMs. Jensen's inequality has important implications for biologically meaningful interpretation of GLMMs, whereas the delta method allows a general derivation of variance associated with non-Gaussian distributions. We also discuss some special considerations for binomial GLMMs with binary or proportion data. We illustrate the implementation of our extension by worked examples from the field of ecology and evolution in the R environment. However, our method can be used across disciplines and regardless of statistical environments. © 2017 The Author(s).
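
    For orientation, the marginal and conditional R2 for GLMMs discussed above are commonly written in a form like the following, where sigma_f^2 is the variance explained by the fixed effects, the sigma_l^2 are the random-effect variance components, and sigma_e^2 + sigma_d^2 collect the additive and distribution-specific variances; this is a schematic reconstruction of the notation, not the paper's exact expressions.

        R^2_{marginal}    = \frac{\sigma_f^2}{\sigma_f^2 + \sum_l \sigma_l^2 + \sigma_e^2 + \sigma_d^2}
        R^2_{conditional} = \frac{\sigma_f^2 + \sum_l \sigma_l^2}{\sigma_f^2 + \sum_l \sigma_l^2 + \sigma_e^2 + \sigma_d^2}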

  12. Turbulent scaling laws as solutions of the multi-point correlation equation using statistical symmetries

    NASA Astrophysics Data System (ADS)

    Oberlack, Martin; Rosteck, Andreas; Avsarkisov, Victor

    2013-11-01

    Text-book knowledge proclaims that Lie symmetries such as the Galilean transformation lie at the heart of fluid dynamics. These important properties also carry over to the statistical description of turbulence, i.e. to the Reynolds stress transport equations and their generalization, the multi-point correlation equations (MPCE). Interestingly, the MPCE admit a much larger set of symmetries, in fact infinite dimensional, subsequently named statistical symmetries. Most importantly, these new symmetries have important consequences for our understanding of turbulent scaling laws. The symmetries form the essential foundation for constructing exact solutions to the infinite set of MPCE, which in turn are identified as classical and new turbulent scaling laws. Examples of various classical and new shear-flow scaling laws, including higher-order moments, will be presented. New scaling laws have even been forecast from these symmetries and in turn validated by DNS. Turbulence modellers have implicitly recognized at least one of the statistical symmetries, as this is the basis for the usual log-law which has been employed for calibrating essentially all engineering turbulence models. An obvious conclusion is to generally make turbulence models consistent with the new statistical symmetries.

  13. Artificial neural network models for prediction of cardiovascular autonomic dysfunction in general Chinese population

    PubMed Central

    2013-01-01

    Background The present study aimed to develop an artificial neural network (ANN) based prediction model for cardiovascular autonomic (CA) dysfunction in the general population. Methods We analyzed a previous dataset based on a population sample consisting of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN analysis. Performances of these prediction models were evaluated in the validation set. Results Univariate analysis indicated that 14 risk factors showed statistically significant association with CA dysfunction (P < 0.05). The mean area under the receiver-operating curve was 0.762 (95% CI 0.732–0.793) for the prediction model developed using ANN analysis. The mean sensitivity, specificity, and positive and negative predictive values of the prediction models were 0.751, 0.665, 0.330 and 0.924, respectively. All HL statistics were less than 15.0. Conclusion ANN is an effective tool for developing prediction models with high value for predicting CA dysfunction among the general population. PMID:23902963

  14. A d-statistic for single-case designs that is equivalent to the usual between-groups d-statistic.

    PubMed

    Shadish, William R; Hedges, Larry V; Pustejovsky, James E; Boyajian, Jonathan G; Sullivan, Kristynn J; Andrade, Alma; Barrientos, Jeannette L

    2014-01-01

    We describe a standardised mean difference statistic (d) for single-case designs that is equivalent to the usual d in between-groups experiments. We show how it can be used to summarise treatment effects over cases within a study, to do power analyses in planning new studies and grant proposals, and to meta-analyse effects across studies of the same question. We discuss limitations of this d-statistic, and possible remedies to them. Even so, this d-statistic is better founded statistically than other effect size measures for single-case design, and unlike many general linear model approaches such as multilevel modelling or generalised additive models, it produces a standardised effect size that can be integrated over studies with different outcome measures. SPSS macros for both effect size computation and power analysis are available.

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hagos, Samson M.; Feng, Zhe; Burleyson, Casey D.

    Regional cloud-permitting model simulations of cloud populations observed during the 2011 ARM Madden-Julian Oscillation Investigation Experiment / Dynamics of the Madden-Julian Oscillation (AMIE/DYNAMO) field campaign are evaluated against radar and ship-based measurements. The sensitivity of model-simulated surface rain rate statistics to parameters and parameterization of hydrometeor sizes in five commonly used WRF microphysics schemes is examined. It is shown that at 2 km grid spacing, the model generally overestimates rain rate from large and deep convective cores. Sensitivity runs involving variation of parameters that affect the rain drop or ice particle size distribution (more aggressive break-up processes, etc.) generally reduce the bias in rain-rate and boundary layer temperature statistics as the smaller particles become more vulnerable to evaporation. Furthermore, significant improvement in the convective rain-rate statistics is observed when the horizontal grid spacing is reduced to 1 km and 0.5 km, while it is worsened when run at 4 km grid spacing as increased turbulence enhances evaporation. The results suggest that modulation of evaporation processes, through parameterization of turbulent mixing and break-up of hydrometeors, may provide a potential avenue for correcting cloud statistics and associated boundary layer temperature biases in regional and global cloud-permitting model simulations.

  16. Best Statistical Distribution of flood variables for Johor River in Malaysia

    NASA Astrophysics Data System (ADS)

    Salarpour Goodarzi, M.; Yusop, Z.; Yusof, F.

    2012-12-01

    A complex flood event is always characterized by a few characteristics such as flood peak, flood volume, and flood duration, which might be mutually correlated. This study explored the statistical distribution of peakflow, flood duration and flood volume at the Rantau Panjang gauging station on the Johor River in Malaysia. Hourly data were recorded for 45 years. The data were analysed based on the water year (July - June). Five distributions, namely Log Normal, Generalized Pareto, Log Pearson, Normal and Generalized Extreme Value (GEV), were used to model the distribution of all three variables. Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests were used to evaluate the best fit. Goodness-of-fit tests at the 5% level of significance indicate that all the models can be used to model the distribution of peakflow, flood duration and flood volume. However, the Generalized Pareto distribution is found to be the most suitable model when tested with the Anderson-Darling test, while the Kolmogorov-Smirnov test suggested that GEV is the best for peakflow. The results of this research, including a comparison between the Generalized Extreme Value, Generalized Pareto and Log Pearson distributions in the cumulative distribution function of peakflow, can be used to improve flood frequency analysis.
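
    As a rough illustration of the fit-and-test workflow described above, the sketch below fits Generalized Extreme Value and Generalized Pareto distributions to a series of annual peak flows with scipy and compares them with a Kolmogorov-Smirnov test; the peak-flow values are simulated placeholders, and the Anderson-Darling test is omitted because scipy's implementation does not cover these distributions.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(4)
        peakflow = stats.genextreme.rvs(c=-0.1, loc=300.0, scale=80.0,
                                        size=45, random_state=rng)   # 45 "water years"

        for name, dist in [("GEV", stats.genextreme),
                           ("Generalized Pareto", stats.genpareto)]:
            params = dist.fit(peakflow)          # maximum-likelihood estimates
            ks = stats.kstest(peakflow, dist.name, args=params)
            print(f"{name}: params={np.round(params, 3)}, KS p-value={ks.pvalue:.3f}")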

  17. Demographic Accounting and Model-Building. Education and Development Technical Reports.

    ERIC Educational Resources Information Center

    Stone, Richard

    This report describes and develops a model for coordinating a variety of demographic and social statistics within a single framework. The framework proposed, together with its associated methods of analysis, serves both general and specific functions. The general aim of these functions is to give numerical definition to the pattern of society and…

  18. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

    PubMed

    Rivas, Elena; Lang, Raymond; Eddy, Sean R

    2012-02-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.

  19. A statistical shape model of the human second cervical vertebra.

    PubMed

    Clogenson, Marine; Duff, John M; Luethi, Marcel; Levivier, Marc; Meuli, Reto; Baur, Charles; Henein, Simon

    2015-07-01

    Statistical shape and appearance models play an important role in reducing the segmentation processing time of a vertebra and in improving results for 3D model development. Here, we describe the different steps in generating a statistical shape model (SSM) of the second cervical vertebra (C2) and provide the shape model for general use by the scientific community. The main difficulties in its construction are the morphological complexity of the C2 and its variability in the population. The input dataset is composed of manually segmented anonymized patient computerized tomography (CT) scans. The alignment of the different datasets is done with the procrustes alignment on surface models, and then, the registration is cast as a model-fitting problem using a Gaussian process. A principal component analysis (PCA)-based model is generated which includes the variability of the C2. The SSM was generated using 92 CT scans. The resulting SSM was evaluated for specificity, compactness and generalization ability. The SSM of the C2 is freely available to the scientific community in Slicer (an open source software for image analysis and scientific visualization) with a module created to visualize the SSM using Statismo, a framework for statistical shape modeling. The SSM of the vertebra allows the shape variability of the C2 to be represented. Moreover, the SSM will enable semi-automatic segmentation and 3D model generation of the vertebra, which would greatly benefit surgery planning.

  20. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more

    PubMed Central

    Rivas, Elena; Lang, Raymond; Eddy, Sean R.

    2012-01-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308

  1. Alignment-free sequence comparison (II): theoretical power of comparison statistics.

    PubMed

    Wan, Lin; Reinert, Gesine; Sun, Fengzhu; Waterman, Michael S

    2010-11-01

    Rapid methods for alignment-free sequence comparison make large-scale comparisons between sequences increasingly feasible. Here we study the power of the statistic D2, which counts the number of matching k-tuples between two sequences, as well as D2*, which uses centralized counts, and D2S, which is a self-standardized version, both from a theoretical viewpoint and numerically, providing an easy to use program. The power is assessed under two alternative hidden Markov models; the first one assumes that the two sequences share a common motif, whereas the second model is a pattern transfer model; the null model is that the two sequences are composed of independent and identically distributed letters and they are independent. Under the first alternative model, the means of the tuple counts in the individual sequences change, whereas under the second alternative model, the marginal means are the same as under the null model. Using the limit distributions of the count statistics under the null and the alternative models, we find that generally, asymptotically D2S has the largest power, followed by D2*, whereas the power of D2 can even be zero in some cases. In contrast, even for sequences of length 140,000 bp, in simulations D2* generally has the largest power. Under the first alternative model of a shared motif, the power of D2* approaches 100% when sufficiently many motifs are shared, and we recommend the use of D2* for such practical applications. Under the second alternative model of pattern transfer, the power for all three count statistics does not increase with sequence length when the sequence is sufficiently long, and hence none of the three statistics under consideration can be recommended in such a situation. We illustrate the approach on 323 transcription factor binding motifs with length at most 10 from JASPAR CORE (October 12, 2009 version), verifying that D2* is generally more powerful than D2. The program to calculate the power of D2, D2* and D2S can be downloaded from http://meta.cmb.usc.edu/d2. Supplementary Material is available at www.liebertonline.com/cmb.
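
    To make the definition concrete, the sketch below computes the D2 statistic (the number of matching k-tuples between two sequences) by multiplying, for each k-word, its counts in the two sequences and summing; the centralized (D2*) and self-standardized (D2S) variants are omitted, and the example sequences are arbitrary.

        from collections import Counter

        def kmer_counts(seq: str, k: int) -> Counter:
            """Count occurrences of each k-tuple (k-mer) in a sequence."""
            return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

        def d2(seq_a: str, seq_b: str, k: int) -> int:
            """D2 = sum over k-words w of count(w in A) * count(w in B)."""
            ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
            return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())

        print(d2("ACGTACGTGGA", "TTACGTACGAA", k=3))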

  2. Right-Sizing Statistical Models for Longitudinal Data

    PubMed Central

    Wood, Phillip K.; Steinley, Douglas; Jackson, Kristina M.

    2015-01-01

    Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to “right-size” the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting overly parsimonious models to more complex better fitting alternatives, and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically under-identified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A three-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation/covariation patterns. The orthogonal, free-curve slope-intercept (FCSI) growth model is considered as a general model which includes, as special cases, many models including the Factor Mean model (FM, McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, Hierarchical Linear Models (HLM), Repeated Measures MANOVA, and the Linear Slope Intercept (LinearSI) Growth Model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparison of several candidate parametric growth and chronometric models in a Monte Carlo study. PMID:26237507

  3. Right-sizing statistical models for longitudinal data.

    PubMed

    Wood, Phillip K; Steinley, Douglas; Jackson, Kristina M

    2015-12-01

    Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to "right-size" the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically underidentified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A 3-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation-covariation patterns. The orthogonal free curve slope intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the factor mean (FM) model (McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, hierarchical linear models (HLMs), repeated-measures multivariate analysis of variance (MANOVA), and the linear slope intercept (linearSI) growth model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. (c) 2015 APA, all rights reserved).

  4. Strongly magnetized classical plasma models

    NASA Technical Reports Server (NTRS)

    Montgomery, D. C.

    1972-01-01

    The class of plasma processes for which the so-called Vlasov approximation is inadequate is investigated. Results from the equilibrium statistical mechanics of two-dimensional plasmas are derived. These results are independent of the presence of an external dc magnetic field. The nonequilibrium statistical mechanics of the electrostatic guiding-center plasma, a two-dimensional plasma model, is discussed. This model is then generalized to three dimensions. The guiding-center model is relaxed to include finite Larmor radius effects for a two-dimensional plasma.

  5. Fisher's method of combining dependent statistics using generalizations of the gamma distribution with applications to genetic pleiotropic associations.

    PubMed

    Li, Qizhai; Hu, Jiyuan; Ding, Juan; Zheng, Gang

    2014-04-01

    A classical approach to combining independent test statistics is Fisher's combination of p-values, which follows the chi-squared distribution. When the test statistics are dependent, the gamma distribution (GD) is commonly used for the Fisher's combination test (FCT). We propose to use two generalizations of the GD: the generalized and the exponentiated GDs. We study some properties of mis-using the GD for the FCT to combine dependent statistics when one of the two proposed distributions is true. Our results show that both generalizations have better control of type I error rates than the GD, which tends to have inflated type I error rates at more extreme tails. In practice, common model selection criteria (e.g. Akaike information criterion/Bayesian information criterion) can be used to help select a better distribution to use for the FCT. A simple strategy for applying the two generalizations of the GD in genome-wide association studies is discussed. Applications of the results to genetic pleiotropic associations are described, where multiple traits are tested for association with a single marker.
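
    To illustrate the baseline being generalized, the sketch below computes Fisher's combination statistic for a handful of p-values and evaluates it both against the classical chi-squared reference (valid for independent tests) and against a moment-matched gamma reference with an inflated scale, a crude stand-in for a dependence adjustment; the generalized and exponentiated gamma refinements proposed in the paper are not implemented here, and the p-values and scale factor are invented.

        import numpy as np
        from scipy import stats

        pvals = np.array([0.021, 0.300, 0.047, 0.160, 0.008])
        T = -2.0 * np.sum(np.log(pvals))     # Fisher's combination statistic
        k = len(pvals)

        # Independent case: T follows a chi-squared distribution with 2k df.
        p_chi2 = stats.chi2.sf(T, df=2 * k)

        # Dependent case (illustrative): gamma reference keeping the mean at 2k
        # but inflating the variance via an arbitrary scale factor.
        scale = 2.4
        p_gamma = stats.gamma.sf(T, a=(2 * k) / scale, scale=scale)

        print(f"T = {T:.2f}, chi-squared p = {p_chi2:.4f}, gamma p = {p_gamma:.4f}")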

  6. Generalized two-dimensional chiral QED: Anomaly and exotic statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saradzhev, F.M.

    1997-07-01

    We study the influence of the anomaly on the physical quantum picture of the generalized chiral Schwinger model defined on S¹. We show that the anomaly (i) results in the background linearly rising electric field and (ii) makes the spectrum of the physical Hamiltonian nonrelativistic without a massive boson. The physical matter fields acquire exotic statistics. We construct explicitly the algebra of the Poincaré generators and show that it differs from the Poincaré one. We exhibit the role of the vacuum Berry phase in the failure of the Poincaré algebra to close. We prove that, in spite of the background electric field, such a phenomenon as the total screening of external charges, characteristic of the standard Schwinger model, takes place in the generalized chiral Schwinger model too. © 1997 The American Physical Society

  7. On fitting generalized linear mixed-effects models for binary responses using different statistical packages.

    PubMed

    Zhang, Hui; Lu, Naiji; Feng, Changyong; Thurston, Sally W; Xia, Yinglin; Zhu, Liang; Tu, Xin M

    2011-09-10

    The generalized linear mixed-effects model (GLMM) is a popular paradigm to extend models for cross-sectional data to a longitudinal setting. When applied to modeling binary responses, different software packages and even different procedures within a package may give quite different results. In this report, we describe the statistical approaches that underlie these different procedures and discuss their strengths and weaknesses when applied to fit correlated binary responses. We then illustrate these considerations by applying these procedures implemented in some popular software packages to simulated and real study data. Our simulation results indicate a lack of reliability for most of the procedures considered, which carries significant implications for applying such popular software packages in practice. Copyright © 2011 John Wiley & Sons, Ltd.

  8. A Guide to the Literature on Learning Graphical Models

    NASA Technical Reports Server (NTRS)

    Buntine, Wray L.; Friedland, Peter (Technical Monitor)

    1994-01-01

    This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and more generally, learning probabilistic graphical models. Because many problems in artificial intelligence, statistics and neural networks can be represented as a probabilistic graphical model, this area provides a unifying perspective on learning. This paper organizes the research in this area along methodological lines of increasing complexity.

  9. Performance of the Generalized S-X[Superscript 2] Item Fit Index for Polytomous IRT Models

    ERIC Educational Resources Information Center

    Kang, Taehoon; Chen, Troy T.

    2008-01-01

    Orlando and Thissen's S-X[superscript 2] item fit index has performed better than traditional item fit statistics such as Yen's Q[subscript 1] and McKinley and Mills' G[superscript 2] for dichotomous item response theory (IRT) models. This study extends the utility of S-X[superscript 2] to polytomous IRT models, including the generalized partial…

  10. Using Patient Demographics and Statistical Modeling to Predict Knee Tibia Component Sizing in Total Knee Arthroplasty.

    PubMed

    Ren, Anna N; Neher, Robert E; Bell, Tyler; Grimm, James

    2018-06-01

    Preoperative planning is important to achieve successful implantation in primary total knee arthroplasty (TKA). However, traditional TKA templating techniques are not accurate enough to predict the component size within a narrow range. With the goal of developing a general predictive statistical model using patient demographic information, ordinal logistic regression was applied to build a proportional odds model to predict the tibia component size. The study retrospectively collected data from 1992 primary Persona Knee System TKA procedures. Of these, 199 procedures were randomly selected as testing data and the rest of the data were randomly partitioned between model training data and model evaluation data with a ratio of 7:3. Different models were trained and evaluated on the training and validation data sets after data exploration. The final model had patient gender, age, weight, and height as independent variables and predicted the tibia size within 1 size difference 96% of the time on the validation data, 94% of the time on the testing data, and 92% of the time on a prospective cadaver data set. The study results indicated that the statistical model built by ordinal logistic regression can increase the accuracy of tibia sizing information for Persona Knee preoperative templating. This research shows that statistical modeling may be used with radiographs to dramatically enhance templating accuracy, efficiency, and quality. In general, this methodology can be applied to other TKA products when the data are applicable. Copyright © 2018 Elsevier Inc. All rights reserved.
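
    A minimal sketch (not the paper's model or data) of a proportional-odds fit of an ordinal size code on demographic predictors, using statsmodels' OrderedModel (available in recent statsmodels releases); all variable names, coefficients, and the seven-category size coding are hypothetical.

```python
# Sketch only: proportional-odds (ordinal logistic) regression of a hypothetical
# ordinal implant-size code on demographic predictors, via statsmodels.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "female": rng.integers(0, 2, n),
    "age": rng.normal(68, 9, n),
    "weight": rng.normal(85, 18, n),
    "height": rng.normal(170, 10, n),
})
# Hypothetical latent size driving a 7-category ordinal outcome (codes 0-6).
latent = (0.06 * (df["height"] - 170) + 0.03 * (df["weight"] - 85)
          - 0.8 * df["female"] + rng.logistic(size=n))
df["size"] = pd.cut(latent, bins=[-np.inf, -2, -1, 0, 1, 2, 3, np.inf], labels=False)

predictors = ["female", "age", "weight", "height"]
res = OrderedModel(df["size"], df[predictors], distr="logit").fit(method="bfgs",
                                                                  disp=False)
probs = np.asarray(res.predict(df[predictors]))
pred = probs.argmax(axis=1)
print(res.params)
print("within one size:", np.mean(np.abs(pred - df["size"].values) <= 1))
```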

  11. Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.

    PubMed

    Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen

    2015-05-01

    Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.
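
    A deliberately simplified sketch of the multiple-imputation idea for a predictor censored below a detection limit, not the authors' estimator: values below the limit are drawn from a truncated normal fitted to the observed part (a crude imputation model), a logistic regression is refit for each completed data set, and the estimates are pooled with Rubin's rules; all quantities are synthetic.

```python
# Sketch only: a crude multiple-imputation scheme for a single predictor that is
# left-censored at a detection limit, pooled with Rubin's rules.
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, m_imputations, LOD = 2000, 20, -0.5               # LOD: detection limit

x_true = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.2 + 0.8 * x_true))))
censored = x_true < LOD
x_obs = np.where(censored, np.nan, x_true)

# Crude imputation model: normal fitted to the observed part, truncated at LOD.
mu, sigma = np.nanmean(x_obs), np.nanstd(x_obs)
b = (LOD - mu) / sigma                                # upper bound (standardized)

betas, variances = [], []
for _ in range(m_imputations):
    x_imp = x_obs.copy()
    x_imp[censored] = stats.truncnorm.rvs(-np.inf, b, loc=mu, scale=sigma,
                                          size=censored.sum(), random_state=rng)
    fit = sm.Logit(y, sm.add_constant(x_imp)).fit(disp=False)
    betas.append(fit.params[1])
    variances.append(fit.cov_params()[1, 1])

betas, variances = np.array(betas), np.array(variances)
W, B = variances.mean(), betas.var(ddof=1)            # within / between variance
total_var = W + (1 + 1 / m_imputations) * B           # Rubin's rules
print(f"pooled beta = {betas.mean():.3f}, SE = {np.sqrt(total_var):.3f}")
```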

  12. Manifold parametrization of the left ventricle for a statistical modelling of its complete anatomy

    NASA Astrophysics Data System (ADS)

    Gil, D.; Garcia-Barnes, J.; Hernández-Sabate, A.; Marti, E.

    2010-03-01

    Distortion of Left Ventricle (LV) external anatomy is related to some dysfunctions, such as hypertrophy. The architecture of myocardial fibers determines LV electromechanical activation patterns as well as mechanics. Thus, their joint modelling would allow the design of specific interventions (such as pacemaker implantation and LV remodelling) and therapies (such as resynchronization). On one hand, accurate modelling of external anatomy requires either a dense sampling or a continuous infinite-dimensional approach, which requires non-Euclidean statistics. On the other hand, computation of fiber models requires statistics on Riemannian spaces. Most approaches compute separate statistical models for external anatomy and fiber architecture. In this work we propose a general mathematical framework based on differential geometry concepts for computing a statistical model that includes both external and fiber anatomy. Our framework provides a continuous approach to external anatomy supporting standard statistics. We also provide a straightforward formula for the computation of the Riemannian fiber statistics. We have applied our methodology to the computation of a complete anatomical atlas of canine hearts from diffusion tensor studies. The orientation of fibers over the average external geometry agrees with the segmental description of orientations reported in the literature.

  13. A Bifactor Approach to Model Multifaceted Constructs in Statistical Mediation Analysis.

    PubMed

    Gonzalez, Oscar; MacKinnon, David P

    Statistical mediation analysis allows researchers to identify the most important mediating constructs in the causal process studied. Identifying specific mediators is especially relevant when the hypothesized mediating construct consists of multiple related facets. The general definition of the construct and its facets might relate differently to an outcome. However, current methods do not allow researchers to study the relationships of general and specific aspects of a construct to an outcome simultaneously. This study proposes a bifactor measurement model for the mediating construct as a way to parse variance and represent the general aspect and specific facets of a construct simultaneously. Monte Carlo simulation results are presented to help determine the properties of mediated effect estimation when the mediator has a bifactor structure and a specific facet of a construct is the true mediator. This study also investigates the conditions under which researchers can detect the mediated effect when the multidimensionality of the mediator is ignored and the mediator is treated as unidimensional. Simulation results indicated that the mediation model with a bifactor mediator measurement model yielded unbiased estimates and adequate power to detect the mediated effect with a sample size greater than 500 and medium a- and b-paths. Also, results indicate that parameter bias and detection of the mediated effect in both the data-generating model and the misspecified model vary as a function of the amount of facet variance represented in the mediation model. This study contributes to the largely unexplored area of measurement issues in statistical mediation analysis.

  14. Comparative evaluation of statistical and mechanistic models of Escherichia coli at beaches in southern Lake Michigan

    USGS Publications Warehouse

    Safaie, Ammar; Wendzel, Aaron; Ge, Zhongfu; Nevers, Meredith; Whitman, Richard L.; Corsi, Steven R.; Phanikumar, Mantha S.

    2016-01-01

    Statistical and mechanistic models are popular tools for predicting the levels of indicator bacteria at recreational beaches. Researchers tend to use one class of model or the other, and it is difficult to generalize statements about their relative performance due to differences in how the models are developed, tested, and used. We describe a cooperative modeling approach for freshwater beaches impacted by point sources in which insights derived from mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models which were further improved using probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and the use of unstructured-grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions resulting in more accurate estimates of beach closures.

  15. Statistics of the geomagnetic secular variation for the past 5Ma

    NASA Technical Reports Server (NTRS)

    Constable, C. G.; Parker, R. L.

    1986-01-01

    A new statistical model is proposed for the geomagnetic secular variation over the past 5Ma. Unlike previous models, the model makes use of statistical characteristics of the present day geomagnetic field. The spatial power spectrum of the non-dipole field is consistent with a white source near the core-mantle boundary with Gaussian distribution. After a suitable scaling, the spherical harmonic coefficients may be regarded as statistical samples from a single giant Gaussian process; this is the model of the non-dipole field. The model can be combined with an arbitrary statistical description of the dipole and probability density functions and cumulative distribution functions can be computed for declination and inclination that would be observed at any site on Earth's surface. Global paleomagnetic data spanning the past 5Ma are used to constrain the statistics of the dipole part of the field. A simple model is found to be consistent with the available data. An advantage of specifying the model in terms of the spherical harmonic coefficients is that it is a complete statistical description of the geomagnetic field, enabling us to test specific properties for a general description. Both intensity and directional data distributions may be tested to see if they satisfy the expected model distributions.
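
    A hedged sketch of the "giant Gaussian process" idea described above: non-dipole Gauss coefficients are drawn independently from zero-mean normals whose degree-dependent variance is chosen (under our own arbitrary normalization, not the paper's calibrated one) so that the expected Lowes-Mauersberger spectrum is flat at the core-mantle boundary.

```python
# Sketch only: sample Gauss coefficients from independent zero-mean normals whose
# degree-dependent variance makes the expected spectrum white at the CMB.
import numpy as np

rng = np.random.default_rng(3)
a, c = 6371.2, 3480.0            # Earth and core radii (km)
l_max, alpha = 10, 1.0           # alpha: arbitrary overall amplitude (hypothetical)

def coefficient_std(l):
    # Lowes-Mauersberger spectrum: R_l(r) = (a/r)**(2l+4) * (l+1) * sum(g^2 + h^2).
    # With (2l+1) coefficients per degree, a flat expected R_l at r = c requires
    # sigma_l^2 proportional to (c/a)**(2l+4) / ((l+1)(2l+1)).
    return alpha * np.sqrt((c / a) ** (2 * l + 4) / ((l + 1) * (2 * l + 1)))

# One realization of the non-dipole coefficients (degrees 2..l_max).
sample = {l: rng.normal(0.0, coefficient_std(l), size=2 * l + 1)
          for l in range(2, l_max + 1)}

# Realized spectrum at the core-mantle boundary scatters around alpha**2 = 1.
for l, g in sample.items():
    R_cmb = (a / c) ** (2 * l + 4) * (l + 1) * np.sum(g ** 2)
    print(l, round(R_cmb, 3))
```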

  16. Statistics of the geomagnetic secular variation for the past 5 m.y

    NASA Technical Reports Server (NTRS)

    Constable, C. G.; Parker, R. L.

    1988-01-01

    A new statistical model is proposed for the geomagnetic secular variation over the past 5Ma. Unlike previous models, the model makes use of statistical characteristics of the present day geomagnetic field. The spatial power spectrum of the non-dipole field is consistent with a white source near the core-mantle boundary with Gaussian distribution. After a suitable scaling, the spherical harmonic coefficients may be regarded as statistical samples from a single giant Gaussian process; this is the model of the non-dipole field. The model can be combined with an arbitrary statistical description of the dipole and probability density functions and cumulative distribution functions can be computed for declination and inclination that would be observed at any site on Earth's surface. Global paleomagnetic data spanning the past 5Ma are used to constrain the statistics of the dipole part of the field. A simple model is found to be consistent with the available data. An advantage of specifying the model in terms of the spherical harmonic coefficients is that it is a complete statistical description of the geomagnetic field, enabling us to test specific properties for a general description. Both intensity and directional data distributions may be tested to see if they satisfy the expected model distributions.

  17. Parent Ratings of ADHD Symptoms: Generalized Partial Credit Model Analysis of Differential Item Functioning across Gender

    ERIC Educational Resources Information Center

    Gomez, Rapson

    2012-01-01

    Objective: Generalized partial credit model, which is based on item response theory (IRT), was used to test differential item functioning (DIF) for the "Diagnostic and Statistical Manual of Mental Disorders" (4th ed.), inattention (IA), and hyperactivity/impulsivity (HI) symptoms across boys and girls. Method: To accomplish this, parents completed…

  18. Aeronautical Engineering. A Continuing Bibliography with Indexes

    DTIC Science & Technology

    1987-09-01

  19. A new statistical method for transfer coefficient calculations in the framework of the general multiple-compartment model of transport for radionuclides in biological systems.

    PubMed

    Garcia, F; Arruda-Neto, J D; Manso, M V; Helene, O M; Vanin, V R; Rodriguez, O; Mesa, J; Likhachev, V P; Filho, J W; Deppman, A; Perez, G; Guzman, F; de Camargo, S P

    1999-10-01

    A new and simple statistical procedure (STATFLUX) for the calculation of transfer coefficients of radionuclide transport to animals and plants is proposed. The method is based on the general multiple-compartment model, which uses a system of linear equations involving geometrical volume considerations. Using experimentally available curves of radionuclide concentration versus time for each animal compartment (organ), flow parameters were estimated with a least-squares procedure whose consistency is tested. Some numerical results are presented in order to compare the STATFLUX transfer coefficients with those from other works and with experimental data.
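
    A minimal sketch (not STATFLUX itself) of estimating transfer coefficients by least squares for a toy two-compartment first-order chain, using synthetic concentration-versus-time curves; the compartment structure and rate values are hypothetical.

```python
# Sketch only: least-squares estimation of transfer coefficients in a toy linear
# two-compartment chain, from synthetic concentration-versus-time curves.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

def two_compartment(t, k12, k20):
    """C1(t), C2(t) for the chain: compartment 1 -> compartment 2 -> out."""
    def rhs(_t, c):
        c1, c2 = c
        return [-k12 * c1, k12 * c1 - k20 * c2]
    sol = solve_ivp(rhs, (t[0], t[-1]), [1.0, 0.0], t_eval=t, rtol=1e-8)
    return sol.y

def model_flat(t, k12, k20):
    return two_compartment(t, k12, k20).ravel()

t = np.linspace(0.0, 10.0, 30)
rng = np.random.default_rng(4)
data = two_compartment(t, 0.8, 0.3) + rng.normal(0.0, 0.01, (2, t.size))

popt, pcov = curve_fit(model_flat, t, data.ravel(), p0=[0.5, 0.5], bounds=(0.0, np.inf))
print("estimated k12, k20:", popt.round(3), "std errors:", np.sqrt(np.diag(pcov)).round(3))
```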

  20. New Probe of Departures from General Relativity Using Minkowski Functionals.

    PubMed

    Fang, Wenjuan; Li, Baojiu; Zhao, Gong-Bo

    2017-05-05

    The morphological properties of the large scale structure of the Universe can be fully described by four Minkowski functionals (MFs), which provide important complementary information to other statistical observables such as the widely used 2-point statistics in configuration and Fourier spaces. In this work, for the first time, we present the differences in the morphology of the large scale structure caused by modifications to general relativity (to address the cosmic acceleration problem), by measuring the MFs from N-body simulations of modified gravity and general relativity. We find strong statistical power when using the MFs to constrain modified theories of gravity: with a galaxy survey that has survey volume ∼0.125(h^{-1}  Gpc)^{3} and galaxy number density ∼1/(h^{-1}  Mpc)^{3}, the two normal-branch Dvali-Gabadadze-Porrati models and the F5 f(R) model that we simulated can be discriminated from the ΛCDM model at a significance level ≳5σ with an individual MF measurement. Therefore, the MF of the large scale structure is potentially a powerful probe of gravity, and its application to real data deserves active exploration.

  1. A general science-based framework for dynamical spatio-temporal models

    USGS Publications Warehouse

    Wikle, C.K.; Hooten, M.B.

    2010-01-01

    Spatio-temporal statistical models are increasingly being used across a wide variety of scientific disciplines to describe and predict spatially-explicit processes that evolve over time. Correspondingly, in recent years there has been a significant amount of research on new statistical methodology for such models. Although descriptive models that approach the problem from the second-order (covariance) perspective are important, and innovative work is being done in this regard, many real-world processes are dynamic, and it can be more efficient in some cases to characterize the associated spatio-temporal dependence by the use of dynamical models. The chief challenge with the specification of such dynamical models has been related to the curse of dimensionality. Even in fairly simple linear, first-order Markovian, Gaussian error settings, statistical models are often overparameterized. Hierarchical models have proven invaluable in their ability to deal with this issue to some extent by allowing dependency among groups of parameters. In addition, this framework has allowed for the specification of science-based parameterizations (and associated prior distributions) in which classes of deterministic dynamical models (e.g., partial differential equations (PDEs), integro-difference equations (IDEs), matrix models, and agent-based models) are used to guide specific parameterizations. Most of the focus for the application of such models in statistics has been in the linear case. The problems mentioned above with linear dynamic models are compounded in the case of nonlinear models. In this sense, coherent and sensible model parameterizations are not only helpful, they are essential. Here, we present an overview of a framework for incorporating scientific information to motivate dynamical spatio-temporal models. First, we illustrate the methodology with the linear case. We then develop a general nonlinear spatio-temporal framework that we call general quadratic nonlinearity and demonstrate that it accommodates many different classes of science-based parameterizations as special cases. The model is presented in a hierarchical Bayesian framework and is illustrated with examples from ecology and oceanography. © 2010 Sociedad de Estadística e Investigación Operativa.
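
    A minimal sketch of the simple case mentioned above, a linear, first-order Markovian, Gaussian-error spatio-temporal process on a one-dimensional grid with a diffusion-like propagator; the grid size, kernel, and noise level are hypothetical.

```python
# Sketch only: a linear, first-order Markovian, Gaussian-error spatio-temporal
# process on a 1-D grid with a diffusion-like propagator.
import numpy as np

rng = np.random.default_rng(11)
n_sites, n_times, tau = 50, 100, 0.1

# Propagator M: each site keeps most of its value and exchanges a little with
# its neighbours (a crude PDE/IDE-motivated kernel on a ring).
M = np.zeros((n_sites, n_sites))
for i in range(n_sites):
    M[i, i] = 0.8
    M[i, (i - 1) % n_sites] = 0.08
    M[i, (i + 1) % n_sites] = 0.08

# Dynamics: y_t = M y_{t-1} + eta_t, with eta_t ~ N(0, tau^2 I).
Y = np.zeros((n_times, n_sites))
Y[0] = rng.normal(size=n_sites)
for t in range(1, n_times):
    Y[t] = M @ Y[t - 1] + rng.normal(0.0, tau, n_sites)

# In a hierarchical model, noisy observations z_t = y_t + eps_t would sit on top.
print("spatial variance at t = 1, 50, 99:", Y[[1, 50, 99]].var(axis=1).round(3))
```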

  2. Data-Based Detection of Potential Terrorist Attacks: Statistical and Graphical Methods

    DTIC Science & Technology

    2010-06-01

  3. Predicting adsorptive removal of chlorophenol from aqueous solution using artificial intelligence based modeling approaches.

    PubMed

    Singh, Kunwar P; Gupta, Shikha; Ojha, Priyanka; Rai, Premanjali

    2013-04-01

    The research aims to develop an artificial intelligence (AI)-based model to predict the adsorptive removal of 2-chlorophenol (CP) in aqueous solution by coconut shell carbon (CSC) using four operational variables (pH of solution, adsorbate concentration, temperature, and contact time), and to investigate their effects on the adsorption process. Accordingly, based on a factorial design, 640 batch experiments were conducted. Nonlinearities in the experimental data were checked using Brock-Dechert-Scheinkman (BDS) statistics. Five nonlinear models were constructed to predict the adsorptive removal of CP in aqueous solution by CSC using the four variables as input. Performances of the constructed models were evaluated and compared using statistical criteria. BDS statistics revealed strong nonlinearity in the experimental data. Performance of all the models constructed here was satisfactory. Radial basis function network (RBFN) and multilayer perceptron network (MLPN) models performed better than the generalized regression neural network, support vector machines, and gene expression programming models. Sensitivity analysis revealed that contact time had the highest effect on adsorption, followed by solution pH, temperature, and CP concentration. The study concluded that all the models constructed here were capable of capturing the nonlinearity in the data. The better generalization and predictive performance of the RBFN and MLPN models suggested that these can be used to predict the adsorption of CP in aqueous solution using CSC.
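
    A minimal sketch of the multilayer-perceptron branch of such a workflow using scikit-learn, with a synthetic and purely hypothetical response surface in the four operational variables; it is not the paper's data or tuned architecture.

```python
# Sketch only: a scikit-learn multilayer perceptron on a synthetic, hypothetical
# removal-efficiency surface in the four operational variables.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(5)
n = 640
X = np.column_stack([
    rng.uniform(2, 10, n),       # pH
    rng.uniform(50, 300, n),     # adsorbate concentration (mg/L)
    rng.uniform(15, 45, n),      # temperature (deg C)
    rng.uniform(10, 180, n),     # contact time (min)
])
# Hypothetical nonlinear response (% removal), for illustration only.
y = (60 + 2.5 * X[:, 0] - 0.05 * X[:, 1] + 0.3 * X[:, 2]
     + 15 * np.log1p(X[:, 3] / 30) + rng.normal(0, 2, n))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=5000,
                                 random_state=0))
mlp.fit(X_tr, y_tr)
print("test R^2:", round(r2_score(y_te, mlp.predict(X_te)), 3))
```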

  4. The epistemological status of general circulation models

    NASA Astrophysics Data System (ADS)

    Loehle, Craig

    2018-03-01

    Forecasts of both likely anthropogenic effects on climate and consequent effects on nature and society are based on large, complex software tools called general circulation models (GCMs). Forecasts generated by GCMs have been used extensively in policy decisions related to climate change. However, the relation between underlying physical theories and results produced by GCMs is unclear. In the case of GCMs, many discretizations and approximations are made, and simulating Earth system processes is far from simple and currently leads to some results with unknown energy balance implications. Statistical testing of GCM forecasts for degree of agreement with data would facilitate assessment of fitness for use. If model results need to be put on an anomaly basis due to model bias, then both visual and quantitative measures of model fit depend strongly on the reference period used for normalization, making testing problematic. Epistemology is here applied to problems of statistical inference during testing, the relationship between the underlying physics and the models, the epistemic meaning of ensemble statistics, problems of spatial and temporal scale, the existence or not of an unforced null for climate fluctuations, the meaning of existing uncertainty estimates, and other issues. Rigorous reasoning entails carefully quantifying levels of uncertainty.

  5. The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping.

    PubMed

    Bahlmann, Claus; Burkhardt, Hans

    2004-03-01

    In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system frog on hand. The focus of this work concerns the presentation of the classification/training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized, sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this sequential type of data, e.g., speech recognition, genome processing, robotics, etc. Contrary to previous attempts, clustering and statistical sequence modeling are embedded in a single feature space and use a closely related distance measure. We show character recognition experiments of frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.
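
    For readers unfamiliar with the alignment step, a minimal classical dynamic time warping distance is sketched below; CSDTW itself additionally embeds clustering and statistical sequence modeling, which this sketch does not attempt.

```python
# Sketch only: classical dynamic time warping distance between two 1-D sequences.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two similar sequences that differ by a local time warp.
x = np.sin(np.linspace(0, 2 * np.pi, 40))
y = np.sin(np.linspace(0, 2 * np.pi, 55))
print("DTW distance:", round(dtw_distance(x, y), 4))
```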

  6. Statistical physics of the symmetric group.

    PubMed

    Williams, Mobolaji

    2017-04-01

    Ordered chains (such as chains of amino acids) are ubiquitous in biological cells, and these chains perform specific functions contingent on the sequence of their components. Using the existence and general properties of such sequences as a theoretical motivation, we study the statistical physics of systems whose state space is defined by the possible permutations of an ordered list, i.e., the symmetric group, and whose energy is a function of how certain permutations deviate from some chosen correct ordering. Such a nonfactorizable state space is quite different from the state spaces typically considered in statistical physics systems and consequently has novel behavior in systems with interacting and even noninteracting Hamiltonians. Various parameter choices of a mean-field model reveal the system to contain five different physical regimes defined by two transition temperatures, a triple point, and a quadruple point. Finally, we conclude by discussing how the general analysis can be extended to state spaces with more complex combinatorial properties and to other standard questions of statistical mechanics models.

  7. Statistical physics of the symmetric group

    NASA Astrophysics Data System (ADS)

    Williams, Mobolaji

    2017-04-01

    Ordered chains (such as chains of amino acids) are ubiquitous in biological cells, and these chains perform specific functions contingent on the sequence of their components. Using the existence and general properties of such sequences as a theoretical motivation, we study the statistical physics of systems whose state space is defined by the possible permutations of an ordered list, i.e., the symmetric group, and whose energy is a function of how certain permutations deviate from some chosen correct ordering. Such a nonfactorizable state space is quite different from the state spaces typically considered in statistical physics systems and consequently has novel behavior in systems with interacting and even noninteracting Hamiltonians. Various parameter choices of a mean-field model reveal the system to contain five different physical regimes defined by two transition temperatures, a triple point, and a quadruple point. Finally, we conclude by discussing how the general analysis can be extended to state spaces with more complex combinatorial properties and to other standard questions of statistical mechanics models.

  8. The construction and assessment of a statistical model for the prediction of protein assay data.

    PubMed

    Pittman, J; Sacks, J; Young, S Stanley

    2002-01-01

    The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion, a synopsis of the results of these experiments and their implications for the analysis of bioinformatic databases in general is presented.
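
    A minimal sketch of the general workflow described above (a truncated SVD feeding a tree-based learner, assessed by cross-validation) on synthetic data; the pipeline components and dimensions are assumptions, not the paper's model.

```python
# Sketch only: a truncated SVD feeding a tree-based regressor, assessed by
# cross-validation on synthetic, assay-like data.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n, p = 300, 500                           # wide feature matrix (synthetic)
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:10] = rng.normal(size=10)           # only a few informative features
y = X @ beta + rng.normal(0, 0.5, n)

model = make_pipeline(TruncatedSVD(n_components=20, random_state=0),
                      GradientBoostingRegressor(random_state=0))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("cross-validated R^2:", scores.round(3), "mean:", round(scores.mean(), 3))
```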

  9. General solution of the chemical master equation and modality of marginal distributions for hierarchic first-order reaction networks.

    PubMed

    Reis, Matthias; Kromer, Justus A; Klipp, Edda

    2018-01-20

    Multimodality is a phenomenon which complicates the analysis of statistical data based exclusively on mean and variance. Here, we present criteria for multimodality in hierarchic first-order reaction networks, consisting of catalytic and splitting reactions. Those networks are characterized by independent and dependent subnetworks. First, we prove the general solvability of the Chemical Master Equation (CME) for this type of reaction network and thereby extend the class of solvable CME's. Our general solution is analytical in the sense that it allows for a detailed analysis of its statistical properties. Given Poisson/deterministic initial conditions, we then prove the independent species to be Poisson/binomially distributed, while the dependent species exhibit generalized Poisson/Khatri Type B distributions. Generalized Poisson/Khatri Type B distributions are multimodal for an appropriate choice of parameters. We illustrate our criteria for multimodality by several basic models, as well as the well-known two-stage transcription-translation network and Bateman's model from nuclear physics. For both examples, multimodality was previously not reported.

  10. Predicting Statistical Response and Extreme Events in Uncertainty Quantification through Reduced-Order Models

    NASA Astrophysics Data System (ADS)

    Qi, D.; Majda, A.

    2017-12-01

    A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty in statistical sensitivity and intermittency in principal model directions with largest variability in high-dimensional turbulent system and turbulent transport models. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve the optimal model performance. The idea in the reduced-order method is from a self-consistent mathematical framework for general systems with quadratic nonlinearity, where crucial high-order statistics are approximated by a systematic model calibration procedure. Model efficiency is improved through additional damping and noise corrections to replace the expensive energy-conserving nonlinear interactions. Model errors due to the imperfect nonlinear approximation are corrected by tuning the model parameters using linear response theory with an information metric in a training phase before prediction. A statistical energy principle is adopted to introduce a global scaling factor in characterizing the higher-order moments in a consistent way to improve model sensitivity. Stringent models of barotropic and baroclinic turbulence are used to display the feasibility of the reduced-order methods. Principal statistical responses in mean and variance can be captured by the reduced-order models with accuracy and efficiency. Besides, the reduced-order models are also used to capture crucial passive tracer field that is advected by the baroclinic turbulent flow. It is demonstrated that crucial principal statistical quantities like the tracer spectrum and fat-tails in the tracer probability density functions in the most important large scales can be captured efficiently with accuracy using the reduced-order tracer model in various dynamical regimes of the flow field with distinct statistical structures.

  11. Novel formulation of the ℳ model through the Generalized-K distribution for atmospheric optical channels.

    PubMed

    Garrido-Balsells, José María; Jurado-Navas, Antonio; Paris, José Francisco; Castillo-Vazquez, Miguel; Puerta-Notario, Antonio

    2015-03-09

    In this paper, a novel and deeper physical interpretation of the recently published Málaga or ℳ statistical distribution is provided. This distribution, which has gained wide acceptance in the scientific community, models the optical irradiance scintillation induced by atmospheric turbulence. Here, the analytical expressions previously published are modified in order to express them as a mixture of the known Generalized-K and discrete Binomial and Negative Binomial distributions. In particular, the probability density function (pdf) of the ℳ model is now obtained as a linear combination of Generalized-K pdfs, in which the coefficients depend directly on the parameters of the ℳ distribution. In this way, the Málaga model can be physically interpreted as a superposition of different optical sub-channels, each described by the corresponding Generalized-K fading model and weighted by the ℳ-dependent coefficients. The expressions proposed here are simpler than the equations of the original ℳ model and are validated by means of numerical simulations, by generating ℳ-distributed random sequences and their associated histograms. This novel interpretation of the Málaga statistical distribution provides a valuable tool for analyzing the performance of atmospheric optical channels under every turbulence condition.

  12. Commentary on the statistical properties of noise and its implication on general linear models in functional near-infrared spectroscopy.

    PubMed

    Huppert, Theodore J

    2016-01-01

    Functional near-infrared spectroscopy (fNIRS) is a noninvasive neuroimaging technique that uses low levels of light to measure changes in cerebral blood oxygenation levels. In the majority of NIRS functional brain studies, analysis of this data is based on a statistical comparison of hemodynamic levels between a baseline and task or between multiple task conditions by means of a linear regression model: the so-called general linear model. Although these methods are similar to their implementation in other fields, particularly for functional magnetic resonance imaging, the specific application of these methods in fNIRS research differs in several key ways related to the sources of noise and artifacts unique to fNIRS. In this brief communication, we discuss the application of linear regression models in fNIRS and the modifications needed to generalize these models in order to deal with structured (colored) noise due to systemic physiology and noise heteroscedasticity due to motion artifacts. The objective of this work is to present an overview of these noise properties in the context of the linear model as it applies to fNIRS data. This work is aimed at explaining these mathematical issues to the general fNIRS experimental researcher but is not intended to be a complete mathematical treatment of these concepts.
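
    A minimal sketch of the colored-noise issue discussed above: a regression with AR(1) errors fit by ordinary least squares versus statsmodels' GLSAR iterative (prewhitening) fit; the design, noise parameters, and sampling rate are hypothetical, and motion-related heteroscedasticity would additionally call for robust or weighted estimation.

```python
# Sketch only: a block-design regression with AR(1) noise, fit by OLS (which
# ignores the autocorrelation) and by GLSAR iterative prewhitening.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 600
stim = ((np.arange(n) // 100) % 2).astype(float)      # simple on/off block design
X = sm.add_constant(stim)

noise = np.zeros(n)
for t in range(1, n):                                  # AR(1) "physiological" noise
    noise[t] = 0.8 * noise[t - 1] + rng.normal(0, 0.2)
y = 1.0 + 0.5 * stim + noise                           # true effect = 0.5

ols = sm.OLS(y, X).fit()
glsar = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)
print("OLS   beta, se:", round(ols.params[1], 3), round(ols.bse[1], 3))
print("GLSAR beta, se:", round(glsar.params[1], 3), round(glsar.bse[1], 3))
```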

  13. A full year evaluation of the CALIOPE-EU air quality modeling system over Europe for 2004

    NASA Astrophysics Data System (ADS)

    Pay, M. T.; Piot, M.; Jorba, O.; Gassó, S.; Gonçalves, M.; Basart, S.; Dabdub, D.; Jiménez-Guerrero, P.; Baldasano, J. M.

    The CALIOPE-EU high-resolution air quality modeling system, namely WRF-ARW/HERMES-EMEP/CMAQ/BSC-DREAM8b, is developed and applied to Europe (12 km × 12 km, 1 h). The model performances are tested in terms of air quality levels and dynamics reproducibility on a yearly basis. The present work describes a quantitative evaluation of gas-phase species (O3, NO2 and SO2) and particulate matter (PM2.5 and PM10) against ground-based measurements from the EMEP (European Monitoring and Evaluation Programme) network for the year 2004. The evaluation is based on standard statistical performance metrics. Simulated O3 achieves satisfactory performance for both daily mean and daily maximum concentrations, especially in summer, with annual mean correlations of 0.66 and 0.69, respectively. Mean normalized errors fall within the recommendations proposed by the United States Environmental Protection Agency (US-EPA). The general trends and daily variations of primary pollutants (NO2 and SO2) are satisfactory. Daily mean concentrations of NO2 correlate well with observations (annual correlation r = 0.67) but tend to be underestimated. For SO2, mean concentrations are well simulated (mean bias = 0.5 μg m-3) with a relatively high annual mean correlation (r = 0.60), although peaks are generally overestimated. The dynamics of PM2.5 and PM10 are well reproduced (0.49 < r < 0.62), but mean concentrations remain systematically underestimated. Deficiencies in particulate matter source characterization are discussed. Also, the spatially distributed statistics and the general patterns for each pollutant over Europe are examined. The model performances are compared with other European studies. While O3 statistics generally remain lower than those obtained in the other studies considered, statistics for NO2, SO2, PM2.5 and PM10 present higher scores than most models.

  14. Cluster and propensity based approximation of a network

    PubMed Central

    2013-01-01

    Background The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets. Results Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network not only generalizes correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood based tests for assessing the significance of an edge after controlling for clustering. We present a novel Majorization-Minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bi-partite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM). Conclusions The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms. The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust. PMID:23497424

  15. Stochastic Analysis and Probabilistic Downscaling of Soil Moisture

    NASA Astrophysics Data System (ADS)

    Deshon, J. P.; Niemann, J. D.; Green, T. R.; Jones, A. S.

    2017-12-01

    Soil moisture is a key variable for rainfall-runoff response estimation, ecological and biogeochemical flux estimation, and biodiversity characterization, each of which is useful for watershed condition assessment. These applications require not only accurate, fine-resolution soil-moisture estimates but also confidence limits on those estimates and soil-moisture patterns that exhibit realistic statistical properties (e.g., variance and spatial correlation structure). The Equilibrium Moisture from Topography, Vegetation, and Soil (EMT+VS) model downscales coarse-resolution (9-40 km) soil moisture from satellite remote sensing or land-surface models to produce fine-resolution (10-30 m) estimates. The model was designed to produce accurate deterministic soil-moisture estimates at multiple points, but the resulting patterns do not reproduce the variance or spatial correlation of observed soil-moisture patterns. The primary objective of this research is to generalize the EMT+VS model to produce a probability density function (pdf) for soil moisture at each fine-resolution location and time. Each pdf has a mean that is equal to the deterministic soil-moisture estimate, and the pdf can be used to quantify the uncertainty in the soil-moisture estimates and to simulate soil-moisture patterns. Different versions of the generalized model are hypothesized based on how uncertainty enters the model, whether the uncertainty is additive or multiplicative, and which distributions describe the uncertainty. These versions are then tested by application to four catchments with detailed soil-moisture observations (Tarrawarra, Satellite Station, Cache la Poudre, and Nerrigundah). The performance of the generalized models is evaluated by comparing the statistical properties of the simulated soil-moisture patterns to those of the observations and the deterministic EMT+VS model. The versions of the generalized EMT+VS model with normally distributed stochastic components produce soil-moisture patterns with more realistic statistical properties than the deterministic model. Additionally, the results suggest that the variance and spatial correlation of the stochastic soil-moisture variations do not vary consistently with the spatial-average soil moisture.

  16. Noise limitations in optical linear algebra processors.

    PubMed

    Batsell, S G; Jong, T L; Walkup, J F; Krile, T F

    1990-05-10

    A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.

  17. SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit

    PubMed Central

    Chu, Annie; Cui, Jenny; Dinov, Ivo D.

    2011-01-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994

  18. Generalized Gibbs ensemble in integrable lattice models

    NASA Astrophysics Data System (ADS)

    Vidmar, Lev; Rigol, Marcos

    2016-06-01

    The generalized Gibbs ensemble (GGE) was introduced ten years ago to describe observables in isolated integrable quantum systems after equilibration. Since then, the GGE has been demonstrated to be a powerful tool to predict the outcome of the relaxation dynamics of few-body observables in a variety of integrable models, a process we call generalized thermalization. This review discusses several fundamental aspects of the GGE and generalized thermalization in integrable systems. In particular, we focus on questions such as: which observables equilibrate to the GGE predictions and who should play the role of the bath; what conserved quantities can be used to construct the GGE; what are the differences between generalized thermalization in noninteracting systems and in interacting systems mappable to noninteracting ones; why is it that the GGE works when traditional ensembles of statistical mechanics fail. Despite a lot of interest in these questions in recent years, no definite answers have been given. We review results for the XX model and for the transverse field Ising model. For the latter model, we also report original results and show that the GGE describes spin-spin correlations over the entire system. This makes apparent that there is no need to trace out a part of the system in real space for equilibration to occur and for the GGE to apply. In the past, a spectral decomposition of the weights of various statistical ensembles revealed that generalized eigenstate thermalization occurs in the XX model (hard-core bosons). Namely, eigenstates of the Hamiltonian with similar distributions of conserved quantities have similar expectation values of few-spin observables. Here we show that generalized eigenstate thermalization also occurs in the transverse field Ising model.

  19. A Constrained Linear Estimator for Multiple Regression

    ERIC Educational Resources Information Center

    Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.

    2010-01-01

    "Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…

  20. Comparing physically-based and statistical landslide susceptibility model outputs - a case study from Lower Austria

    NASA Astrophysics Data System (ADS)

    Canli, Ekrem; Thiebes, Benni; Petschko, Helene; Glade, Thomas

    2015-04-01

    By now there is a broad consensus that, due to human-induced global change, the frequency and magnitude of heavy precipitation events are expected to increase in certain parts of the world. Given that rainfall serves as the most common triggering agent for landslide initiation, increased landslide activity can also be expected there. Landslide occurrence is a widespread phenomenon that clearly needs to be addressed. Well-known problems in modelling landslide susceptibility and hazard lead to uncertain predictions. This includes the lack of a universally applicable modelling solution for adequately assessing landslide susceptibility (which can be seen as the relative indication of the spatial probability of landslide initiation). Generally speaking, there are three major approaches for performing landslide susceptibility analysis: heuristic, statistical and deterministic models, each with different assumptions, distinctive data requirements and differently interpretable outcomes. Still, detailed comparisons of the resulting landslide susceptibility maps are rare. In this presentation, the susceptibility modelling outputs of a deterministic model (Stability INdex MAPping - SINMAP) and a statistical modelling approach (generalized additive model - GAM) are compared. SINMAP is an infinite slope stability model which requires parameterization of soil mechanical parameters. Modelling with the generalized additive model, which represents a non-linear extension of a generalized linear model, requires a high-quality landslide inventory that serves as the dependent variable in the statistical approach. Both methods rely on topographical data derived from the DTM. The comparison has been carried out in a study area located in the district of Waidhofen/Ybbs in Lower Austria. For the whole district (ca. 132 km²), 1063 landslides have been mapped and partially used within the analysis and the validation of the model outputs. The respective susceptibility maps have been reclassified to contain three susceptibility classes each. The comparison of the susceptibility maps was performed on a grid cell basis. A match of the maps was observed for grid cells located in the same susceptibility class. In contrast, a mismatch or deviation was observed for locations with different assigned susceptibility classes (up to two classes' difference). Although the modelling approaches differ significantly, more than 70% of the pixels reveal a match in the same susceptibility class. A mismatch by two classes' difference occurred in less than 2% of all pixels. Although the result looks promising and strengthens the confidence in the susceptibility zonation for this area, some of the general drawbacks related to the respective approaches still have to be addressed in further detail. Future work is heading towards an integration of probabilistic aspects into deterministic modelling.

  1. Tropical geometry of statistical models.

    PubMed

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. Here, we address the question of how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. The Newton polytope of a statistical model plays a key role. Our results are applied to the hidden Markov model and the general Markov model on a binary tree.

  2. Detection of crossover time scales in multifractal detrended fluctuation analysis

    NASA Astrophysics Data System (ADS)

    Ge, Erjia; Leung, Yee

    2013-04-01

    Fractal analysis is employed in this paper as a scale-based method for the identification of the scaling behavior of time series. Many spatial and temporal processes exhibiting complex multi(mono)-scaling behaviors are fractals. One of the important concepts in fractals is the crossover time scale(s) that separates distinct regimes having different fractal scaling behaviors. A common method is multifractal detrended fluctuation analysis (MF-DFA). The detection of crossover time scale(s) is, however, relatively subjective since it has been made without rigorous statistical procedures and has generally been determined by eyeballing or subjective observation. Crossover time scales determined in this way may be spurious and problematic, and may not reflect the genuine underlying scaling behavior of a time series. The purpose of this paper is to propose a statistical procedure to model complex fractal scaling behaviors and reliably identify the crossover time scales under MF-DFA. The scaling-identification regression model, grounded on a solid statistical foundation, is first proposed to describe multi-scaling behaviors of fractals. Through the regression analysis and statistical inference, we can (1) identify the crossover time scales that cannot be detected by eyeballing, (2) determine the number and locations of the genuine crossover time scales, (3) give confidence intervals for the crossover time scales, and (4) establish the statistically significant regression model depicting the underlying scaling behavior of a time series. To substantiate our argument, the regression model is applied to analyze the multi-scaling behaviors of avian-influenza outbreaks, water consumption, daily mean temperature, and rainfall in Hong Kong. Through the proposed model, we can have a deeper understanding of fractals in general and a statistical approach to identify multi-scaling behavior under MF-DFA in particular.
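
    A minimal sketch of the underlying idea, choosing a crossover scale by scanning breakpoints of a two-segment linear fit to log F(s) versus log s and minimizing the residual error; the fluctuation function is synthetic, and this simple scan omits the paper's confidence intervals and formal inference.

```python
# Sketch only: pick a crossover scale by scanning breakpoints of a two-segment
# linear fit to a synthetic log-log fluctuation function.
import numpy as np

rng = np.random.default_rng(8)
log_s = np.linspace(0.7, 3.0, 60)
true_break = 1.5                       # true crossover at log10(s) = 1.5
log_F = np.where(log_s < true_break,
                 0.9 * log_s,
                 0.5 * log_s + (0.9 - 0.5) * true_break)
log_F = log_F + rng.normal(0, 0.02, log_s.size)

def two_segment_sse(brk):
    """Residual sum of squares of two separate linear fits split at brk."""
    sse = 0.0
    for mask in (log_s < brk, log_s >= brk):
        coef = np.polyfit(log_s[mask], log_F[mask], 1)
        sse += np.sum((log_F[mask] - np.polyval(coef, log_s[mask])) ** 2)
    return sse

candidates = log_s[5:-5]               # keep a few points in each segment
best = candidates[int(np.argmin([two_segment_sse(b) for b in candidates]))]
print("estimated crossover at log10(s) =", round(best, 3))
```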

  3. Projecting Range Limits with Coupled Thermal Tolerance - Climate Change Models: An Example Based on Gray Snapper (Lutjanus griseus) along the U.S. East Coast

    PubMed Central

    Hare, Jonathan A.; Wuenschel, Mark J.; Kimball, Matthew E.

    2012-01-01

    We couple a species range limit hypothesis with the output of an ensemble of general circulation models to project the poleward range limit of gray snapper. Using laboratory-derived thermal limits and statistical downscaling from IPCC AR4 general circulation models, we project that gray snapper will shift northwards; the magnitude of this shift is dependent on the magnitude of climate change. We also evaluate the uncertainty in our projection and find that statistical uncertainty associated with the experimentally-derived thermal limits is the largest contributor (∼ 65%) to overall quantified uncertainty. This finding argues for more experimental work aimed at understanding and parameterizing the effects of climate change and variability on marine species. PMID:23284974

  4. Implementation and Research on the Operational Use of the Mesoscale Prediction Model COAMPS in Poland

    DTIC Science & Technology

    2007-09-30

    Reports on the implementation and research use of the COAMPS mesoscale model at the University of Warsaw, including participation in the EGU General Assembly (Vienna, Austria, 15-20 April 2007), estimation of the conditional forecast (background) error probability density function using an ensemble of model forecasts to generate background error statistics, and operation of the COAMPS system on ICM machines at Warsaw University to provide operational support to the general public.

  5. The Development of Statistical Models for Predicting Surgical Site Infections in Japan: Toward a Statistical Model-Based Standardized Infection Ratio.

    PubMed

    Fukuda, Haruhisa; Kuroki, Manabu

    2016-03-01

    To develop and internally validate a surgical site infection (SSI) prediction model for Japan. Retrospective observational cohort study. We analyzed surveillance data submitted to the Japan Nosocomial Infections Surveillance system for patients who had undergone target surgical procedures from January 1, 2010, through December 31, 2012. Logistic regression analyses were used to develop statistical models for predicting SSIs. An SSI prediction model was constructed for each of the procedure categories by statistically selecting the appropriate risk factors from among the collected surveillance data and determining their optimal categorization. Standard bootstrapping techniques were applied to assess potential overfitting. The C-index was used to compare the predictive performances of the new statistical models with those of models based on conventional risk index variables. The study sample comprised 349,987 cases from 428 participant hospitals throughout Japan, and the overall SSI incidence was 7.0%. The C-indices of the new statistical models were significantly higher than those of the conventional risk index models in 21 (67.7%) of the 31 procedure categories (P<.05). No significant overfitting was detected. Japan-specific SSI prediction models were shown to generally have higher accuracy than conventional risk index models. These new models may have applications in assessing hospital performance and identifying high-risk patients in specific procedure categories.
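
    A minimal sketch of the general recipe (a logistic regression on procedure-level risk factors, with the C-index, i.e. the area under the ROC curve, computed on held-out data) using synthetic data and hypothetical variable names; it is not the surveillance-based model developed in the study.

```python
# Sketch only: logistic SSI risk model on synthetic procedure-level factors, with
# the C-index (area under the ROC curve) computed on held-out data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(9)
n = 5000
df = pd.DataFrame({
    "duration_min": rng.normal(120, 40, n),    # hypothetical risk factors
    "asa_score": rng.integers(1, 5, n),
    "wound_class": rng.integers(1, 5, n),
    "emergency": rng.integers(0, 2, n),
})
logit = (-4.0 + 0.008 * df["duration_min"] + 0.4 * df["asa_score"]
         + 0.5 * df["emergency"])
df["ssi"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_tr, X_te, y_tr, y_te = train_test_split(df.drop(columns="ssi"), df["ssi"],
                                          test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
c_index = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print("held-out C-index:", round(c_index, 3))
```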

  6. Statistical downscaling of precipitation using long short-term memory recurrent neural networks

    NASA Astrophysics Data System (ADS)

    Misra, Saptarshi; Sarkar, Sudeshna; Mitra, Pabitra

    2017-11-01

    Hydrological impacts of global climate change on the regional scale are generally assessed by downscaling large-scale climatic variables, simulated by General Circulation Models (GCMs), to regional, small-scale hydrometeorological variables like precipitation, temperature, etc. In this study, we propose a new statistical downscaling model based on a recurrent neural network with long short-term memory, which captures the spatio-temporal dependencies in local rainfall. Previous studies have used several other methods such as linear regression, quantile regression, kernel regression, beta regression, and artificial neural networks. Deep neural networks and recurrent neural networks have been shown to be highly promising in modeling complex and highly non-linear relationships between input and output variables in different domains, and hence we investigated their performance in the task of statistical downscaling. We have tested this model on two datasets—one on precipitation in the Mahanadi basin in India and the second on precipitation in the Campbell River basin in Canada. Our autoencoder-coupled long short-term memory recurrent neural network model performs the best compared to other existing methods on both datasets with respect to temporal cross-correlation, mean squared error, and capturing the extremes.
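
    A minimal PyTorch sketch of the core regression step, an LSTM mapping a window of large-scale predictors to local precipitation; the data, dimensions, and hyperparameters are hypothetical, and the paper's autoencoder stage is omitted.

```python
# Sketch only: an LSTM regressing local precipitation on a window of large-scale
# predictors (synthetic data; the autoencoder stage of the paper is omitted).
import torch
import torch.nn as nn

torch.manual_seed(0)
n, seq_len, n_features = 500, 30, 8                   # hypothetical dimensions
X = torch.randn(n, seq_len, n_features)               # large-scale predictor windows
y = X[:, -5:, 0].mean(dim=1, keepdim=True) + 0.1 * torch.randn(n, 1)

class LSTMDownscaler(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden)
        return self.head(out[:, -1])   # regress from the last time step

model = LSTMDownscaler(n_features)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print("final training MSE:", float(loss))
```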

  7. Exploring a Three-Level Model of Calibration Accuracy

    ERIC Educational Resources Information Center

    Schraw, Gregory; Kuch, Fred; Gutierrez, Antonio P.; Richmond, Aaron S.

    2014-01-01

    We compared 5 different statistics (i.e., G index, gamma, "d'", sensitivity, specificity) used in the social sciences and medical diagnosis literatures to assess calibration accuracy in order to examine the relationship among them and to explore whether one statistic provided a best fitting general measure of accuracy. College…

  8. A Bifactor Approach to Model Multifaceted Constructs in Statistical Mediation Analysis

    ERIC Educational Resources Information Center

    Gonzalez, Oscar; MacKinnon, David P.

    2018-01-01

    Statistical mediation analysis allows researchers to identify the most important mediating constructs in the causal process studied. Identifying specific mediators is especially relevant when the hypothesized mediating construct consists of multiple related facets. The general definition of the construct and its facets might relate differently to…

  9. Identifiability of PBPK Models with Applications to ...

    EPA Pesticide Factsheets

    Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss different types of identifiability that occur in PBPK models and give reasons why they occur. We particularly focus on how the mathematical structure of a PBPK model and lack of appropriate data can lead to statistical models in which it is impossible to estimate at least some parameters precisely. Methods are reviewed which can determine whether a purely linear PBPK model is globally identifiable. We propose a theorem which determines when identifiability at a set of finite and specific values of the mathematical PBPK model (global discrete identifiability) implies identifiability of the statistical model. However, we are unable to establish conditions that imply global discrete identifiability, and conclude that the only safe approach to analysis of PBPK models involves Bayesian analysis with truncated priors. Finally, computational issues regarding posterior simulations of PBPK models are discussed. The methodology is very general and can be applied to numerous PBPK models which can be expressed as linear time-invariant systems. A real data set of a PBPK model for exposure to dimethyl arsinic acid (DMA(V)) is presented to illustrate the proposed methodology.

  10. Stochastic modeling of sunshine number data

    NASA Astrophysics Data System (ADS)

    Brabec, Marek; Paulescu, Marius; Badescu, Viorel

    2013-11-01

    In this paper, we will present a unified statistical modeling framework for estimation and forecasting of sunshine number (SSN) data. The sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it has been shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been a challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We will show how its transition probabilities can be efficiently estimated within a logistic regression framework. In fact, our logistic Markovian model can be fitted relatively easily via a maximum likelihood approach. This is optimal in many respects, and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test various hypotheses of practical interest. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly, from both physical and practical points of view, the logistic Markov model class allows us to test hypotheses about how SSN depends on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using the generalized additive model (GAM) approach, we can fit and compare models of varying complexity while retaining the physical interpretation of the statistical model and its parts. After introducing the Markovian model and the general approach for identification of its parameters, we will illustrate its use and performance on high resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.
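
    A minimal sketch of the core estimation idea, assuming a first-order Markov chain whose transition probabilities are obtained by logistic regression of the current state on the lagged state (the full model additionally includes external covariates and GAM-type smooth terms):

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)

    # Simulate a persistent binary "sunshine number" series (1 = sunny, 0 = not sunny).
    ssn = [1]
    for _ in range(2000):
        p = 0.9 if ssn[-1] == 1 else 0.2          # assumed true transition probabilities
        ssn.append(rng.binomial(1, p))
    ssn = np.asarray(ssn)

    # Logistic regression of the current state on the previous state recovers the
    # two transition probabilities P(sunny | previous state).
    X = sm.add_constant(ssn[:-1].astype(float))
    fit = sm.Logit(ssn[1:], X).fit(disp=0)
    p_from_0, p_from_1 = fit.predict([[1.0, 0.0], [1.0, 1.0]])
    print(p_from_0, p_from_1)                     # close to 0.2 and 0.9
    ```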

  11. Stochastic modeling of sunshine number data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brabec, Marek, E-mail: mbrabec@cs.cas.cz; Paulescu, Marius; Badescu, Viorel

    2013-11-13

    In this paper, we will present a unified statistical modeling framework for estimation and forecasting of sunshine number (SSN) data. The sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it has been shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been a challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We will show how its transition probabilities can be efficiently estimated within a logistic regression framework. In fact, our logistic Markovian model can be fitted relatively easily via a maximum likelihood approach. This is optimal in many respects, and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test various hypotheses of practical interest. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly, from both physical and practical points of view, the logistic Markov model class allows us to test hypotheses about how SSN depends on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using the generalized additive model (GAM) approach, we can fit and compare models of varying complexity while retaining the physical interpretation of the statistical model and its parts. After introducing the Markovian model and the general approach for identification of its parameters, we will illustrate its use and performance on high resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.

  12. Generalized Appended Product Indicator Procedure for Nonlinear Structural Equation Analysis.

    ERIC Educational Resources Information Center

    Wall, Melanie M.; Amemiya, Yasuo

    2001-01-01

    Considers the estimation of polynomial structural models and shows a limitation of an existing method. Introduces a new procedure, the generalized appended product indicator procedure, for nonlinear structural equation analysis. Addresses statistical issues associated with the procedure through simulation. (SLD)

  13. Generalized t-statistic for two-group classification.

    PubMed

    Komori, Osamu; Eguchi, Shinto; Copas, John B

    2015-06-01

    In the classic discriminant model of two multivariate normal distributions with equal variance matrices, the linear discriminant function is optimal both in terms of the log likelihood ratio and in terms of maximizing the standardized difference (the t-statistic) between the means of the two distributions. In a typical case-control study, normality may be sensible for the control sample but heterogeneity and uncertainty in diagnosis may suggest that a more flexible model is needed for the cases. We generalize the t-statistic approach by finding the linear function which maximizes a standardized difference but with data from one of the groups (the cases) filtered by a possibly nonlinear function U. We study conditions for consistency of the method and find the function U which is optimal in the sense of asymptotic efficiency. Optimality may also extend to other measures of discriminatory efficiency such as the area under the receiver operating characteristic curve. The optimal function U depends on a scalar probability density function which can be estimated non-parametrically using a standard numerical algorithm. A lasso-like version for variable selection is implemented by adding L1-regularization to the generalized t-statistic. Two microarray data sets in the study of asthma and various cancers are used as motivating examples. © 2014, The International Biometric Society.

  14. Joint Clustering and Component Analysis of Correspondenceless Point Sets: Application to Cardiac Statistical Modeling.

    PubMed

    Gooya, Ali; Lekadir, Karim; Alba, Xenia; Swift, Andrew J; Wild, Jim M; Frangi, Alejandro F

    2015-01-01

    Construction of Statistical Shape Models (SSMs) from arbitrary point sets is a challenging problem due to significant shape variation and lack of explicit point correspondence across the training data set. In medical imaging, point sets can generally represent different shape classes that span healthy and pathological exemplars. In such cases, the constructed SSM may not generalize well, largely because the probability density function (pdf) of the point sets deviates from the underlying assumption of Gaussian statistics. To this end, we propose a generative model for unsupervised learning of the pdf of point sets as a mixture of distinctive classes. A Variational Bayesian (VB) method is proposed for making joint inferences on the labels of point sets, and the principal modes of variations in each cluster. The method provides a flexible framework to handle point sets with no explicit point-to-point correspondences. We also show that by maximizing the marginalized likelihood of the model, the optimal number of clusters of point sets can be determined. We illustrate this work in the context of understanding the anatomical phenotype of the left and right ventricles in heart. To this end, we use a database containing hearts of healthy subjects, patients with Pulmonary Hypertension (PH), and patients with Hypertrophic Cardiomyopathy (HCM). We demonstrate that our method can outperform traditional PCA in both generalization and specificity measures.

  15. Learning coefficient of generalization error in Bayesian estimation and vandermonde matrix-type singularity.

    PubMed

    Aoyagi, Miki; Nagata, Kenji

    2012-06-01

    The term algebraic statistics arises from the study of probabilistic models and techniques for statistical inference using methods from algebra and geometry (Sturmfels, 2009 ). The purpose of our study is to consider the generalization error and stochastic complexity in learning theory by using the log-canonical threshold in algebraic geometry. Such thresholds correspond to the main term of the generalization error in Bayesian estimation, which is called a learning coefficient (Watanabe, 2001a , 2001b ). The learning coefficient serves to measure the learning efficiencies in hierarchical learning models. In this letter, we consider learning coefficients for Vandermonde matrix-type singularities, by using a new approach: focusing on the generators of the ideal, which defines singularities. We give tight new bound values of learning coefficients for the Vandermonde matrix-type singularities and the explicit values with certain conditions. By applying our results, we can show the learning coefficients of three-layered neural networks and normal mixture models.

  16. Representing Micro-Macro Linkages by Actor-Based Dynamic Network Models

    PubMed Central

    Snijders, Tom A.B.; Steglich, Christian E.G.

    2014-01-01

    Stochastic actor-based models for network dynamics have the primary aim of statistical inference about processes of network change, but may be regarded as a kind of agent-based models. Similar to many other agent-based models, they are based on local rules for actor behavior. Different from many other agent-based models, by including elements of generalized linear statistical models they aim to be realistic detailed representations of network dynamics in empirical data sets. Statistical parallels to micro-macro considerations can be found in the estimation of parameters determining local actor behavior from empirical data, and the assessment of goodness of fit from the correspondence with network-level descriptives. This article studies several network-level consequences of dynamic actor-based models applied to represent cross-sectional network data. Two examples illustrate how network-level characteristics can be obtained as emergent features implied by micro-specifications of actor-based models. PMID:25960578

  17. Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.

    PubMed

    Harrington, Peter de Boves

    2018-01-02

    Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes efficient use of the data because each object is used once for validation. It was reviewed a decade earlier, but primarily for the optimization of chemometric models; this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported, and powerful matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.

  18. Modeling statistics and kinetics of the natural aggregation structures and processes with the solution of generalized logistic equation

    NASA Astrophysics Data System (ADS)

    Maslov, Lev A.; Chebotarev, Vladimir I.

    2017-02-01

    The generalized logistic equation is proposed to model kinetics and statistics of natural processes such as earthquakes, forest fires, floods, landslides, and many others. This equation has the form dN(A)/dA = s·(1−N(A))·N(A)^q·A^(−α), where q > 0, A > 0 is the size of an element of a structure, and α ≥ 0. The equation contains two exponents, α and q, taking into account two important properties of elements of a system: their fractal geometry, and their ability to interact either to enhance or to damp the process of aggregation. The function N(A) can be understood as an approximation to the number of elements the size of which is less than A. The function dN(A)/dA, where N(A) is the general solution of this equation for q = 1, is a product of an increasing bounded function and a power-law function with stretched exponential cut-off. The relation with Tsallis non-extensive statistics is demonstrated by solving the generalized logistic equation for q > 0. In the case 0 < q < 1 the equation models super-additive structures, while for q > 1 it models sub-additive structures. The Gutenberg-Richter (G-R) formula results from interpretation of empirical data as a straight line in the area of stretched exponent with small α. The solution is applied to modeling the distribution of foreshocks and aftershocks in the regions of the Napa Valley 2014 and Sumatra 2004 earthquakes, fitting the observed data well, both qualitatively and quantitatively.
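
    As an illustration only (the paper works with the analytical solution), the equation can also be integrated numerically to inspect the implied size distribution; the parameter values below are arbitrary examples, not fitted values.

    ```python
    import numpy as np
    from scipy.integrate import solve_ivp

    s, q, alpha = 1.0, 1.0, 0.5                    # example parameters, not fitted values

    def rhs(A, N):
        # dN/dA = s * (1 - N) * N**q * A**(-alpha)
        return s * (1.0 - N) * N**q * A**(-alpha)

    A_grid = np.linspace(0.01, 50.0, 500)
    sol = solve_ivp(rhs, (A_grid[0], A_grid[-1]), [1e-3], t_eval=A_grid)

    N = sol.y[0]                                    # approx. fraction of elements of size < A
    size_frequency = np.gradient(N, A_grid)         # dN/dA, the size-frequency distribution
    print(N[-1], size_frequency[:3])
    ```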

  19. Normality of raw data in general linear models: The most widespread myth in statistics

    USGS Publications Warehouse

    Kery, Marc; Hatfield, Jeff S.

    2003-01-01

    In years of statistical consulting for ecologists and wildlife biologists, by far the most common misconception we have come across has been the one about normality in general linear models. These comprise a very large part of the statistical models used in ecology and include t tests, simple and multiple linear regression, polynomial regression, and analysis of variance (ANOVA) and covariance (ANCOVA). There is a widely held belief that the normality assumption pertains to the raw data rather than to the model residuals. We suspect that this error may also occur in countless published studies, whenever the normality assumption is tested prior to analysis. This may lead to the use of nonparametric alternatives (if there are any), when parametric tests would indeed be appropriate, or to use of transformations of raw data, which may introduce hidden assumptions such as multiplicative effects on the natural scale in the case of log-transformed data. Our aim here is to dispel this myth. We very briefly describe relevant theory for two cases of general linear models to show that the residuals need to be normally distributed if tests requiring normality are to be used, such as t and F tests. We then give two examples demonstrating that the distribution of the response variable may be nonnormal, and yet the residuals are well behaved. We do not go into the issue of how to test normality; instead we display the distributions of response variables and residuals graphically.
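
    A small simulation makes the distinction concrete: with a strongly bimodal predictor the response is far from normal, yet the residuals of the linear model are well behaved, which is what t and F tests actually require. The numbers below are purely illustrative.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    # Bimodal predictor -> clearly non-normal response, but perfectly normal residuals.
    x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(10.0, 1.0, 500)])
    y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, x.size)

    res = stats.linregress(x, y)
    residuals = y - (res.intercept + res.slope * x)

    print(stats.shapiro(y).pvalue)           # tiny p-value: raw response is far from normal
    print(stats.shapiro(residuals).pvalue)   # residuals are consistent with normality
    ```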

  20. Poisson-process generalization for the trading waiting-time distribution in a double-auction mechanism

    NASA Astrophysics Data System (ADS)

    Cincotti, Silvano; Ponta, Linda; Raberto, Marco; Scalas, Enrico

    2005-05-01

    In this paper, empirical analyses and computational experiments are presented on high-frequency data for a double-auction (book) market. The main objective of the paper is to generalize the order waiting time process in order to properly model the empirical evidence. The empirical study is performed on the best bid and best ask data of 7 U.S. financial markets, for 30-stock time series. In particular, statistical properties of trading waiting times have been analyzed and the quality of fits is evaluated by suitable statistical tests, i.e., comparing empirical distributions with theoretical models. Starting from the statistical studies on real data, attention has been focused on the reproducibility of such results in an artificial market. The computational experiments have been performed within the Genoa Artificial Stock Market. In the market model, heterogeneous agents trade one risky asset in exchange for cash. Agents have zero intelligence and issue random limit or market orders depending on their budget constraints. The price is cleared by means of a limit order book. The order generation is modelled with a renewal process. Based on empirical trading estimation, the distribution of waiting times between two consecutive orders is modelled by a mixture of exponential processes. Results show that the empirical waiting-time distribution can be considered as a generalization of a Poisson process. Moreover, the renewal process can approximate real data, and implementation in the artificial stock market can reproduce the trading activity in a realistic way.
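
    A hedged sketch of the waiting-time idea: drawing inter-order times from a two-component mixture of exponentials and comparing its survival function with that of a plain Poisson process having the same mean; the weights and rates are arbitrary, not the fitted market values.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000

    # Inter-order waiting times from a two-component mixture of exponentials
    # (a "fast" and a "slow" trading regime); weights and rates are arbitrary.
    w, rate_fast, rate_slow = 0.7, 2.0, 0.2
    fast = rng.random(n) < w
    waits = np.where(fast,
                     rng.exponential(1.0 / rate_fast, n),
                     rng.exponential(1.0 / rate_slow, n))

    # A plain Poisson process with the same mean waiting time, for comparison.
    poisson_waits = rng.exponential(waits.mean(), n)

    for t in (1.0, 5.0, 10.0):
        # The mixture shows a heavier tail than the simple exponential benchmark.
        print(t, (waits > t).mean(), (poisson_waits > t).mean())
    ```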

  1. A κ-generalized statistical mechanics approach to income analysis

    NASA Astrophysics Data System (ADS)

    Clementi, F.; Gallegati, M.; Kaniadakis, G.

    2009-02-01

    This paper proposes a statistical mechanics approach to the analysis of income distribution and inequality. A new distribution function, having its roots in the framework of κ-generalized statistics, is derived that is particularly suitable for describing the whole spectrum of incomes, from the low-middle income region up to the high income Pareto power-law regime. Analytical expressions for the shape, moments and some other basic statistical properties are given. Furthermore, several well-known econometric tools for measuring inequality, which all exist in a closed form, are considered. A method for parameter estimation is also discussed. The model is shown to fit remarkably well the data on personal income for the United States, and the analysis of inequality performed in terms of its parameters is revealed as very powerful.

  2. Combined Uncertainty and A-Posteriori Error Bound Estimates for General CFD Calculations: Theory and Software Implementation

    NASA Technical Reports Server (NTRS)

    Barth, Timothy J.

    2014-01-01

    This workshop presentation discusses the design and implementation of numerical methods for the quantification of statistical uncertainty, including a-posteriori error bounds, for output quantities computed using CFD methods. Hydrodynamic realizations often contain numerical error arising from finite-dimensional approximation (e.g. numerical methods using grids, basis functions, particles) and statistical uncertainty arising from incomplete information and/or statistical characterization of model parameters and random fields. The first task at hand is to derive formal error bounds for statistics given realizations containing finite-dimensional numerical error [1]. The error in computed output statistics contains contributions from both realization error and the error resulting from the calculation of statistics integrals using a numerical method. A second task is to devise computable a-posteriori error bounds by numerically approximating all terms arising in the error bound estimates. For the same reason that CFD calculations including error bounds but omitting uncertainty modeling are only of limited value, CFD calculations including uncertainty modeling but omitting error bounds are only of limited value. To gain maximum value from CFD calculations, a general software package for uncertainty quantification with quantified error bounds has been developed at NASA. The package provides implementations for a suite of numerical methods used in uncertainty quantification: Dense tensorization basis methods [3] and a subscale recovery variant [1] for non-smooth data, Sparse tensorization methods[2] utilizing node-nested hierarchies, Sampling methods[4] for high-dimensional random variable spaces.

  3. Entropy maximization under the constraints on the generalized Gini index and its application in modeling income distributions

    NASA Astrophysics Data System (ADS)

    Khosravi Tanak, A.; Mohtashami Borzadaran, G. R.; Ahmadi, J.

    2015-11-01

    In economics and social sciences, the inequality measures such as Gini index, Pietra index etc., are commonly used to measure the statistical dispersion. There is a generalization of Gini index which includes it as special case. In this paper, we use principle of maximum entropy to approximate the model of income distribution with a given mean and generalized Gini index. Many distributions have been used as descriptive models for the distribution of income. The most widely known of these models are the generalized beta of second kind and its subclass distributions. The obtained maximum entropy distributions are fitted to the US family total money income in 2009, 2011 and 2013 and their relative performances with respect to generalized beta of second kind family are compared.

  4. Incorporating Yearly Derived Winter Wheat Maps Into Winter Wheat Yield Forecasting Model

    NASA Technical Reports Server (NTRS)

    Skakun, S.; Franch, B.; Roger, J.-C.; Vermote, E.; Becker-Reshef, I.; Justice, C.; Santamaría-Artigas, A.

    2016-01-01

    Wheat is one of the most important cereal crops in the world. Timely and accurate forecast of wheat yield and production at global scale is vital in implementing food security policy. Becker-Reshef et al. (2010) developed a generalized empirical model for forecasting winter wheat production using remote sensing data and official statistics. This model was implemented using static wheat maps. In this paper, we analyze the impact of incorporating yearly wheat masks into the forecasting model. We propose a new approach of producing in season winter wheat maps exploiting satellite data and official statistics on crop area only. Validation on independent data showed that the proposed approach reached 6% to 23% of omission error and 10% to 16% of commission error when mapping winter wheat 2-3 months before harvest. In general, we found a limited impact of using yearly winter wheat masks over a static mask for the study regions.

  5. Remote sensing-aided systems for snow qualification, evapotranspiration estimation, and their application in hydrologic models

    NASA Technical Reports Server (NTRS)

    Korram, S.

    1977-01-01

    The design of general remote sensing-aided methodologies was studied to provide estimates of several important inputs to water yield forecast models. These input parameters are snow area extent, snow water content, and evapotranspiration. The study area is the Feather River Watershed (780,000 hectares), Northern California. The general approach involved a stepwise sequence of identification of the required information, sample design, measurement/estimation, and evaluation of results. All the relevant and available information types needed in the estimation process were defined. These include Landsat, meteorological satellite, and aircraft imagery, topographic and geologic data, ground truth data, and climatic data from ground stations. A cost-effective multistage sampling approach was employed in quantification of all the required parameters. Physical and statistical models for both snow quantification and evapotranspiration estimation were developed. These models use information obtained from aerial and ground data through an appropriate statistical sampling design.

  6. A Model for the Determination of the Costs of Special Education as Compared with That for General Education. Appendix: Part 1.

    ERIC Educational Resources Information Center

    Ernst and Ernst, Chicago, IL.

    Part 1 of the appendix to "A Model for the Determination of the Costs of Special Education as Compared with That for General Education" contains comprehensive descriptive and statistical information on Ernstville, a hypothetical school district conceived to illustrate the operation of a proposed cost accounting system. Included are sections on…

  7. Downscaling of Global Climate Change Estimates to Regional Scales: An Application to Iberian Rainfall in Wintertime.

    NASA Astrophysics Data System (ADS)

    von Storch, Hans; Zorita, Eduardo; Cubasch, Ulrich

    1993-06-01

    A statistical strategy to deduce regional-scale features from climate general circulation model (GCM) simulations has been designed and tested. The main idea is to interrelate the characteristic patterns of observed simultaneous variations of regional climate parameters and of large-scale atmospheric flow using the canonical correlation technique. The large-scale North Atlantic sea level pressure (SLP) is related to the regional variable, winter (DJF) mean Iberian Peninsula rainfall. The skill of the resulting statistical model is shown by reproducing, to a good approximation, the winter mean Iberian rainfall from 1900 to the present from the observed North Atlantic mean SLP distributions. It is shown that this observed relationship between the two variables is not well reproduced in the output of a general circulation model (GCM). The implications for Iberian rainfall changes as the response to increasing atmospheric greenhouse-gas concentrations simulated by two GCM experiments are examined with the proposed statistical model. In an instantaneous '2 x CO2' doubling experiment, using the simulated change of the mean North Atlantic SLP field to predict Iberian rainfall yields an insignificant increase of area-averaged rainfall of 1 mm/month, with maximum values of 4 mm/month in the northwest of the peninsula. In contrast, for the four GCM grid points representing the Iberian Peninsula, the change is a decrease of 10 mm/month, reaching 19 mm/month in the southwest. In the second experiment, with the IPCC scenario A ("business as usual") increase of CO2, the statistical-model results partially differ from the directly simulated rainfall changes: over the experimental range of 100 years, the area-averaged rainfall decreases by 7 mm/month (statistical model) and by 9 mm/month (GCM); at the same time the amplitude of the interdecadal variability is quite different.

  8. Statistical downscaling of general-circulation-model- simulated average monthly air temperature to the beginning of flowering of the dandelion (Taraxacum officinale) in Slovenia

    NASA Astrophysics Data System (ADS)

    Bergant, Klemen; Kajfež-Bogataj, Lučka; Črepinšek, Zalika

    2002-02-01

    Phenological observations are a valuable source of information for investigating the relationship between climate variation and plant development. Potential climate change in the future will shift the occurrence of phenological phases. Information about future climate conditions is needed in order to estimate this shift. General circulation models (GCM) provide the best information about future climate change. They are able to simulate reliably the most important mean features on a large scale, but they fail on a regional scale because of their low spatial resolution. A common approach to bridging the scale gap is statistical downscaling, which was used to relate the beginning of flowering of Taraxacum officinale in Slovenia with the monthly mean near-surface air temperature for January, February and March in Central Europe. Statistical models were developed and tested with NCAR/NCEP Reanalysis predictor data and EARS predictand data for the period 1960-1999. Prior to developing the statistical models, empirical orthogonal function (EOF) analysis was applied to the predictor data. Multiple linear regression was used to relate the beginning of flowering to the expansion coefficients of the first three EOFs of the January, February and March air temperatures, and a strong correlation was found between them. The developed statistical models were applied to the results of two GCMs (HadCM3 and ECHAM4/OPYC3) to estimate the potential shifts in the beginning of flowering for the periods 1990-2019 and 2020-2049 in comparison with the period 1960-1989. The HadCM3 model predicts, on average, a 4-day earlier occurrence of flowering in the period 1990-2019, and ECHAM4/OPYC3 a 5-day earlier occurrence. The analogous results for the period 2020-2049 are a 10- and 11-day earlier occurrence.
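
    A minimal sketch of that downscaling chain (EOF analysis of the large-scale temperature field followed by multiple linear regression on the leading expansion coefficients), using synthetic placeholders for the gridded temperatures and flowering dates:

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(4)

    # Synthetic stand-ins: 40 years of a gridded Jan-Mar temperature field (200 grid points)
    # and an observed day-of-year of first flowering for each year.
    years, grid = 40, 200
    temperature_field = rng.normal(size=(years, grid))
    flowering_doy = 110 - 3.0 * temperature_field[:, :50].mean(axis=1) + rng.normal(0, 2, years)

    # EOF analysis = PCA of the anomaly field; keep the first three expansion coefficients.
    eof = PCA(n_components=3)
    expansion_coeffs = eof.fit_transform(temperature_field)

    model = LinearRegression().fit(expansion_coeffs, flowering_doy)
    print(model.score(expansion_coeffs, flowering_doy))   # explained variance of the fit
    ```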

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blume-Kohout, Robin J; Scholten, Travis L.

    Quantum state tomography on a d-dimensional system demands resources that grow rapidly with d. They may be reduced by using model selection to tailor the number of parameters in the model (i.e., the size of the density matrix). Most model selection methods typically rely on a test statistic and a null theory that describes its behavior when two models are equally good. Here, we consider the loglikelihood ratio. Because of the positivity constraint ρ ≥ 0, quantum state space does not generally satisfy local asymptotic normality (LAN), meaning the classical null theory for the loglikelihood ratio (the Wilks theorem) should not be used. Thus, understanding and quantifying how positivity affects the null behavior of this test statistic is necessary for its use in model selection for state tomography. We define a new generalization of LAN, metric-projected LAN, show that quantum state space satisfies it, and derive a replacement for the Wilks theorem. In addition to enabling reliable model selection, our results shed more light on the qualitative effects of the positivity constraint on state tomography.

  10. Adaptive Error Estimation in Linearized Ocean General Circulation Models

    NASA Technical Reports Server (NTRS)

    Chechelnitsky, Michael Y.

    1999-01-01

    Data assimilation methods are routinely used in oceanography. The statistics of the model and measurement errors need to be specified a priori. This study addresses the problem of estimating model and measurement error statistics from observations. We start by testing innovation based methods of adaptive error estimation with low-dimensional models in the North Pacific (5-60 deg N, 132-252 deg E), applied to TOPEX/POSEIDON (T/P) sea level anomaly data, acoustic tomography data from the ATOC project, and the MIT General Circulation Model (GCM). A reduced state linear model that describes large scale internal (baroclinic) error dynamics is used. The methods are shown to be sensitive to the initial guess for the error statistics and the type of observations. A new off-line approach is developed, the covariance matching approach (CMA), where covariance matrices of model-data residuals are "matched" to their theoretical expectations using familiar least squares methods. This method uses observations directly instead of the innovations sequence and is shown to be related to the MT method and the method of Fu et al. (1993). Twin experiments using the same linearized MIT GCM suggest that altimetric data are ill-suited to the estimation of internal GCM errors, but that such estimates can in theory be obtained using acoustic data. The CMA is then applied to T/P sea level anomaly data and a linearization of a global GFDL GCM which uses two vertical modes. We show that the CMA method can be used with a global model and a global data set, and that the estimates of the error statistics are robust. We show that the fraction of the GCM-T/P residual variance explained by the model error is larger than that derived in Fukumori et al. (1999) with the method of Fu et al. (1993). Most of the model error is explained by the barotropic mode. However, we find that the impact of the change in the error statistics on the data assimilation estimates is very small. This is explained by the large representation error, i.e. the dominance of the mesoscale eddies in the T/P signal, which are not part of the 2° by 1° GCM. Therefore, the impact of the observations on the assimilation is very small even after the adjustment of the error statistics. This work demonstrates that simultaneous estimation of the model and measurement error statistics for data assimilation with global ocean data sets and linearized GCMs is possible. However, the error covariance estimation problem is in general highly underdetermined, much more so than the state estimation problem. In other words, there exist a very large number of statistical models that can be made consistent with the available data. Therefore, methods for obtaining quantitative error estimates, powerful though they may be, cannot replace physical insight. Used in the right context, as a tool for guiding the choice of a small number of model error parameters, covariance matching can be a useful addition to the repertory of tools available to oceanographers.

  11. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping.

    PubMed

    Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah

    2018-07-01

    In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province in Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, the highest Area Under the Receiver Operating Characteristic curve (AUROC) belonged to boosted regression trees (0.975) and the lowest value was recorded for the generalized linear model (0.642). On the other hand, the proposed EMmedian resulted in the highest accuracy (0.976) among all models. In spite of the outstanding performance of some models, variability among the predictions of the individual models was considerable. Therefore, to reduce uncertainty and create more generalizable, more stable, and less sensitive models, ensemble forecasting approaches, and in particular the EMmedian, are recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.
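
    A hedged sketch of the EMmedian idea, taking the median of the susceptibility scores predicted by several individual models; generic scikit-learn classifiers and synthetic data stand in for the eight models and the flood inventory of the study.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Stand-ins for the individual susceptibility models.
    models = [LogisticRegression(max_iter=1000),
              GradientBoostingClassifier(),
              MLPClassifier(max_iter=1000)]
    probs = np.column_stack([m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in models])

    ensemble_median = np.median(probs, axis=1)            # the "EMmedian"-style combination
    print([roc_auc_score(y_te, p) for p in probs.T])      # individual AUROCs
    print(roc_auc_score(y_te, ensemble_median))           # ensemble AUROC
    ```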

  12. Audience Diversion Due to Cable Television: A Statistical Analysis of New Data.

    ERIC Educational Resources Information Center

    Park, Rolla Edward

    A statistical analysis of new data suggests that television broadcasting will continue to prosper, despite increasing competition from cable television carrying distant signals. Data on cable and non-cable audiences in 121 counties with well defined signal choice support generalized least squares estimates of two models: total audience and…

  13. New robust statistical procedures for the polytomous logistic regression models.

    PubMed

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  14. Role of sufficient statistics in stochastic thermodynamics and its implication to sensory adaptation

    NASA Astrophysics Data System (ADS)

    Matsumoto, Takumi; Sagawa, Takahiro

    2018-04-01

    A sufficient statistic is a significant concept in statistics, which means a probability variable that has sufficient information required for an inference task. We investigate the roles of sufficient statistics and related quantities in stochastic thermodynamics. Specifically, we prove that for general continuous-time bipartite networks, the existence of a sufficient statistic implies that an informational quantity called the sensory capacity takes the maximum. Since the maximal sensory capacity imposes a constraint that the energetic efficiency cannot exceed one-half, our result implies that the existence of a sufficient statistic is inevitably accompanied by energetic dissipation. We also show that, in a particular parameter region of linear Langevin systems there exists the optimal noise intensity at which the sensory capacity, the information-thermodynamic efficiency, and the total entropy production are optimized at the same time. We apply our general result to a model of sensory adaptation of E. coli and find that the sensory capacity is nearly maximal with experimentally realistic parameters.

  15. An evaluation of three statistical estimation methods for assessing health policy effects on prescription drug claims.

    PubMed

    Mittal, Manish; Harrison, Donald L; Thompson, David M; Miller, Michael J; Farmer, Kevin C; Ng, Yu-Tze

    2016-01-01

    While the choice of analytical approach affects study results and their interpretation, there is no consensus to guide the choice of statistical approaches to evaluate public health policy change. This study compared and contrasted three statistical estimation procedures in the assessment of a U.S. Food and Drug Administration (FDA) suicidality warning, communicated in January 2008 and implemented in May 2009, on antiepileptic drug (AED) prescription claims. Longitudinal designs were utilized to evaluate Oklahoma (U.S. State) Medicaid claim data from January 2006 through December 2009. The study included 9289 continuously eligible individuals with prevalent diagnoses of epilepsy and/or psychiatric disorder. Segmented regression models using three estimation procedures [i.e., generalized linear models (GLM), generalized estimation equations (GEE), and generalized linear mixed models (GLMM)] were used to estimate trends of AED prescription claims across three time periods: before (January 2006-January 2008); during (February 2008-May 2009); and after (June 2009-December 2009) the FDA warning. All three statistical procedures estimated an increasing trend (P < 0.0001) in AED prescription claims before the FDA warning period. No procedures detected a significant change in trend during (GLM: -30.0%, 99% CI: -60.0% to 10.0%; GEE: -20.0%, 99% CI: -70.0% to 30.0%; GLMM: -23.5%, 99% CI: -58.8% to 1.2%) and after (GLM: 50.0%, 99% CI: -70.0% to 160.0%; GEE: 80.0%, 99% CI: -20.0% to 200.0%; GLMM: 47.1%, 99% CI: -41.2% to 135.3%) the FDA warning when compared to pre-warning period. Although the three procedures provided consistent inferences, the GEE and GLMM approaches accounted appropriately for correlation. Further, marginal models estimated using GEE produced more robust and valid population-level estimations. Copyright © 2016 Elsevier Inc. All rights reserved.
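
    A minimal sketch of one of the three procedures, a segmented (interrupted time series) Poisson regression estimated with GEE and an exchangeable working correlation; the patient panel, change point, and effect sizes below are synthetic assumptions, not the Medicaid data.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(5)

    # Synthetic monthly prescription claim counts for 200 patients over 48 months,
    # with a level change after a "warning" at month 25.
    n_pat, n_month = 200, 48
    records = []
    for pid in range(n_pat):
        for t in range(n_month):
            post = int(t >= 25)
            mu = np.exp(1.0 + 0.01 * t - 0.2 * post)
            records.append((pid, t, post, rng.poisson(mu)))
    df = pd.DataFrame(records, columns=["pid", "month", "post_warning", "claims"])

    # Segmented Poisson regression with GEE, accounting for repeated measures
    # within patient via an exchangeable working correlation.
    X = sm.add_constant(df[["month", "post_warning"]])
    gee = sm.GEE(df["claims"], X, groups=df["pid"],
                 family=sm.families.Poisson(),
                 cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(gee.summary())
    ```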

  16. The Thomas–Fermi quark model: Non-relativistic aspects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Quan, E-mail: quan_liu@baylor.edu; Wilcox, Walter, E-mail: walter_wilcox@baylor.edu

    The first numerical investigation of non-relativistic aspects of the Thomas–Fermi (TF) statistical multi-quark model is given. We begin with a review of the traditional TF model without an explicit spin interaction and find that the spin splittings are too small in this approach. An explicit spin interaction is then introduced which entails the definition of a generalized spin “flavor”. We investigate baryonic states in this approach which can be described with two inequivalent wave functions; such states can however apply to multiple degenerate flavors. We find that the model requires a spatial separation of quark flavors, even if completely degenerate. Although the TF model is designed to investigate the possibility of many-quark states, we find surprisingly that it may be used to fit the low energy spectrum of almost all ground state octet and decuplet baryons. The charge radii of such states are determined and compared with lattice calculations and other models. The low energy fit obtained allows us to extrapolate to the six-quark doubly strange H-dibaryon state, flavor symmetric strange states of higher quark content and possible six quark nucleon–nucleon resonances. The emphasis here is on the systematics revealed in this approach. We view our model as a versatile and convenient tool for quickly assessing the characteristics of new, possibly bound, particle states of higher quark number content. -- Highlights: • First application of the statistical Thomas–Fermi quark model to baryonic systems. • Novel aspects: spin as generalized flavor; spatial separation of quark flavor phases. • The model is statistical, but the low energy baryonic spectrum is successfully fit. • Numerical applications include the H-dibaryon, strange states and nucleon resonances. • The statistical point of view does not encourage the idea of bound many-quark baryons.

  17. Quantifying the evolution of flow boiling bubbles by statistical testing and image analysis: toward a general model.

    PubMed

    Xiao, Qingtai; Xu, Jianxin; Wang, Hua

    2016-08-16

    A new index, the estimate of the error variance, which can be used to quantify the evolution of flow patterns when multiphase components or tracers are difficult to distinguish, was proposed. The homogeneity degree of the luminance space distribution behind the viewing windows in the direct contact boiling heat transfer process was explored. With image analysis and a linear statistical model, the F-test of the statistical analysis was used to test whether the light was uniform, and a non-linear method was used to determine the direction and position of a fixed light source. The experimental results showed that the inflection point of the new index was approximately equal to the mixing time. The new index has been extended and applied to a multiphase macro mixing process by top blowing in a stirred tank. Moreover, a general quantifying model was introduced to demonstrate the relationship between the flow patterns of the bubble swarms and heat transfer. The results can be applied to investigate other mixing processes in which the target is very difficult to recognize.

  18. Quantifying the evolution of flow boiling bubbles by statistical testing and image analysis: toward a general model

    PubMed Central

    Xiao, Qingtai; Xu, Jianxin; Wang, Hua

    2016-01-01

    A new index, the estimate of the error variance, which can be used to quantify the evolution of flow patterns when multiphase components or tracers are difficult to distinguish, was proposed. The homogeneity degree of the luminance space distribution behind the viewing windows in the direct contact boiling heat transfer process was explored. With image analysis and a linear statistical model, the F-test of the statistical analysis was used to test whether the light was uniform, and a non-linear method was used to determine the direction and position of a fixed light source. The experimental results showed that the inflection point of the new index was approximately equal to the mixing time. The new index has been extended and applied to a multiphase macro mixing process by top blowing in a stirred tank. Moreover, a general quantifying model was introduced to demonstrate the relationship between the flow patterns of the bubble swarms and heat transfer. The results can be applied to investigate other mixing processes in which the target is very difficult to recognize. PMID:27527065

  19. Modeling Error Distributions of Growth Curve Models through Bayesian Methods

    ERIC Educational Resources Information Center

    Zhang, Zhiyong

    2016-01-01

    Growth curve models are widely used in social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems in blindly assuming normality, a general Bayesian framework is…

  20. Implementation and Research on the Operational Use of the Mesoscale Prediction Model COAMPS in Poland

    DTIC Science & Technology

    2008-09-30

    participated in the EGU General Assembly, Vienna, Austria, 13-18 April 2008, giving a poster presentation. Bogumil Jakubiak, University of Warsaw...participated in the EGU General Assembly, Vienna, Austria, 13-18 April 2008, giving two poster presentations. Mikolaj Sierzega, University of Warwick – participated...model forecast to generate background error statistics. This helps us to identify and understand the uncertainties in high-resolution NWP forecasts

  1. scoringRules - A software package for probabilistic model evaluation

    NASA Astrophysics Data System (ADS)

    Lerch, Sebastian; Jordan, Alexander; Krüger, Fabian

    2016-04-01

    Models in the geosciences are generally surrounded by uncertainty, and being able to quantify this uncertainty is key to good decision making. Accordingly, probabilistic forecasts in the form of predictive distributions have become popular over the last decades. With the proliferation of probabilistic models arises the need for decision theoretically principled tools to evaluate the appropriateness of models and forecasts in a generalized way. Various scoring rules have been developed over the past decades to address this demand. Proper scoring rules are functions S(F,y) which evaluate the accuracy of a forecast distribution F , given that an outcome y was observed. As such, they allow to compare alternative models, a crucial ability given the variety of theories, data sources and statistical specifications that is available in many situations. This poster presents the software package scoringRules for the statistical programming language R, which contains functions to compute popular scoring rules such as the continuous ranked probability score for a variety of distributions F that come up in applied work. Two main classes are parametric distributions like normal, t, or gamma distributions, and distributions that are not known analytically, but are indirectly described through a sample of simulation draws. For example, Bayesian forecasts produced via Markov Chain Monte Carlo take this form. Thereby, the scoringRules package provides a framework for generalized model evaluation that both includes Bayesian as well as classical parametric models. The scoringRules package aims to be a convenient dictionary-like reference for computing scoring rules. We offer state of the art implementations of several known (but not routinely applied) formulas, and implement closed-form expressions that were previously unavailable. Whenever more than one implementation variant exists, we offer statistically principled default choices.
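
    scoringRules is an R package; purely as an illustration of the kind of closed-form expression it implements, the sketch below evaluates the continuous ranked probability score of a Gaussian predictive distribution in Python using the standard formula.

    ```python
    import numpy as np
    from scipy.stats import norm

    def crps_normal(mu, sigma, y):
        """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y."""
        z = (y - mu) / sigma
        return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

    # Lower CRPS is better: a sharp, well-centred forecast beats a vague one.
    print(crps_normal(0.0, 1.0, 0.3))   # sharp forecast near the outcome
    print(crps_normal(0.0, 5.0, 0.3))   # same centre, much less sharp
    ```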

  2. Sample sizes and model comparison metrics for species distribution models

    Treesearch

    B.B. Hanberry; H.S. He; D.C. Dey

    2012-01-01

    Species distribution models use small samples to produce continuous distribution maps. The question of how small a sample can be to produce an accurate model generally has been answered based on comparisons to maximum sample sizes of 200 observations or fewer. In addition, model comparisons often are made with the kappa statistic, which has become controversial....

  3. A Management Information System Model for Program Management. Ph.D. Thesis - Oklahoma State Univ.; [Computerized Systems Analysis

    NASA Technical Reports Server (NTRS)

    Shipman, D. L.

    1972-01-01

    The development of a model to simulate the information system of a program management type of organization is reported. The model statistically determines the following parameters: type of messages, destinations, delivery durations, type of processing, processing durations, communication channels, outgoing messages, and priorities. The total management information system of the program management organization is considered, including formal and informal information flows and both facilities and equipment. The model is written in the General Purpose System Simulation 2 computer programming language for use on the Univac 1108, Executive 8 computer. The model is simulated on a daily basis and collects queue and resource utilization statistics for each decision point. The statistics are then used by management to evaluate proposed resource allocations, to evaluate proposed changes to the system, and to identify potential problem areas. The model employs both empirical and theoretical distributions, which are adjusted to simulate the information flow being studied.

  4. Modeling Group Interactions via Open Data Sources

    DTIC Science & Technology

    2011-08-30

    data. State-of-the-art search engines are designed to support general query-specific search and are not suitable for finding disconnected online groups. The...groups, (2) developing innovative mathematical and statistical models and efficient algorithms that leverage existing search engines and employ

  5. Data Modeling for Preservice Teachers and Everyone Else

    ERIC Educational Resources Information Center

    Petrosino, Anthony J.; Mann, Michele J.

    2018-01-01

    Although data modeling, the employment of statistical reasoning for the purpose of investigating questions about the world, is central to both mathematics and science, it is rarely emphasized in K-16 instruction. The current work focuses on developing thinking about data modeling with undergraduates in general and preservice teachers in…

  6. An Illustration of Diagnostic Classification Modeling in Student Learning Outcomes Assessment

    ERIC Educational Resources Information Center

    Jurich, Daniel P.; Bradshaw, Laine P.

    2014-01-01

    The assessment of higher-education student learning outcomes is an important component in understanding the strengths and weaknesses of academic and general education programs. This study illustrates the application of diagnostic classification models, a burgeoning set of statistical models, in assessing student learning outcomes. To facilitate…

  7. Modeling Systematicity and Individuality in Nonlinear Second Language Development: The Case of English Grammatical Morphemes

    ERIC Educational Resources Information Center

    Murakami, Akira

    2016-01-01

    This article introduces two sophisticated statistical modeling techniques that allow researchers to analyze systematicity, individual variation, and nonlinearity in second language (L2) development. Generalized linear mixed-effects models can be used to quantify individual variation and examine systematic effects simultaneously, and generalized…

  8. Using statistical and machine learning to help institutions detect suspicious access to electronic health records.

    PubMed

    Boxwala, Aziz A; Kim, Jihoon; Grillo, Janice M; Ohno-Machado, Lucila

    2011-01-01

    To determine whether statistical and machine-learning methods, when applied to electronic health record (EHR) access data, could help identify suspicious (ie, potentially inappropriate) access to EHRs. From EHR access logs and other organizational data collected over a 2-month period, the authors extracted 26 features likely to be useful in detecting suspicious accesses. Selected events were marked as either suspicious or appropriate by privacy officers, and served as the gold standard set for model evaluation. The authors trained logistic regression (LR) and support vector machine (SVM) models on 10-fold cross-validation sets of 1291 labeled events. The authors evaluated the sensitivity of final models on an external set of 58 events that were identified as truly inappropriate and investigated independently from this study using standard operating procedures. The area under the receiver operating characteristic curve of the models on the whole data set of 1291 events was 0.91 for LR, and 0.95 for SVM. The sensitivity of the baseline model on this set was 0.8. When the final models were evaluated on the set of 58 investigated events, all of which were determined as truly inappropriate, the sensitivity was 0 for the baseline method, 0.76 for LR, and 0.79 for SVM. The LR and SVM models may not generalize because of interinstitutional differences in organizational structures, applications, and workflows. Nevertheless, our approach for constructing the models using statistical and machine-learning techniques can be generalized. An important limitation is the relatively small sample used for the training set due to the effort required for its construction. The results suggest that statistical and machine-learning methods can play an important role in helping privacy officers detect suspicious accesses to EHRs.
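
    A minimal sketch of the modelling step described (logistic regression and a support vector machine evaluated with 10-fold cross-validation and the area under the ROC curve); synthetic features and labels stand in for the 26 access-log features and the privacy-officer labels.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import cross_val_predict
    from sklearn.svm import SVC

    # Synthetic stand-in for 1291 labeled access events with 26 extracted features.
    X, y = make_classification(n_samples=1291, n_features=26, weights=[0.8, 0.2],
                               random_state=0)

    for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                      ("SVM", SVC(probability=True))]:
        # Out-of-fold predicted probabilities from 10-fold cross-validation.
        prob = cross_val_predict(clf, X, y, cv=10, method="predict_proba")[:, 1]
        print(name, roc_auc_score(y, prob))
    ```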

  9. Using statistical and machine learning to help institutions detect suspicious access to electronic health records

    PubMed Central

    Kim, Jihoon; Grillo, Janice M; Ohno-Machado, Lucila

    2011-01-01

    Objective To determine whether statistical and machine-learning methods, when applied to electronic health record (EHR) access data, could help identify suspicious (ie, potentially inappropriate) access to EHRs. Methods From EHR access logs and other organizational data collected over a 2-month period, the authors extracted 26 features likely to be useful in detecting suspicious accesses. Selected events were marked as either suspicious or appropriate by privacy officers, and served as the gold standard set for model evaluation. The authors trained logistic regression (LR) and support vector machine (SVM) models on 10-fold cross-validation sets of 1291 labeled events. The authors evaluated the sensitivity of final models on an external set of 58 events that were identified as truly inappropriate and investigated independently from this study using standard operating procedures. Results The area under the receiver operating characteristic curve of the models on the whole data set of 1291 events was 0.91 for LR, and 0.95 for SVM. The sensitivity of the baseline model on this set was 0.8. When the final models were evaluated on the set of 58 investigated events, all of which were determined as truly inappropriate, the sensitivity was 0 for the baseline method, 0.76 for LR, and 0.79 for SVM. Limitations The LR and SVM models may not generalize because of interinstitutional differences in organizational structures, applications, and workflows. Nevertheless, our approach for constructing the models using statistical and machine-learning techniques can be generalized. An important limitation is the relatively small sample used for the training set due to the effort required for its construction. Conclusion The results suggest that statistical and machine-learning methods can play an important role in helping privacy officers detect suspicious accesses to EHRs. PMID:21672912

  10. Constructing and Modifying Sequence Statistics for relevent Using informR in 𝖱

    PubMed Central

    Marcum, Christopher Steven; Butts, Carter T.

    2015-01-01

    The informR package greatly simplifies the analysis of complex event histories in 𝖱 by providing user-friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a→b) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by the rem() model-fitting routines for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey. PMID:26185488

  11. Markov Logic Networks in the Analysis of Genetic Data

    PubMed Central

    Sakhanenko, Nikita A.

    2010-01-01

    Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249

  12. Neuroimaging Research: from Null-Hypothesis Falsification to Out-Of-Sample Generalization

    ERIC Educational Resources Information Center

    Bzdok, Danilo; Varoquaux, Gaël; Thirion, Bertrand

    2017-01-01

    Brain-imaging technology has boosted the quantification of neurobiological phenomena underlying human mental operations and their disturbances. Since its inception, drawing inference on neurophysiological effects hinged on classical statistical methods, especially, the general linear model. The tens of thousands of variables per brain scan were…

  13. Landslides, forest fires, and earthquakes: examples of self-organized critical behavior

    NASA Astrophysics Data System (ADS)

    Turcotte, Donald L.; Malamud, Bruce D.

    2004-09-01

    Per Bak conceived self-organized criticality as an explanation for the behavior of the sandpile model. Subsequently, many cellular automata models were found to exhibit similar behavior. Two examples are the forest-fire and slider-block models. Each of these models can be associated with a serious natural hazard: the sandpile model with landslides, the forest-fire model with actual forest fires, and the slider-block model with earthquakes. We examine the noncumulative frequency-area statistics for each natural hazard, and show that each has a robust power-law (fractal) distribution. We propose an inverse-cascade model as a general explanation for the power-law frequency-area statistics of the three cellular-automata models and their ‘associated’ natural hazards.

  14. Inferring general relations between network characteristics from specific network ensembles.

    PubMed

    Cardanobile, Stefano; Pernice, Volker; Deger, Moritz; Rotter, Stefan

    2012-01-01

    Different network models have been suggested for the topology underlying complex interactions in natural systems. These models are aimed at replicating specific statistical features encountered in real-world networks. However, it is rarely considered to which degree the results obtained for one particular network class can be extrapolated to real-world networks. We address this issue by comparing different classical and more recently developed network models with respect to their ability to generate networks with large structural variability. In particular, we consider the statistical constraints which the respective construction scheme imposes on the generated networks. After having identified the most variable networks, we address the issue of which constraints are common to all network classes and are thus suitable candidates for being generic statistical laws of complex networks. In fact, we find that generic, not model-related dependencies between different network characteristics do exist. This makes it possible to infer global features from local ones using regression models trained on networks with high generalization power. Our results confirm and extend previous findings regarding the synchronization properties of neural networks. Our method seems especially relevant for large networks, which are difficult to map completely, like the neural networks in the brain. The structure of such large networks cannot be fully sampled with the present technology. Our approach provides a method to estimate global properties of under-sampled networks in good approximation. Finally, we demonstrate on three different data sets (C. elegans neuronal network, R. prowazekii metabolic network, and a network of synonyms extracted from Roget's Thesaurus) that real-world networks have statistical relations compatible with those obtained using regression models.

  15. Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: a Monte Carlo study.

    PubMed

    Chou, C P; Bentler, P M; Satorra, A

    1991-11-01

    Research studying robustness of maximum likelihood (ML) statistics in covariance structure analysis has concluded that test statistics and standard errors are biased under severe non-normality. An estimation procedure known as asymptotic distribution free (ADF), making no distributional assumption, has been suggested to avoid these biases. Corrections to the normal theory statistics to yield more adequate performance have also been proposed. This study compares the performance of a scaled test statistic and robust standard errors for two models under several non-normal conditions and also compares these with the results from ML and ADF methods. Both ML and ADF test statistics performed rather well in one model and considerably worse in the other. In general, the scaled test statistic seemed to behave better than the ML test statistic and the ADF statistic performed the worst. The robust and ADF standard errors yielded more appropriate estimates of sampling variability than the ML standard errors, which were usually downward biased, in both models under most of the non-normal conditions. ML test statistics and standard errors were found to be quite robust to the violation of the normality assumption when data had either symmetric and platykurtic distributions, or non-symmetric and zero kurtotic distributions.

  16. A model of strength

    USGS Publications Warehouse

    Johnson, Douglas H.; Cook, R.D.

    2013-01-01

    In her AAAS News & Notes piece "Can the Southwest manage its thirst?" (26 July, p. 362), K. Wren quotes Ajay Kalra, who advocates a particular method for predicting Colorado River streamflow "because it eschews complex physical climate models for a statistical data-driven modeling approach." A preference for data-driven models may be appropriate in this individual situation, but it is not so generally. Data-driven models often come with a warning against extrapolating beyond the range of the data used to develop the models. When the future is like the past, data-driven models can work well for prediction, but it is easy to over-model local or transient phenomena, often leading to predictive inaccuracy (1). Mechanistic models are built on established knowledge of the process that connects the response variables with the predictors, using information obtained outside of an extant data set. One may shy away from a mechanistic approach when the underlying process is judged to be too complicated, but good predictive models can be constructed with statistical components that account for ingredients missing in the mechanistic analysis. Models with sound mechanistic components are more generally applicable and robust than data-driven models.

  17. Effects of preprocessing Landsat MSS data on derived features

    NASA Technical Reports Server (NTRS)

    Parris, T. M.; Cicone, R. C.

    1983-01-01

    Important to the use of multitemporal Landsat MSS data for earth resources monitoring, such as agricultural inventories, is the ability to minimize the effects of varying atmospheric and satellite viewing conditions, while extracting physically meaningful features from the data. In general, the approaches to the preprocessing problem have been derived from either physical or statistical models. This paper compares three proposed algorithms: XSTAR haze correction, Color Normalization, and Multiple Acquisition Mean Level Adjustment. These techniques represent physical, statistical, and hybrid physical-statistical models, respectively. The comparisons are made in the context of three feature extraction techniques: the Tasseled Cap, the Cate Color Cube, and the Normalized Difference.

  18. Quantum-Like Bayesian Networks for Modeling Decision Making

    PubMed Central

    Moreira, Catarina; Wichert, Andreas

    2016-01-01

    In this work, we explore an alternative quantum structure to perform quantum probabilistic inferences to accommodate the paradoxical findings of the Sure Thing Principle. We propose a Quantum-Like Bayesian Network, which consists in replacing classical probabilities by quantum probability amplitudes. However, since this approach suffers from the problem of exponential growth of quantum parameters, we also propose a similarity heuristic that automatically fits quantum parameters through vector similarities. This makes the proposed model general and predictive in contrast to the current state of the art models, which cannot be generalized for more complex decision scenarios and that only provide an explanatory nature for the observed paradoxes. In the end, the model that we propose consists in a nonparametric method for estimating inference effects from a statistical point of view. It is a statistical model that is simpler than the previous quantum dynamic and quantum-like models proposed in the literature. We tested the proposed network with several empirical data from the literature, mainly from the Prisoner's Dilemma game and the Two Stage Gambling game. The results obtained show that the proposed quantum Bayesian Network is a general method that can accommodate violations of the laws of classical probability theory and make accurate predictions regarding human decision-making in these scenarios. PMID:26858669
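
    The core quantum-like ingredient described here, replacing classical probabilities with amplitudes so that an interference term with a free phase parameter appears, can be illustrated in a few lines; the numbers below are arbitrary placeholders for a two-stage-gamble-style scenario, not the paper's fitted values.

    ```python
    # Generic illustration of the quantum-like idea described above: total probability
    # computed from probability amplitudes picks up an interference term that classical
    # marginalization lacks. All numbers are arbitrary placeholders.
    import numpy as np

    p_win, p_lose = 0.5, 0.5                      # probabilities of the two unknown conditions
    p_play_if_win, p_play_if_lose = 0.69, 0.59    # illustrative conditional probabilities

    # Classical law of total probability
    p_classical = p_win * p_play_if_win + p_lose * p_play_if_lose

    # Quantum-like version: amplitudes with a relative phase theta (a free parameter)
    theta = 2.5
    amp1 = np.sqrt(p_win * p_play_if_win)
    amp2 = np.sqrt(p_lose * p_play_if_lose)
    p_quantum = amp1**2 + amp2**2 + 2 * amp1 * amp2 * np.cos(theta)

    print(f"classical: {p_classical:.3f}, quantum-like: {p_quantum:.3f}")
    ```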

  19. A generalized regression model of arsenic variations in the shallow groundwater of Bangladesh

    PubMed Central

    Taylor, Richard G.; Chandler, Richard E.

    2015-01-01

    Localized studies of arsenic (As) in Bangladesh have reached disparate conclusions regarding the impact of irrigation‐induced recharge on As concentrations in shallow (≤50 m below ground level) groundwater. We construct generalized regression models (GRMs) to describe observed spatial variations in As concentrations in shallow groundwater both (i) nationally, and (ii) regionally within Holocene deposits where As concentrations in groundwater are generally high (>10 μg L−1). At these scales, the GRMs reveal statistically significant inverse associations between observed As concentrations and two covariates: (1) hydraulic conductivity of the shallow aquifer and (2) net increase in mean recharge between predeveloped and developed groundwater‐fed irrigation periods. Further, the GRMs show that the spatial variation of groundwater As concentrations is well explained by not only surface geology but also statistical interactions (i.e., combined effects) between surface geology and mean groundwater recharge, thickness of surficial silt and clay, and well depth. Net increases in recharge result from intensive groundwater abstraction for irrigation, which induces additional recharge where it is enabled by a permeable surface geology. Collectively, these statistical associations indicate that irrigation‐induced recharge serves to flush mobile As from shallow groundwater. PMID:27524841

  20. Landau's statistical mechanics for quasi-particle models

    NASA Astrophysics Data System (ADS)

    Bannur, Vishnu M.

    2014-04-01

    Landau's formalism of statistical mechanics [following L. D. Landau and E. M. Lifshitz, Statistical Physics (Pergamon Press, Oxford, 1980)] is applied to the quasi-particle model of quark-gluon plasma. Here, one starts from the expression for pressure and develops all thermodynamics. It is a general formalism and consistent with our earlier studies [V. M. Bannur, Phys. Lett. B647, 271 (2007)] based on Pathria's formalism [following R. K. Pathria, Statistical Mechanics (Butterworth-Heinemann, Oxford, 1977)]. In Pathria's formalism, one starts from the expression for energy density and develops thermodynamics. Both formalisms are consistent with thermodynamics and statistical mechanics. Under certain conditions, which are wrongly called the thermodynamic consistency relation, we recover the other formalism of quasi-particle systems, like that in M. I. Gorenstein and S. N. Yang, Phys. Rev. D52, 5206 (1995), widely studied in quark-gluon plasma.
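
    For orientation, the route the abstract describes (taking pressure as the starting thermodynamic potential) follows the standard grand-canonical identities below; these are textbook relations, not the paper's specific quasi-particle expressions.

    ```latex
    % Starting from P(T,\mu), standard thermodynamic identities give
    s = \left(\frac{\partial P}{\partial T}\right)_{\mu}, \qquad
    n = \left(\frac{\partial P}{\partial \mu}\right)_{T}, \qquad
    \varepsilon = T s + \mu n - P ,
    % whereas Pathria's route starts from \varepsilon(T,\mu) and integrates back to P.
    ```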

  1. Statistics on Blindness in the Model Reporting Area 1969-1970.

    ERIC Educational Resources Information Center

    Kahn, Harold A.; Moorhead, Helen B.

    Presented in the form of 30 tables are statistics on blindness in 16 states which have agreed to uniform definitions and procedures to improve reliability of data regarding blind persons. The data indicates that rates of blindness were generally higher for nonwhites than for whites with the ratio ranging from almost 10 for glaucoma to minimal for…

  2. Statistics of stable marriages

    NASA Astrophysics Data System (ADS)

    Dzierzawa, Michael; Oméro, Marie-José

    2000-11-01

    In the stable marriage problem N men and N women have to be matched by pairs under the constraint that the resulting matching is stable. We study the statistical properties of stable matchings in the large N limit using both numerical and analytical methods. Generalizations of the model including singles and unequal numbers of men and women are also investigated.
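
    Stable matchings of the kind whose statistics are studied here are conventionally generated with the Gale-Shapley deferred-acceptance algorithm; a minimal sketch on random preference lists follows, with the simulation setup being an illustrative assumption rather than the authors' code.

    ```python
    # Minimal Gale-Shapley deferred-acceptance sketch on random preference lists,
    # the standard way to generate stable matchings whose statistics can then be studied.
    import random

    def gale_shapley(men_prefs, women_prefs):
        n = len(men_prefs)
        rank = [{m: r for r, m in enumerate(prefs)} for prefs in women_prefs]
        next_proposal = [0] * n          # index of the next woman each man will propose to
        fiance = [None] * n              # fiance[w] = man currently engaged to woman w
        free_men = list(range(n))
        while free_men:
            m = free_men.pop()
            w = men_prefs[m][next_proposal[m]]
            next_proposal[m] += 1
            if fiance[w] is None:
                fiance[w] = m
            elif rank[w][m] < rank[w][fiance[w]]:
                free_men.append(fiance[w])
                fiance[w] = m
            else:
                free_men.append(m)
        return {m: w for w, m in enumerate(fiance)}

    n = 100
    men = [random.sample(range(n), n) for _ in range(n)]
    women = [random.sample(range(n), n) for _ in range(n)]
    matching = gale_shapley(men, women)
    # e.g. mean partner rank on the proposing side, a statistic studied in this literature
    avg_rank = sum(men[m].index(w) for m, w in matching.items()) / n
    print(f"mean partner rank for proposers: {avg_rank:.1f}")
    ```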

  3. The distribution of density in supersonic turbulence

    NASA Astrophysics Data System (ADS)

    Squire, Jonathan; Hopkins, Philip F.

    2017-11-01

    We propose a model for the statistics of the mass density in supersonic turbulence, which plays a crucial role in star formation and the physics of the interstellar medium (ISM). The model is derived by considering the density to be arranged as a collection of strong shocks of width ˜ M^{-2}, where M is the turbulent Mach number. With two physically motivated parameters, the model predicts all density statistics for M>1 turbulence: the density probability distribution and its intermittency (deviation from lognormality), the density variance-Mach number relation, power spectra and structure functions. For the proposed model parameters, reasonable agreement is seen between model predictions and numerical simulations, albeit within the large uncertainties associated with current simulation results. More generally, the model could provide a useful framework for more detailed analysis of future simulations and observational data. Due to the simple physical motivations for the model in terms of shocks, it is straightforward to generalize to more complex physical processes, which will be helpful in future more detailed applications to the ISM. We see good qualitative agreement between such extensions and recent simulations of non-isothermal turbulence.
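
    As a point of reference, the lognormal baseline that such shock-based models generalize ties the width of the density PDF to the Mach number; the relations below are the standard ones from the turbulence literature, not the paper's new intermittent model.

    ```latex
    % Lognormal baseline for the density PDF, with s = \ln(\rho/\rho_0):
    P(s)\,ds = \frac{1}{\sqrt{2\pi\sigma_s^2}}
               \exp\!\left[-\frac{(s + \sigma_s^2/2)^2}{2\sigma_s^2}\right] ds,
    \qquad \sigma_s^2 = \ln\!\left(1 + b^2 \mathcal{M}^2\right),
    % where b depends on the driving (b \approx 1/3 for solenoidal forcing).
    ```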

  4. Statistical Teleodynamics: Toward a Theory of Emergence.

    PubMed

    Venkatasubramanian, Venkat

    2017-10-24

    The central scientific challenge of the 21st century is developing a mathematical theory of emergence that can explain and predict phenomena such as consciousness and self-awareness. The most successful research program of the 20th century, reductionism, which goes from the whole to parts, seems unable to address this challenge. This is because addressing this challenge inherently requires an opposite approach, going from parts to the whole. In addition, reductionism, by the very nature of its inquiry, typically does not concern itself with teleology or purposeful behavior. Modeling emergence, in contrast, requires the addressing of teleology. Together, these two requirements present a formidable challenge in developing a successful mathematical theory of emergence. In this article, I describe a new theory of emergence, called statistical teleodynamics, that addresses certain aspects of the general problem. Statistical teleodynamics is a mathematical framework that unifies three seemingly disparate domains (purpose-free entities in statistical mechanics, human-engineered teleological systems in systems engineering, and nature-evolved teleological systems in biology and sociology) within the same conceptual formalism. This theory rests on several key conceptual insights, the most important one being the recognition that entropy mathematically models the concept of fairness in economics and philosophy and, equivalently, the concept of robustness in systems engineering. These insights help prove that the fairest inequality of income is a log-normal distribution, which will emerge naturally at equilibrium in an ideal free market society. Similarly, the theory predicts the emergence of the three classes of network organization (exponential, scale-free, and Poisson) seen widely in a variety of domains. Statistical teleodynamics is the natural generalization of statistical thermodynamics, the most successful parts-to-whole systems theory to date, but this generalization is only a modest step toward a more comprehensive mathematical theory of emergence.

  5. Experiments in monthly mean simulation of the atmosphere with a coarse-mesh general circulation model

    NASA Technical Reports Server (NTRS)

    Lutz, R. J.; Spar, J.

    1978-01-01

    The Hansen atmospheric model was used to compute five monthly forecasts (October 1976 through February 1977). The comparison is based on an energetics analysis, meridional and vertical profiles, error statistics, and prognostic and observed mean maps. The monthly mean model simulations suffer from several defects. There is, in general, no skill in the simulation of the monthly mean sea-level pressure field, and only marginal skill is indicated for the 850 mb temperatures and 500 mb heights. The coarse-mesh model appears to generate a less satisfactory monthly mean simulation than the finer mesh GISS model.

  6. Model fitting for small skin permeability data sets: hyperparameter optimisation in Gaussian Process Regression.

    PubMed

    Ashrafi, Parivash; Sun, Yi; Davey, Neil; Adams, Roderick G; Wilkinson, Simon C; Moss, Gary Patrick

    2018-03-01

    The aim of this study was to investigate how to improve predictions from Gaussian Process models by optimising the model hyperparameters. Optimisation methods, including Grid Search, Conjugate Gradient, Random Search, Evolutionary Algorithm and Hyper-prior, were evaluated and applied to previously published data. Data sets were also altered in a structured manner to reduce their size, which retained the range, or 'chemical space' of the key descriptors to assess the effect of the data range on model quality. The Hyper-prior Smoothbox kernel results in the best models for the majority of data sets, and they exhibited significantly better performance than benchmark quantitative structure-permeability relationship (QSPR) models. When the data sets were systematically reduced in size, the different optimisation methods generally retained their statistical quality, whereas benchmark QSPR models performed poorly. The design of the data set, and possibly also the approach to validation of the model, is critical in the development of improved models. The size of the data set, if carefully controlled, was not generally a significant factor for these models and that models of excellent statistical quality could be produced from substantially smaller data sets. © 2018 Royal Pharmaceutical Society.
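
    A hedged sketch of one of the simpler strategies evaluated here, a grid search over Gaussian Process kernel hyperparameters scored by cross-validation, is shown below with scikit-learn; the descriptors and permeability values are synthetic stand-ins, not the published data sets.

    ```python
    # Sketch of grid-search hyperparameter selection for Gaussian Process regression
    # (scikit-learn), with synthetic stand-ins for molecular descriptors and log kp values.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 4))                    # placeholder descriptors (e.g. logP, MW, ...)
    y = X @ np.array([0.8, -0.5, 0.3, 0.1]) + 0.1 * rng.normal(size=60)  # placeholder log kp

    best = None
    for length_scale in [0.1, 0.3, 1.0, 3.0, 10.0]:          # simple grid over one hyperparameter
        kernel = RBF(length_scale=length_scale) + WhiteKernel(noise_level=0.1)
        gpr = GaussianProcessRegressor(kernel=kernel, optimizer=None)  # keep hyperparameters fixed
        score = cross_val_score(gpr, X, y, cv=5, scoring="r2").mean()
        if best is None or score > best[1]:
            best = (length_scale, score)
    print(f"best length scale {best[0]} with CV R^2 = {best[1]:.2f}")
    ```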

  7. Genomic similarity and kernel methods I: advancements by building on mathematical and statistical foundations.

    PubMed

    Schaid, Daniel J

    2010-01-01

    Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1]. Copyright © 2010 S. Karger AG, Basel.
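
    The central requirement described here, that a kernel applied to all pairs of subjects must produce a positive semidefinite similarity matrix, can be checked directly; the sketch below uses a synthetic genotype matrix and a centered linear kernel as an illustrative assumption.

    ```python
    # Sketch of the core requirement described above: a kernel applied to all pairs of
    # subjects must yield a positive semidefinite similarity matrix. Genotypes are synthetic.
    import numpy as np

    rng = np.random.default_rng(2)
    genotypes = rng.integers(0, 3, size=(20, 100))   # 20 subjects x 100 SNPs coded 0/1/2

    def linear_kernel(G):
        # Centered linear (genetic-relationship-style) kernel: similarity from shared alleles.
        Gc = G - G.mean(axis=0)
        return Gc @ Gc.T / G.shape[1]

    K = linear_kernel(genotypes)
    eigvals = np.linalg.eigvalsh(K)
    print("smallest eigenvalue:", eigvals.min())     # >= 0 (up to rounding) => PSD as required
    ```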

  8. Statistical Compression of Wind Speed Data

    NASA Astrophysics Data System (ADS)

    Tagle, F.; Castruccio, S.; Crippa, P.; Genton, M.

    2017-12-01

    In this work we introduce a lossy compression approach that utilizes a stochastic wind generator based on a non-Gaussian distribution to reproduce the internal climate variability of daily wind speed as represented by the CESM Large Ensemble over Saudi Arabia. Stochastic wind generators, and stochastic weather generators more generally, are statistical models that aim to match certain statistical properties of the data on which they are trained. They have been used extensively in applications ranging from agricultural models to climate impact studies. In this novel context, the parameters of the fitted model can be interpreted as encoding the information contained in the original uncompressed data. The statistical model is fit to only 3 of the 30 ensemble members and it adequately captures the variability of the ensemble in terms of the seasonal and interannual variability of daily wind speed. To deal with such a large spatial domain, it is partitioned into 9 regions, and the model is fit independently to each of these. We further discuss a recent refinement of the model, which relaxes this assumption of regional independence, by introducing a large-scale component that interacts with the fine-scale regional effects.

  9. Texture and haptic cues in slant discrimination: reliability-based cue weighting without statistically optimal cue combination

    NASA Astrophysics Data System (ADS)

    Rosas, Pedro; Wagemans, Johan; Ernst, Marc O.; Wichmann, Felix A.

    2005-05-01

    A number of models of depth-cue combination suggest that the final depth percept results from a weighted average of independent depth estimates based on the different cues available. The weight of each cue in such an average is thought to depend on the reliability of each cue. In principle, such a depth estimation could be statistically optimal in the sense of producing the minimum-variance unbiased estimator that can be constructed from the available information. Here we test such models by using visual and haptic depth information. Different texture types produce differences in slant-discrimination performance, thus providing a means for testing a reliability-sensitive cue-combination model with texture as one of the cues to slant. Our results show that the weights for the cues were generally sensitive to their reliability but fell short of statistically optimal combination - we find reliability-based reweighting but not statistically optimal cue combination.
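
    The reliability-weighted averaging model being tested has a standard closed form, with each cue weighted by its inverse variance; written for the texture (T) and haptic (H) cues, the textbook minimum-variance combination rule is:

    ```latex
    % Minimum-variance (statistically optimal) cue combination:
    \hat{S} = w_T \hat{S}_T + w_H \hat{S}_H, \qquad
    w_i = \frac{1/\sigma_i^{2}}{1/\sigma_T^{2} + 1/\sigma_H^{2}}, \qquad
    \sigma_{TH}^{2} = \frac{\sigma_T^{2}\,\sigma_H^{2}}{\sigma_T^{2} + \sigma_H^{2}} .
    ```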

  10. Quantifying variation in speciation and extinction rates with clade data.

    PubMed

    Paradis, Emmanuel; Tedesco, Pablo A; Hugueny, Bernard

    2013-12-01

    High-level phylogenies are very common in evolutionary analyses, although they are often treated as incomplete data. Here, we provide statistical tools to analyze what we name "clade data," which are the ages of clades together with their numbers of species. We develop a general approach for the statistical modeling of variation in speciation and extinction rates, including temporal variation, unknown variation, and linear and nonlinear modeling. We show how this approach can be generalized to a wide range of situations, including testing the effects of life-history traits and environmental variables on diversification rates. We report the results of an extensive simulation study to assess the performance of some statistical tests presented here as well as of the estimators of speciation and extinction rates. These latter results suggest the possibility to estimate correctly extinction rate in the absence of fossils. An example with data on fish is presented. © 2013 The Author(s). Evolution © 2013 The Society for the Study of Evolution.

  11. A person based formula for allocating commissioning funds to general practices in England: development of a statistical model.

    PubMed

    Dixon, Jennifer; Smith, Peter; Gravelle, Hugh; Martin, Steve; Bardsley, Martin; Rice, Nigel; Georghiou, Theo; Dusheiko, Mark; Billings, John; Lorenzo, Michael De; Sanderson, Colin

    2011-11-22

    To develop a formula for allocating resources for commissioning hospital care to all general practices in England based on the health needs of the people registered in each practice. Multivariate prospective statistical models were developed in which routinely collected electronic information from 2005-6 and 2006-7 on individuals and the areas in which they lived was used to predict their costs of hospital care in the next year, 2007-8. Data on individuals included all diagnoses recorded at any inpatient admission. Models were developed on a random sample of 5 million people and validated on a second random sample of 5 million people and a third sample of 5 million people drawn from a random sample of practices. All general practices in England as of 1 April 2007. All NHS inpatient admissions and outpatient attendances for individuals registered with a general practice on that date. All individuals registered with a general practice in England at 1 April 2007. Power of the statistical models to predict the costs of the individual patient or each practice's registered population for 2007-8 was tested with a range of metrics (R² reported here). Comparisons of predicted costs in 2007-8 with actual costs incurred in the same year were calculated by individual and by practice. Models including person level information (age, sex, and recorded ICD-10 diagnostic codes) and a range of area level information (such as socioeconomic deprivation and supply of health facilities) were most predictive of costs. After accounting for person level variables, area level variables added little explanatory power. The best models for resource allocation could predict upwards of 77% of the variation in costs at practice level, and about 12% at the person level. With these models, the predicted costs of about a third of practices would exceed or undershoot the actual costs by 10% or more. Smaller practices were more likely to be in these groups. A model was developed that performed well by international standards, and could be used for allocations to practices for commissioning. The best formulas, however, could predict only about 12% of the variation in next year's costs of most inpatient and outpatient NHS care for each individual. Person-based diagnostic data significantly added to the predictive power of the models.

  12. Interactions and triggering in a 3D rate and state asperity model

    NASA Astrophysics Data System (ADS)

    Dublanchet, P.; Bernard, P.

    2012-12-01

    Precise relocation of micro-seismicity and careful analysis of seismic source parameters have progressively imposed the concept of seismic asperities embedded in a creeping fault segment as being one of the most important aspects that should appear in a realistic representation of micro-seismic sources. Another important issue concerning micro-seismic activity is the existence of robust empirical laws describing the temporal and magnitude distribution of earthquakes, such as the Omori law, the distribution of inter-event time and the Gutenberg-Richter law. In this framework, this study aims at understanding statistical properties of earthquakes, by generating synthetic catalogs with a 3D, quasi-dynamic continuous rate and state asperity model, that takes into account a realistic geometry of asperities. Our approach contrasts with ETAS models (Kagan and Knopoff, 1981) usually implemented to produce earthquake catalogs, in the sense that the nonlinearity observed in rock friction experiments (Dieterich, 1979) is fully taken into account by the use of the rate and state friction law. Furthermore, our model differs from discrete models of faults (Ziv and Cochard, 2006) because the continuity allows us to define realistic geometries and distributions of asperities by the assembling of sub-critical computational cells that always fail in a single event. Moreover, this model allows us to address the question of the influence of barriers and distribution of asperities on the event statistics. After recalling the main observations of asperities in the specific case of the Parkfield segment of the San Andreas Fault, we analyse earthquake statistical properties computed for this area. Then, we present synthetic statistics obtained by our model that allow us to discuss the role of barriers on clustering and triggering phenomena among a population of sources. It appears that an effective size of barrier, that depends on its frictional strength, controls the presence or the absence, in the synthetic catalog, of statistical laws that are similar to what is observed for real earthquakes. As an application, we attempt to draw a comparison between synthetic statistics and the observed statistics of Parkfield in order to characterize what could be a realistic frictional model of the Parkfield area. More generally, we obtained synthetic statistical properties that are in agreement with power-law decays characterized by exponents that match the observations at a global scale, showing that our mechanical model is able to provide new insights into the understanding of earthquake interaction processes in general.

  13. Automation of Ocean Product Metrics

    DTIC Science & Technology

    2008-09-30

    Presented at: Ocean Sciences 2008 Conf., 5 Mar 2008. Shriver, J., J. D. Dykes, and J. Fabre: Automation of Operational Ocean Product Metrics. Presented at the 2008 EGU General Assembly, 14 April 2008. ... processing (multiple data cuts per day) and multiple-nested models. Routines for generating automated evaluations of model forecast statistics will be developed and pre-existing tools will be collected to create a generalized tool set, which will include user-interface tools to the metrics data.

  14. Comparing Regression Coefficients between Nested Linear Models for Clustered Data with Generalized Estimating Equations

    ERIC Educational Resources Information Center

    Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer

    2013-01-01

    Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…

  15. Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis

    PubMed Central

    McDermott, Josh H.; Simoncelli, Eero P.

    2014-01-01

    Rainstorms, insect swarms, and galloping horses produce “sound textures” – the collective result of many similar acoustic events. Sound textures are distinguished by temporal homogeneity, suggesting they could be recognized with time-averaged statistics. To test this hypothesis, we processed real-world textures with an auditory model containing filters tuned for sound frequencies and their modulations, and measured statistics of the resulting decomposition. We then assessed the realism and recognizability of novel sounds synthesized to have matching statistics. Statistics of individual frequency channels, capturing spectral power and sparsity, generally failed to produce compelling synthetic textures. However, combining them with correlations between channels produced identifiable and natural-sounding textures. Synthesis quality declined if statistics were computed from biologically implausible auditory models. The results suggest that sound texture perception is mediated by relatively simple statistics of early auditory representations, presumably computed by downstream neural populations. The synthesis methodology offers a powerful tool for their further investigation. PMID:21903084

  16. Statistical Downscaling and Bias Correction of Climate Model Outputs for Climate Change Impact Assessment in the U.S. Northeast

    NASA Technical Reports Server (NTRS)

    Ahmed, Kazi Farzan; Wang, Guiling; Silander, John; Wilson, Adam M.; Allen, Jenica M.; Horton, Radley; Anyah, Richard

    2013-01-01

    Statistical downscaling can be used to efficiently downscale a large number of General Circulation Model (GCM) outputs to a fine temporal and spatial scale. To facilitate regional impact assessments, this study statistically downscales (to 1/8° spatial resolution) and corrects the bias of daily maximum and minimum temperature and daily precipitation data from six GCMs and four Regional Climate Models (RCMs) for the northeast United States (US) using the Statistical Downscaling and Bias Correction (SDBC) approach. Based on these downscaled data from multiple models, five extreme indices were analyzed for the future climate to quantify future changes of climate extremes. For a subset of models and indices, results based on raw and bias corrected model outputs for the present-day climate were compared with observations, which demonstrated that bias correction is important not only for GCM outputs, but also for RCM outputs. For future climate, bias correction led to a higher level of agreement among the models in predicting the magnitude and capturing the spatial pattern of the extreme climate indices. We found that the incorporation of dynamical downscaling as an intermediate step does not lead to considerable differences in the results of statistical downscaling for the study domain.
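
    One common ingredient of such statistical downscaling and bias correction (SDBC) workflows is empirical quantile mapping; the sketch below illustrates the general idea on synthetic daily precipitation series and is not the study's exact procedure.

    ```python
    # Generic empirical quantile-mapping bias correction, a common ingredient of
    # statistical-downscaling/bias-correction workflows like the one described above.
    # The observed and modeled series here are synthetic placeholders.
    import numpy as np

    rng = np.random.default_rng(3)
    obs = rng.gamma(shape=2.0, scale=3.0, size=5000)         # "observed" daily precipitation
    model_hist = rng.gamma(shape=2.0, scale=4.0, size=5000)  # biased model, historical period
    model_fut = rng.gamma(shape=2.2, scale=4.0, size=5000)   # biased model, future period

    def quantile_map(x, model_ref, obs_ref, n_quantiles=100):
        """Map model values onto the observed distribution via matched quantiles."""
        q = np.linspace(0, 1, n_quantiles)
        model_q = np.quantile(model_ref, q)
        obs_q = np.quantile(obs_ref, q)
        return np.interp(x, model_q, obs_q)

    corrected_fut = quantile_map(model_fut, model_hist, obs)
    print(f"raw future mean {model_fut.mean():.2f} -> corrected {corrected_fut.mean():.2f}, "
          f"observed mean {obs.mean():.2f}")
    ```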

  17. Assessing risk factors for dental caries: a statistical modeling approach.

    PubMed

    Trottini, Mario; Bossù, Maurizio; Corridore, Denise; Ierardo, Gaetano; Luzzi, Valeria; Saccucci, Matteo; Polimeni, Antonella

    2015-01-01

    The problem of identifying potential determinants and predictors of dental caries is of key importance in caries research and it has received considerable attention in the scientific literature. From the methodological side, a broad range of statistical models is currently available to analyze dental caries indices (DMFT, dmfs, etc.). These models have been applied in several studies to investigate the impact of different risk factors on the cumulative severity of dental caries experience. However, in most of the cases (i) these studies focus on a very specific subset of risk factors; and (ii) in the statistical modeling only few candidate models are considered and model selection is at best only marginally addressed. As a result, our understanding of the robustness of the statistical inferences with respect to the choice of the model is very limited; the richness of the set of statistical models available for analysis is only marginally exploited; and inferences could be biased due to the omission of potentially important confounding variables in the model's specification. In this paper we argue that these limitations can be overcome by considering a general class of candidate models and carefully exploring the model space using standard model selection criteria and measures of global fit and predictive performance of the candidate models. Strengths and limitations of the proposed approach are illustrated with a real data set. In our illustration the model space contains more than 2.6 million models, which require inferences to be adjusted for 'optimism'.
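
    A minimal sketch of the kind of model-space exploration advocated here, fitting every subset of candidate risk factors and ranking the models by an information criterion, is given below; the data are synthetic and ordinary least squares stands in for whatever count model would actually be appropriate for caries indices.

    ```python
    # Sketch of exhaustive model-space exploration: fit every subset of candidate risk
    # factors and rank the models by AIC. Data and predictors are synthetic placeholders.
    import itertools
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 200
    df = pd.DataFrame({
        "sugar": rng.normal(size=n),
        "brushing": rng.normal(size=n),
        "fluoride": rng.normal(size=n),
        "age": rng.normal(size=n),
    })
    df["dmft"] = 2 + 0.8 * df["sugar"] - 0.6 * df["brushing"] + rng.normal(size=n)

    results = []
    predictors = ["sugar", "brushing", "fluoride", "age"]
    for k in range(1, len(predictors) + 1):
        for subset in itertools.combinations(predictors, k):
            X = sm.add_constant(df[list(subset)])
            fit = sm.OLS(df["dmft"], X).fit()
            results.append((fit.aic, subset))

    results.sort()                      # smallest AIC first
    for aic, subset in results[:3]:
        print(f"AIC {aic:.1f}: {subset}")
    ```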

  18. Statistical methods for investigating quiescence and other temporal seismicity patterns

    USGS Publications Warehouse

    Matthews, M.V.; Reasenberg, P.A.

    1988-01-01

    We propose a statistical model and a technique for objective recognition of one of the most commonly cited seismicity patterns: microearthquake quiescence. We use a Poisson process model for seismicity and define a process with quiescence as one with a particular type of piece-wise constant intensity function. From this model, we derive a statistic for testing stationarity against a 'quiescence' alternative. The large-sample null distribution of this statistic is approximated from simulated distributions of appropriate functionals applied to Brownian bridge processes. We point out the restrictiveness of the particular model we propose and of the quiescence idea in general. The fact that there are many point processes which have neither constant nor quiescent rate functions underscores the need to test for and describe nonuniformity thoroughly. We advocate the use of the quiescence test in conjunction with various other tests for nonuniformity and with graphical methods such as density estimation. Ideally these methods may promote accurate description of temporal seismicity distributions and useful characterizations of interesting patterns. © 1988 Birkhäuser Verlag.

  19. Data-based Non-Markovian Model Inference

    NASA Astrophysics Data System (ADS)

    Ghil, Michael

    2015-04-01

    This talk concentrates on obtaining stable and efficient data-based models for simulation and prediction in the geosciences and life sciences. The proposed model derivation relies on using a multivariate time series of partial observations from a large-dimensional system, and the resulting low-order models are compared with the optimal closures predicted by the non-Markovian Mori-Zwanzig formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a very broad generalization and a time-continuous limit of existing multilevel, regression-based approaches to data-based closure, in particular of empirical model reduction (EMR). We show that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the Mori-Zwanzig formalism. A simple correlation-based stopping criterion for an EMR-MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are given for the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a very broad class of MSM applications. The EMR-MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. The resulting reduced model with energy-conserving nonlinearities captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka-Volterra model of population dynamics in its chaotic regime. The positivity constraint on the solutions' components replaces here the quadratic-energy-preserving constraint of fluid-flow problems and it successfully prevents blow-up. This work is based on a close collaboration with M.D. Chekroun, D. Kondrashov, S. Kravtsov and A.W. Robertson.

  20. Improving UWB-Based Localization in IoT Scenarios with Statistical Models of Distance Error.

    PubMed

    Monica, Stefania; Ferrari, Gianluigi

    2018-05-17

    Interest in the Internet of Things (IoT) is rapidly increasing, as the number of connected devices is exponentially growing. One of the application scenarios envisaged for IoT technologies involves indoor localization and context awareness. In this paper, we focus on a localization approach that relies on a particular type of communication technology, namely Ultra Wide Band (UWB). UWB technology is an attractive choice for indoor localization, owing to its high accuracy. Since localization algorithms typically rely on estimated inter-node distances, the goal of this paper is to evaluate the improvement brought by a simple (linear) statistical model of the distance error. On the basis of an extensive experimental measurement campaign, we propose a general analytical framework, based on a Least Square (LS) method, to derive a novel statistical model for the range estimation error between a pair of UWB nodes. The proposed statistical model is then applied to improve the performance of a few illustrative localization algorithms in various realistic scenarios. The obtained experimental results show that the use of the proposed statistical model improves the accuracy of the considered localization algorithms with a reduction of the localization error up to 66%.
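
    The paper's key idea, a simple linear statistical model of the range-estimation error fitted by least squares, can be sketched as follows; the calibration distances below are synthetic placeholders, not the measurement campaign data.

    ```python
    # Minimal least-squares fit of a linear range-error model for UWB distance estimates
    # (true vs. measured distances are synthetic placeholders for a calibration campaign).
    import numpy as np

    rng = np.random.default_rng(5)
    true_d = rng.uniform(1.0, 20.0, size=200)                        # metres
    measured_d = 1.03 * true_d + 0.15 + 0.05 * rng.normal(size=200)  # biased, noisy ranging

    # Fit measured = a*true + b, then invert it to correct new measurements.
    a, b = np.polyfit(true_d, measured_d, deg=1)
    corrected_d = (measured_d - b) / a

    rmse_raw = np.sqrt(np.mean((measured_d - true_d) ** 2))
    rmse_corr = np.sqrt(np.mean((corrected_d - true_d) ** 2))
    print(f"ranging RMSE: raw {rmse_raw:.3f} m -> corrected {rmse_corr:.3f} m")
    ```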

  1. Statistics of acoustic emissions and stress drops during granular shearing using a stick-slip fiber bundle model

    NASA Astrophysics Data System (ADS)

    Cohen, D.; Michlmayr, G.; Or, D.

    2012-04-01

    Shearing of dense granular materials appears in many engineering and Earth sciences applications. Under a constant strain rate, the shearing stress at steady state oscillates with slow rises followed by rapid drops that are linked to the build up and failure of force chains. Experiments indicate that these drops display exponential statistics. Measurements of acoustic emissions during shearing indicate that the energy liberated by failure of these force chains has power-law statistics. Representing force chains as fibers, we use a stick-slip fiber bundle model to obtain analytical solutions of the statistical distribution of stress drops and failure energy. In the model, fibers stretch, fail, and regain strength during deformation. Fibers have Weibull-distributed threshold strengths with either quenched or annealed disorder. The shapes of the distributions of drops and energy obtained from the model are similar to those measured during shearing experiments. This simple model may be useful to identify failure events linked to force chain failures. Future generalizations of the model that include different types of fiber failure may also allow identification of different types of granular failures that have distinct statistical acoustic emission signatures.

  2. A General Class of Test Statistics for Van Valen’s Red Queen Hypothesis

    PubMed Central

    Wiltshire, Jelani; Huffer, Fred W.; Parker, William C.

    2014-01-01

    Van Valen’s Red Queen hypothesis states that within a homogeneous taxonomic group the age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis being addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen’s work, various statistical approaches have been used to address the relationship between taxon age and the rate of extinction. We propose a general class of test statistics that can be used to test for the effect of age on the rate of extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects. Instead we control for covariate effects by pairing or grouping together similar species. Simulations are used to compare the power of the statistics. We apply the test statistics to data on Foram extinctions and find that age has a positive effect on the rate of extinction. A derivation of the null distribution of one of the test statistics is provided in the supplementary material. PMID:24910489

  3. A General Class of Test Statistics for Van Valen's Red Queen Hypothesis.

    PubMed

    Wiltshire, Jelani; Huffer, Fred W; Parker, William C

    2014-09-01

    Van Valen's Red Queen hypothesis states that within a homogeneous taxonomic group the age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis being addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen's work, various statistical approaches have been used to address the relationship between taxon age and the rate of extinction. We propose a general class of test statistics that can be used to test for the effect of age on the rate of extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects. Instead we control for covariate effects by pairing or grouping together similar species. Simulations are used to compare the power of the statistics. We apply the test statistics to data on Foram extinctions and find that age has a positive effect on the rate of extinction. A derivation of the null distribution of one of the test statistics is provided in the supplementary material.

  4. Prescriptive Statements and Educational Practice: What Can Structural Equation Modeling (SEM) Offer?

    ERIC Educational Resources Information Center

    Martin, Andrew J.

    2011-01-01

    Longitudinal structural equation modeling (SEM) can be a basis for making prescriptive statements on educational practice and offers yields over "traditional" statistical techniques under the general linear model. The extent to which prescriptive statements can be made will rely on the appropriate accommodation of key elements of research design,…

  5. Power Analysis for Complex Mediational Designs Using Monte Carlo Methods

    ERIC Educational Resources Information Center

    Thoemmes, Felix; MacKinnon, David P.; Reiser, Mark R.

    2010-01-01

    Applied researchers often include mediation effects in applications of advanced methods such as latent variable models and linear growth curve models. Guidance on how to estimate statistical power to detect mediation for these models has not yet been addressed in the literature. We describe a general framework for power analyses for complex…

  6. Taxometric Analysis as a General Strategy for Distinguishing Categorical from Dimensional Latent Structure

    ERIC Educational Resources Information Center

    McGrath, Robert E.; Walters, Glenn D.

    2012-01-01

    Statistical analyses investigating latent structure can be divided into those that estimate structural model parameters and those that detect the structural model type. The most basic distinction among structure types is between categorical (discrete) and dimensional (continuous) models. It is a common, and potentially misleading, practice to…

  7. EVALUATION OF THE REAL-TIME AIR-QUALITY MODEL USING THE RAPS (REGIONAL AIR POLLUTION STUDY) DATA BASE. VOLUME 1. OVERVIEW

    EPA Science Inventory

    The theory and programming of statistical tests for evaluating the Real-Time Air-Quality Model (RAM) using the Regional Air Pollution Study (RAPS) data base are fully documented in four report volumes. Moreover, the tests are generally applicable to other model evaluation problem...

  8. Behavior of the maximum likelihood in quantum state tomography

    NASA Astrophysics Data System (ADS)

    Scholten, Travis L.; Blume-Kohout, Robin

    2018-02-01

    Quantum state tomography on a d-dimensional system demands resources that grow rapidly with d. They may be reduced by using model selection to tailor the number of parameters in the model (i.e., the size of the density matrix). Most model selection methods typically rely on a test statistic and a null theory that describes its behavior when two models are equally good. Here, we consider the loglikelihood ratio. Because of the positivity constraint ρ ≥ 0, quantum state space does not generally satisfy local asymptotic normality (LAN), meaning the classical null theory for the loglikelihood ratio (the Wilks theorem) should not be used. Thus, understanding and quantifying how positivity affects the null behavior of this test statistic is necessary for its use in model selection for state tomography. We define a new generalization of LAN, metric-projected LAN, show that quantum state space satisfies it, and derive a replacement for the Wilks theorem. In addition to enabling reliable model selection, our results shed more light on the qualitative effects of the positivity constraint on state tomography.
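
    For reference, the classical null theory whose breakdown under the positivity constraint motivates this work is the Wilks theorem for the loglikelihood-ratio statistic:

    ```latex
    % Wilks' theorem (classical null theory): for nested models differing by k parameters,
    \lambda = 2\left[\ln \mathcal{L}(\hat{\theta}_{\text{full}})
                  - \ln \mathcal{L}(\hat{\theta}_{\text{restricted}})\right]
    \;\xrightarrow{d}\; \chi^{2}_{k},
    % which fails for state tomography because \rho \ge 0 violates local asymptotic normality.
    ```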

  9. Behavior of the maximum likelihood in quantum state tomography

    DOE PAGES

    Blume-Kohout, Robin J; Scholten, Travis L.

    2018-02-22

    Quantum state tomography on a d-dimensional system demands resources that grow rapidly with d. They may be reduced by using model selection to tailor the number of parameters in the model (i.e., the size of the density matrix). Most model selection methods typically rely on a test statistic and a null theory that describes its behavior when two models are equally good. Here, we consider the loglikelihood ratio. Because of the positivity constraint ρ ≥ 0, quantum state space does not generally satisfy local asymptotic normality (LAN), meaning the classical null theory for the loglikelihood ratio (the Wilks theorem) should not be used. Thus, understanding and quantifying how positivity affects the null behavior of this test statistic is necessary for its use in model selection for state tomography. We define a new generalization of LAN, metric-projected LAN, show that quantum state space satisfies it, and derive a replacement for the Wilks theorem. In addition to enabling reliable model selection, our results shed more light on the qualitative effects of the positivity constraint on state tomography.

  10. Behavior of the maximum likelihood in quantum state tomography

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blume-Kohout, Robin J; Scholten, Travis L.

    Quantum state tomography on a d-dimensional system demands resources that grow rapidly with d. They may be reduced by using model selection to tailor the number of parameters in the model (i.e., the size of the density matrix). Most model selection methods typically rely on a test statistic and a null theory that describes its behavior when two models are equally good. Here, we consider the loglikelihood ratio. Because of the positivity constraint ρ ≥ 0, quantum state space does not generally satisfy local asymptotic normality (LAN), meaning the classical null theory for the loglikelihood ratio (the Wilks theorem) should not be used. Thus, understanding and quantifying how positivity affects the null behavior of this test statistic is necessary for its use in model selection for state tomography. We define a new generalization of LAN, metric-projected LAN, show that quantum state space satisfies it, and derive a replacement for the Wilks theorem. In addition to enabling reliable model selection, our results shed more light on the qualitative effects of the positivity constraint on state tomography.

  11. Automated finite element modeling of the lumbar spine: Using a statistical shape model to generate a virtual population of models.

    PubMed

    Campbell, J Q; Petrella, A J

    2016-09-06

    Population-based modeling of the lumbar spine has the potential to be a powerful clinical tool. However, developing a fully parameterized model of the lumbar spine with accurate geometry has remained a challenge. The current study used automated methods for landmark identification to create a statistical shape model of the lumbar spine. The shape model was evaluated using compactness, generalization ability, and specificity. The primary shape modes were analyzed visually, quantitatively, and biomechanically. The biomechanical analysis was performed by using the statistical shape model with an automated method for finite element model generation to create a fully parameterized finite element model of the lumbar spine. Functional finite element models of the mean shape and the extreme shapes (±3 standard deviations) of all 17 shape modes were created demonstrating the robust nature of the methods. This study represents an advancement in finite element modeling of the lumbar spine and will allow population-based modeling in the future. Copyright © 2016 Elsevier Ltd. All rights reserved.
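
    A hedged sketch of the underlying construction, principal component analysis of aligned landmark sets giving a mean shape plus shape modes, with virtual shapes generated at ±3 standard deviations, is shown below; the landmark data are synthetic placeholders, not the study's lumbar spine geometry.

    ```python
    # Sketch of a PCA-based statistical shape model: landmarks -> mean shape + modes,
    # then virtual shapes at +/- 3 standard deviations along a mode, as described above.
    # Landmark data are synthetic placeholders (aligned 3D landmark sets are assumed).
    import numpy as np

    rng = np.random.default_rng(6)
    n_subjects, n_landmarks = 40, 50
    shapes = rng.normal(size=(n_subjects, n_landmarks * 3))   # flattened (x, y, z) landmarks

    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    # Principal modes of variation via SVD of the centered data matrix
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    std_per_mode = S / np.sqrt(n_subjects - 1)

    mode = 0                                    # first (largest-variance) shape mode
    for c in (-3, 0, 3):
        virtual_shape = mean_shape + c * std_per_mode[mode] * Vt[mode]
        print(f"mode {mode}, {c:+d} SD: first landmark at {virtual_shape[:3].round(3)}")
    ```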

  12. Extinction time of a stochastic predator-prey model by the generalized cell mapping method

    NASA Astrophysics Data System (ADS)

    Han, Qun; Xu, Wei; Hu, Bing; Huang, Dongmei; Sun, Jian-Qiao

    2018-03-01

    The stochastic response and extinction time of a predator-prey model with Gaussian white noise excitations are studied by the generalized cell mapping (GCM) method based on the short-time Gaussian approximation (STGA). The methods for stochastic response probability density functions (PDFs) and extinction time statistics are developed. The Taylor expansion is used to deal with non-polynomial nonlinear terms of the model for deriving the moment equations with Gaussian closure, which are needed for the STGA in order to compute the one-step transition probabilities. The work is validated with direct Monte Carlo simulations. We have presented the transient responses showing the evolution from a Gaussian initial distribution to a non-Gaussian steady-state one. The effects of the model parameter and noise intensities on the steady-state PDFs are discussed. It is also found that the effects of noise intensities on the extinction time statistics are opposite to the effects on the limit probability distributions of the survival species.

  13. Testing alternative ground water models using cross-validation and other methods

    USGS Publications Warehouse

    Foglia, L.; Mehl, S.W.; Hill, M.C.; Perona, P.; Burlando, P.

    2007-01-01

    Many methods can be used to test alternative ground water models. Of concern in this work are methods able to (1) rank alternative models (also called model discrimination) and (2) identify observations important to parameter estimates and predictions (equivalent to the purpose served by some types of sensitivity analysis). Some of the measures investigated are computationally efficient; others are computationally demanding. The latter are generally needed to account for model nonlinearity. The efficient model discrimination methods investigated include the information criteria: the corrected Akaike information criterion, Bayesian information criterion, and generalized cross-validation. The efficient sensitivity analysis measures used are dimensionless scaled sensitivity (DSS), composite scaled sensitivity, and parameter correlation coefficient (PCC); the other statistics are DFBETAS, Cook's D, and observation-prediction statistic. Acronyms are explained in the introduction. Cross-validation (CV) is a computationally intensive nonlinear method that is used for both model discrimination and sensitivity analysis. The methods are tested using up to five alternative parsimoniously constructed models of the ground water system of the Maggia Valley in southern Switzerland. The alternative models differ in their representation of hydraulic conductivity. A new method for graphically representing CV and sensitivity analysis results for complex models is presented and used to evaluate the utility of the efficient statistics. The results indicate that for model selection, the information criteria produce similar results at much smaller computational cost than CV. For identifying important observations, the only obviously inferior linear measure is DSS; the poor performance was expected because DSS does not include the effects of parameter correlation and PCC reveals large parameter correlations. © 2007 National Ground Water Association.

  14. Analysis and meta-analysis of single-case designs: an introduction.

    PubMed

    Shadish, William R

    2014-04-01

    The last 10 years have seen great progress in the analysis and meta-analysis of single-case designs (SCDs). This special issue includes five articles that provide an overview of current work on that topic, including standardized mean difference statistics, multilevel models, Bayesian statistics, and generalized additive models. Each article analyzes a common example across articles and presents syntax or macros for how to do them. These articles are followed by commentaries from single-case design researchers and journal editors. This introduction briefly describes each article and then discusses several issues that must be addressed before we can know what analyses will eventually be best to use in SCD research. These issues include modeling trend, modeling error covariances, computing standardized effect size estimates, assessing statistical power, incorporating more accurate models of outcome distributions, exploring whether Bayesian statistics can improve estimation given the small samples common in SCDs, and the need for annotated syntax and graphical user interfaces that make complex statistics accessible to SCD researchers. The article then discusses reasons why SCD researchers are likely to incorporate statistical analyses into their research more often in the future, including changing expectations and contingencies regarding SCD research from outside SCD communities, changes and diversity within SCD communities, corrections of erroneous beliefs about the relationship between SCD research and statistics, and demonstrations of how statistics can help SCD researchers better meet their goals. Copyright © 2013 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.

  15. Clinical study of the Erlanger silver catheter--data management and biometry.

    PubMed

    Martus, P; Geis, C; Lugauer, S; Böswald, M; Guggenbichler, J P

    1999-01-01

    The clinical evaluation of venous catheters for catheter-induced infections must conform to a strict biometric methodology. The statistical planning of the study (target population, design, degree of blinding), data management (database design, definition of variables, coding), quality assurance (data inspection at several levels) and the biometric evaluation of the Erlanger silver catheter project are described. The three-step data flow included: 1) primary data from the hospital, 2) a relational database, 3) files accessible for statistical evaluation. Two different statistical models were compared: analyzing only the first catheter of each patient (independent data) and analyzing several catheters from the same patient (dependent data) by means of the generalized estimating equations (GEE) method. The main result of the study was based on the comparison of both statistical models.
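
    A rough illustration of the two modelling strategies compared in the study (first catheter per patient as independent data versus all catheters with GEE) is sketched below with Python's statsmodels on entirely synthetic data. The variable names, effect sizes, and the exchangeable working correlation are assumptions made only for illustration.

        # Hedged sketch contrasting the two analysis strategies described above:
        # (1) an ordinary logistic model using only the first catheter per patient,
        # (2) a GEE logistic model using all catheters with clustering by patient.
        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        rng = np.random.default_rng(1)
        rows = []
        for pid in range(200):
            frailty = rng.normal()                       # patient-level random effect
            for cath in range(rng.integers(1, 4)):
                silver = rng.integers(0, 2)              # 1 = silver catheter (synthetic)
                logit = -1.0 - 0.5 * silver + 0.8 * frailty
                infected = rng.random() < 1 / (1 + np.exp(-logit))
                rows.append((pid, cath, silver, int(infected)))
        df = pd.DataFrame(rows, columns=["patient", "cath", "silver", "infected"])

        # (1) independent-data model: first catheter only
        first = df.groupby("patient").head(1)
        glm = sm.GLM(first["infected"], sm.add_constant(first[["silver"]]),
                     family=sm.families.Binomial()).fit()

        # (2) GEE with exchangeable within-patient correlation, all catheters
        gee = sm.GEE(df["infected"], sm.add_constant(df[["silver"]]),
                     groups=df["patient"], family=sm.families.Binomial(),
                     cov_struct=sm.cov_struct.Exchangeable()).fit()
        print(glm.params, gee.params, sep="\n")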

  16. High-temperature behavior of a deformed Fermi gas obeying interpolating statistics.

    PubMed

    Algin, Abdullah; Senay, Mustafa

    2012-04-01

    An outstanding idea originally introduced by Greenberg is to investigate whether there is equivalence between intermediate statistics, which may be different from anyonic statistics, and q-deformed particle algebra. Also, a model to be studied for addressing such an idea could possibly provide us some new consequences about the interactions of particles as well as their internal structures. Motivated mainly by this idea, in this work, we consider a q-deformed Fermi gas model whose statistical properties enable us to effectively study interpolating statistics. Starting with a generalized Fermi-Dirac distribution function, we derive several thermostatistical functions of a gas of these deformed fermions in the thermodynamical limit. We study the high-temperature behavior of the system by analyzing the effects of q deformation on the most important thermostatistical characteristics of the system such as the entropy, specific heat, and equation of state. It is shown that such a deformed fermion model in two and three spatial dimensions exhibits the interpolating statistics in a specific interval of the model deformation parameter 0 < q < 1. In particular, for two and three spatial dimensions, it is found from the behavior of the third virial coefficient of the model that the deformation parameter q interpolates completely between attractive and repulsive systems, including the free boson and fermion cases. From the results obtained in this work, we conclude that such a model could provide much physical insight into some interacting theories of fermions, and could be useful to further study the particle systems with intermediate statistics.

  17. An Optimization Principle for Deriving Nonequilibrium Statistical Models of Hamiltonian Dynamics

    NASA Astrophysics Data System (ADS)

    Turkington, Bruce

    2013-08-01

    A general method for deriving closed reduced models of Hamiltonian dynamical systems is developed using techniques from optimization and statistical estimation. Given a vector of resolved variables, selected to describe the macroscopic state of the system, a family of quasi-equilibrium probability densities on phase space corresponding to the resolved variables is employed as a statistical model, and the evolution of the mean resolved vector is estimated by optimizing over paths of these densities. Specifically, a cost function is constructed to quantify the lack-of-fit to the microscopic dynamics of any feasible path of densities from the statistical model; it is an ensemble-averaged, weighted, squared-norm of the residual that results from submitting the path of densities to the Liouville equation. The path that minimizes the time integral of the cost function determines the best-fit evolution of the mean resolved vector. The closed reduced equations satisfied by the optimal path are derived by Hamilton-Jacobi theory. When expressed in terms of the macroscopic variables, these equations have the generic structure of governing equations for nonequilibrium thermodynamics. In particular, the value function for the optimization principle coincides with the dissipation potential that defines the relation between thermodynamic forces and fluxes. The adjustable closure parameters in the best-fit reduced equations depend explicitly on the arbitrary weights that enter into the lack-of-fit cost function. Two particular model reductions are outlined to illustrate the general method. In each example the set of weights in the optimization principle contracts into a single effective closure parameter.

  18. A hybrid model for predicting carbon monoxide from vehicular exhausts in urban environments

    NASA Astrophysics Data System (ADS)

    Gokhale, Sharad; Khare, Mukesh

    Several deterministic air quality models evaluate and predict frequently occurring pollutant concentrations well but, in general, are incapable of predicting the 'extreme' concentrations. In contrast, statistical distribution models overcome this limitation and predict the 'extreme' concentrations. However, environmental damage is caused both by extremes and by sustained average pollutant concentrations, so a model should predict not only the 'extreme' ranges but also the 'middle' ranges, i.e. the entire range of pollutant concentrations. Hybrid modelling is one technique that estimates/predicts the entire range of the pollutant concentration distribution by combining deterministic models with suitable statistical distribution models (Jakeman et al., 1988). In the present paper, a hybrid model has been developed to predict carbon monoxide (CO) concentration distributions at the Income Tax Office (ITO) traffic intersection in Delhi, where the traffic is heterogeneous in nature, consisting of light vehicles, heavy vehicles, three-wheelers (auto rickshaws) and two-wheelers (scooters, motorcycles, etc.), and the meteorology is 'tropical'. The model combines the general finite line source model (GFLSM) as its deterministic component and the log-logistic distribution (LLD) model as its statistical component. The hybrid (GFLSM-LLD) model is then applied at the ITO intersection. The results show that the hybrid model predictions match the observed CO concentration data within the 5-99 percentile range. The model is further validated at a different street location, the Sirifort roadway, where it predicts CO concentrations fairly well (d = 0.91) in the 10-95 percentile range. A regulatory compliance measure is also developed to estimate the probability that hourly CO concentrations exceed the National Ambient Air Quality Standards (NAAQS) of India.
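
    The statistical component of such a hybrid model can be sketched with scipy, whose fisk distribution is the log-logistic: fit the distribution to hourly CO data and evaluate an exceedance probability against a regulatory threshold. The concentrations and the threshold below are placeholders, not the ITO observations or the actual Indian NAAQS value.

        # Rough sketch of the statistical component described above: fitting a
        # log-logistic distribution (scipy's `fisk`) to hourly CO concentrations
        # and estimating the probability of exceeding a regulatory threshold.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        co_hourly = stats.fisk.rvs(c=3.0, scale=2.0, size=2000, random_state=rng)  # mg/m^3 (synthetic)

        # Fit the log-logistic distribution with the location fixed at zero
        c, loc, scale = stats.fisk.fit(co_hourly, floc=0)

        threshold = 4.0                                   # assumed placeholder 1-h standard, mg/m^3
        p_exceed = stats.fisk.sf(threshold, c, loc=loc, scale=scale)
        print(f"shape={c:.2f}, scale={scale:.2f}, P(CO > {threshold}) = {p_exceed:.3f}")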

  19. A general model for estimating lower extremity inertial properties of individuals with transtibial amputation.

    PubMed

    Ferris, Abbie E; Smith, Jeremy D; Heise, Gary D; Hinrichs, Richard N; Martin, Philip E

    2017-03-21

    Lower extremity joint moment magnitudes during swing are dependent on the inertial properties of the prosthesis and residual limb of individuals with transtibial amputation (TTA). Often, intact limb inertial properties (INTACT) are used for prosthetic limb values in an inverse dynamics model even though these values overestimate the amputated limb's inertial properties. The purpose of this study was to use subject-specific (SPECIFIC) measures of prosthesis inertial properties to generate a general model (GENERAL) for estimating TTA prosthesis inertial properties. Subject-specific mass, center of mass, and moment of inertia were determined for the shank and foot segments of the prosthesis (n=11) using an oscillation technique and reaction board. The GENERAL model was derived from the means of the SPECIFIC model. Mass and segment lengths are required GENERAL model inputs. Comparisons of segment inertial properties and joint moments during walking were made using three inertial models (unique sample; n=9): (1) SPECIFIC, (2) GENERAL, and (3) INTACT. Prosthetic shank inertial properties were significantly smaller with the SPECIFIC and GENERAL model than the INTACT model, but the SPECIFIC and GENERAL model did not statistically differ. Peak knee and hip joint moments during swing were significantly smaller for the SPECIFIC and GENERAL model compared with the INTACT model and were not significantly different between SPECIFIC and GENERAL models. When subject-specific measures are unavailable, using the GENERAL model produces a better estimate of prosthetic side inertial properties resulting in more accurate joint moment measurements for individuals with TTA than the INTACT model. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Classification image analysis: estimation and statistical inference for two-alternative forced-choice experiments

    NASA Technical Reports Server (NTRS)

    Abbey, Craig K.; Eckstein, Miguel P.

    2002-01-01

    We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.

  1. Statistical model to perform error analysis of curve fits of wind tunnel test data using the techniques of analysis of variance and regression analysis

    NASA Technical Reports Server (NTRS)

    Alston, D. W.

    1981-01-01

    The objective of this research was to design a statistical model that could perform an error analysis of curve fits of wind tunnel test data using analysis of variance and regression analysis techniques. Four related subproblems were defined, and by solving each of these a solution to the general research problem was obtained. The capabilities of the resulting statistical model are considered. A least squares fit is used to determine the nature of the force, moment, and pressure data, and the order of the curve fit is increased in order to remove the quadratic effect in the residuals. The analysis of variance is used to determine the magnitude and effect of the error factor associated with the experimental data.
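
    A minimal version of the curve-fitting and residual-checking step described above can be written with numpy: fit polynomials of increasing order by least squares and test whether a quadratic effect remains in the residuals. The aerodynamic data below are synthetic, and the procedure is only a sketch of the general idea, not the report's model.

        # Least-squares polynomial fits to synthetic wind tunnel force data, with
        # the fit order increased until no quadratic trend remains in the residuals.
        import numpy as np

        rng = np.random.default_rng(3)
        alpha = np.linspace(-10, 10, 41)                       # angle of attack, deg
        force = 0.2 + 0.11 * alpha + 0.004 * alpha**2 + rng.normal(scale=0.05, size=alpha.size)

        for order in (1, 2, 3):
            coeffs = np.polyfit(alpha, force, order)
            resid = force - np.polyval(coeffs, alpha)
            # Check for a remaining quadratic effect by refitting the residuals
            quad = np.polyfit(alpha, resid, 2)[0]
            print(f"order {order}: residual SS = {np.sum(resid**2):.4f}, "
                  f"quadratic term in residuals = {quad:.5f}")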

  2. A General Retention Model Applied to the Naval Aviator.

    DTIC Science & Technology

    1980-06-01

    [Scanned thesis record; only OCR fragments of the front matter and reference list are legible.] A General Retention Model Applied to the Naval Aviator, thesis by James Robert O'Donnell, June 1980 (thesis advisor: D. M ...). Legible reference fragments include: Psychology, v. 62, p. 237-240, 1977; NAVPERS 15658(A), FY-79 Annual Report, Navy Military Personnel Statistics Office, Washington, D.C., 30 September 1979; Psychology, v. 29, p. 57-60, 1976.

  3. Types and Characteristics of Data for Geomagnetic Field Modeling

    NASA Technical Reports Server (NTRS)

    Langel, R. A. (Editor); Baldwin, R. T. (Editor)

    1992-01-01

    Given here is material submitted at a symposium convened on Friday, August 23, 1991, at the General Assembly of the International Union of Geodesy and Geophysics (IUGG) held in Vienna, Austria. Models of the geomagnetic field are only as good as the data upon which they are based, and depend upon correct understanding of data characteristics such as accuracy, correlations, systematic errors, and general statistical properties. This symposium was intended to expose and illuminate these data characteristics.

  4. Comparison of Artificial Neural Networks and ARIMA statistical models in simulations of target wind time series

    NASA Astrophysics Data System (ADS)

    Kolokythas, Kostantinos; Vasileios, Salamalikis; Athanassios, Argiriou; Kazantzidis, Andreas

    2015-04-01

    Wind results from complex interactions of numerous mechanisms acting on small and large scales, so better knowledge of its behavior is essential in a variety of applications, especially power production from wind turbines. The literature contains a considerable number of models, both physical and statistical, dealing with the simulation and prediction of wind speed. Among others, Artificial Neural Networks (ANNs) are widely used for wind forecasting and, in the great majority of cases, outperform conventional statistical models. In this study, a number of ANNs with different architectures, created and applied to a dataset of wind time series, are compared to Auto Regressive Integrated Moving Average (ARIMA) statistical models. The data consist of mean hourly wind speeds from a wind farm in a hilly Greek region and cover a period of one year (2013). The main goal is to evaluate the models' ability to successfully simulate the wind speed at a significant point (target). Goodness-of-fit statistics are computed to compare the different methods. In general, the ANNs showed the best performance, outperforming the ARIMA models in the estimation of wind speed.
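
    As a hedged illustration of the comparison described above, the sketch below fits an ARIMA model with statsmodels and a small feed-forward network (standing in for the study's ANNs) with scikit-learn to a synthetic hourly wind-speed series, then compares forecast RMSE. The model order, network size, and data are arbitrary choices, not those of the study.

        # Toy ARIMA-vs-neural-network comparison on a synthetic hourly wind series.
        import numpy as np
        from statsmodels.tsa.arima.model import ARIMA
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(4)
        n = 1000
        wind = 6 + 2 * np.sin(2 * np.pi * np.arange(n) / 24) + rng.normal(scale=1.0, size=n)
        train, test = wind[:800], wind[800:]

        # ARIMA(2,0,1) fitted on the training series, forecasting the test period
        arima_fc = ARIMA(train, order=(2, 0, 1)).fit().forecast(steps=len(test))

        # Small MLP on lagged values (previous 24 hours as predictors)
        lags = 24
        X = np.column_stack([wind[i:n - lags + i] for i in range(lags)])
        y = wind[lags:]
        split = 800 - lags
        mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
        mlp.fit(X[:split], y[:split])
        mlp_fc = mlp.predict(X[split:])

        rmse = lambda a, b: np.sqrt(np.mean((a - b) ** 2))
        print("ARIMA RMSE:", rmse(test, arima_fc), " MLP RMSE:", rmse(test, mlp_fc))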

  5. Towards Direct Simulation of Future Tropical Cyclone Statistics in a High-Resolution Global Atmospheric Model

    DOE PAGES

    Wehner, Michael F.; Bala, G.; Duffy, Phillip; ...

    2010-01-01

    We present a set of high-resolution global atmospheric general circulation model (AGCM) simulations focusing on the model's ability to represent tropical storms and their statistics. We find that the model produces storms of hurricane strength with realistic dynamical features. We also find that tropical storm statistics are reasonable, both globally and in the north Atlantic, when compared to recent observations. The sensitivity of simulated tropical storm statistics to increases in sea surface temperature (SST) is also investigated, revealing that a credible late 21st century SST increase produced increases in simulated tropical storm numbers and intensities in all ocean basins. While this paper supports previous high-resolution model and theoretical findings that the frequency of very intense storms will increase in a warmer climate, it differs notably from previous medium and high-resolution model studies that show a global reduction in total tropical storm frequency. However, we are quick to point out that this particular model finding remains speculative due to a lack of radiative forcing changes in our time-slice experiments as well as a focus on the Northern hemisphere tropical storm seasons.

  6. Downscaling of global climate change estimates to regional scales: An application to Iberian rainfall in wintertime

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    von Storch, H.; Zorita, E.; Cubasch, U.

    A statistical strategy to deduce regional-scale features from climate general circulation model (GCM) simulations has been designed and tested. The main idea is to interrelate the characteristic patterns of observed simultaneous variations of regional climate parameters and of large-scale atmospheric flow using the canonical correlation technique. The large-scale North Atlantic sea level pressure (SLP) is related to the regional, variable, winter (DJF) mean Iberian Peninsula rainfall. The skill of the resulting statistical model is shown by reproducing, to a good approximation, the winter mean Iberian rainfall from 1900 to present from the observed North Atlantic mean SLP distributions. It is shown that this observed relationship between the two variables is not well reproduced in the output of a general circulation model (GCM). The implications for Iberian rainfall changes as the response to increasing atmospheric greenhouse-gas concentrations simulated by two GCM experiments are examined with the proposed statistical model. In an instantaneous "2 CO2" doubling experiment, using the simulated change of the mean North Atlantic SLP field to predict Iberian rainfall yields an insignificant increase of area-averaged rainfall of 1 mm/month, with maximum values of 4 mm/month in the northwest of the peninsula. In contrast, for the four GCM grid points representing the Iberian Peninsula, the change is -10 mm/month, with a minimum of -19 mm/month in the southwest. In the second experiment, with the IPCC scenario A ("business as usual") increase of CO2, the statistical-model results partially differ from the directly simulated rainfall changes: over the experimental range of 100 years, the area-averaged rainfall decreases by 7 mm/month (statistical model) and by 9 mm/month (GCM); at the same time the amplitude of the interdecadal variability is quite different. 17 refs., 10 figs.
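
    The downscaling strategy described above can be caricatured with scikit-learn's canonical correlation analysis: calibrate the SLP-rainfall relationship on historical fields, then apply it to simulated SLP. Everything below (the field sizes, the shared signal, the "GCM" anomaly) is synthetic and purely illustrative.

        # Toy CCA-based statistical downscaling: large-scale SLP field -> station rainfall.
        import numpy as np
        from sklearn.cross_decomposition import CCA

        rng = np.random.default_rng(5)
        n_winters, n_gridpoints, n_stations = 90, 60, 12

        latent = rng.normal(size=(n_winters, 2))                 # shared large-scale signal
        slp = latent @ rng.normal(size=(2, n_gridpoints)) + rng.normal(scale=0.5, size=(n_winters, n_gridpoints))
        rain = latent @ rng.normal(size=(2, n_stations)) + rng.normal(scale=0.5, size=(n_winters, n_stations))

        cca = CCA(n_components=2).fit(slp, rain)                 # calibrate on "observations"
        gcm_slp = slp + 0.3                                      # pretend GCM anomaly field
        rain_downscaled = cca.predict(gcm_slp)                   # regional rainfall estimate
        print(rain_downscaled.shape)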

  7. Effective field theory of statistical anisotropies for primordial bispectrum and gravitational waves

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rostami, Tahereh; Karami, Asieh; Firouzjahi, Hassan, E-mail: t.rostami@ipm.ir, E-mail: karami@ipm.ir, E-mail: firouz@ipm.ir

    2017-06-01

    We present effective field theory studies of primordial statistical anisotropies in models of anisotropic inflation. The general action in unitary gauge is presented to calculate the leading interactions between the gauge field fluctuations, the curvature perturbations and the tensor perturbations. The anisotropies in the scalar power spectrum and bispectrum are calculated, and the dependence of these anisotropies on the EFT couplings is presented. In addition, we calculate the statistical anisotropy in the tensor power spectrum and the scalar-tensor cross correlation. Our EFT approach incorporates anisotropies generated in models with a non-trivial speed for the gauge field fluctuations and sound speed for scalar perturbations, such as in DBI inflation.

  8. Bose--Einstein Correlations and Thermal Cluster Formation in High-energy Collisions

    NASA Astrophysics Data System (ADS)

    Bialas, A.; Florkowski, W.; Zalewski, K.

    The blast wave model is generalized to include the production of thermal clusters, as suggested by the apparent success of the statistical model of particle production at high energies. The formulae for the HBT correlation functions and the corresponding HBT radii are derived.

  9. A call to improve methods for estimating tree biomass for regional and national assessments

    Treesearch

    Aaron R. Weiskittel; David W. MacFarlane; Philip J. Radtke; David L.R. Affleck; Hailemariam Temesgen; Christopher W. Woodall; James A. Westfall; John W. Coulston

    2015-01-01

    Tree biomass is typically estimated using statistical models. This review highlights five limitations of most tree biomass models, which include the following: (1) biomass data are costly to collect and alternative sampling methods are used; (2) belowground data and models are generally lacking; (3) models are often developed from small and geographically limited data...

  10. Modeling the Development of Audiovisual Cue Integration in Speech Perception

    PubMed Central

    Getz, Laura M.; Nordeen, Elke R.; Vrabic, Sarah C.; Toscano, Joseph C.

    2017-01-01

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues. PMID:28335558
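
    A toy version of the distributional-learning component described above can be written with scikit-learn's GaussianMixture: fit a two-component mixture to joint auditory-visual cue values and read off posterior category probabilities, including for a mismatched token. The cue dimensions, units, and parameter values below are invented, not the simulations reported in the paper.

        # Gaussian mixture model fit to synthetic joint auditory-visual cue values.
        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(6)
        # Two phonological categories with correlated auditory (e.g., VOT-like) and
        # visual (e.g., lip-aperture-like) cues; values are illustrative only.
        cat_a = rng.multivariate_normal([10.0, 0.2], [[9.0, 0.2], [0.2, 0.01]], size=400)
        cat_b = rng.multivariate_normal([45.0, 0.6], [[16.0, 0.3], [0.3, 0.02]], size=400)
        cues = np.vstack([cat_a, cat_b])

        gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(cues)

        # Posterior category probabilities for a mismatched (auditory low, visual high) token
        token = np.array([[12.0, 0.6]])
        print(gmm.predict_proba(token))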

  11. Modeling the Development of Audiovisual Cue Integration in Speech Perception.

    PubMed

    Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

    2017-03-21

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.

  12. Simulation and analysis of scalable non-Gaussian statistically anisotropic random functions

    NASA Astrophysics Data System (ADS)

    Riva, Monica; Panzeri, Marco; Guadagnini, Alberto; Neuman, Shlomo P.

    2015-12-01

    Many earth and environmental (as well as other) variables, Y, and their spatial or temporal increments, ΔY, exhibit non-Gaussian statistical scaling. Previously we were able to capture some key aspects of such scaling by treating Y or ΔY as standard sub-Gaussian random functions. We were however unable to reconcile two seemingly contradictory observations, namely that whereas sample frequency distributions of Y (or its logarithm) exhibit relatively mild non-Gaussian peaks and tails, those of ΔY display peaks that grow sharper and tails that become heavier with decreasing separation distance or lag. Recently we overcame this difficulty by developing a new generalized sub-Gaussian model which captures both behaviors in a unified and consistent manner, exploring it on synthetically generated random functions in one dimension (Riva et al., 2015). Here we extend our generalized sub-Gaussian model to multiple dimensions, present an algorithm to generate corresponding random realizations of statistically isotropic or anisotropic sub-Gaussian functions and illustrate it in two dimensions. We demonstrate the accuracy of our algorithm by comparing ensemble statistics of Y and ΔY (such as, mean, variance, variogram and probability density function) with those of Monte Carlo generated realizations. We end by exploring the feasibility of estimating all relevant parameters of our model by analyzing jointly spatial moments of Y and ΔY obtained from a single realization of Y.

  13. Modeling exposure–lag–response associations with distributed lag non-linear models

    PubMed Central

    Gasparrini, Antonio

    2014-01-01

    In biomedical research, a health effect is frequently associated with protracted exposures of varying intensity sustained in the past. The main complexity of modeling and interpreting such phenomena lies in the additional temporal dimension needed to express the association, as the risk depends on both intensity and timing of past exposures. This type of dependency is defined here as exposure–lag–response association. In this contribution, I illustrate a general statistical framework for such associations, established through the extension of distributed lag non-linear models, originally developed in time series analysis. This modeling class is based on the definition of a cross-basis, obtained by the combination of two functions to flexibly model linear or nonlinear exposure-responses and the lag structure of the relationship, respectively. The methodology is illustrated with an example application to cohort data and validated through a simulation study. This modeling framework generalizes to various study designs and regression models, and can be applied to study the health effects of protracted exposures to environmental factors, drugs or carcinogenic agents, among others. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24027094

  14. Statistical methods and neural network approaches for classification of data from multiple sources

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon Atli; Swain, Philip H.

    1990-01-01

    Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A problem with using conventional multivariate statistical approaches for classification of data of multiple types is in general that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.

  15. A gravitational potential finding for rotating cosmological body in the context of proto-planetary dynamics problem solving

    NASA Astrophysics Data System (ADS)

    Krot, Alexander M.

    2008-09-01

    A statistical theory of cosmological body formation (the so-called spheroidal body model) has been proposed in [1]-[9]. Within the framework of this theory, bodies have fuzzy outlines and are represented by means of spheroidal forms [1],[2]. In [3], the slowly evolving process of gravitational compression of a spheroidal body close to an unstable equilibrium state was investigated. In [4],[5], the equation of motion of particles inside a weakly gravitating spheroidal body modeled as an ideal liquid was obtained. Using the Schwarzschild and Kerr metrics, the consistency of the proposed statistical model with general relativity was shown in [6]. The theory follows from the conception of a spheroidal body forming from a protoplanetary nebula [7],[8]; it permits derivation of the distribution functions for an immovable [1]-[5] and a rotating spheroidal body [6]-[8], their mass densities, and the distribution function of specific angular momentum of a uniformly rotating spheroidal body [7],[8]. It is well known that there is no statistical equilibrium in a gas-dust proto-planetary cloud because of the long relaxation time for proto-planet formation in its own gravitational field. The behavior of such a proto-planetary system can be described by Jeans' equation, a partial differential equation for the distribution function [9]. Finding a general solution of Jeans' equation is connected directly with an analytical expression for the gravitational potential; thus, determination of the gravitational potential is the main problem of the statistical dynamics of a proto-planetary system [9]. This work shows that this task can be solved on the basis of the spheroidal body theory, which permits derivation of the gravitational potential of a rotating spheroidal body at a large distance from its center. Using the obtained analytical expression for the potential, the gravitational field strength (as well as the angular momentum space function) in the remote zone of a slowly compressing, rotating spheroidal body is obtained. As a result, a distribution function describing the mechanical state of the proto-planetary system can be found from Jeans' equation.
    References: [1] Krot AM. The statistical model of gravitational interaction of particles. Uspekhi Sovremennoï Radioelektroniki (special issue "Cosmic Radiophysics", Moscow) 1996; 8: 66-81 (in Russian). [2] Krot AM. Use of the statistical model of gravity for analysis of nonhomogeneity in earth surface. Proc. SPIE's 13th Annual Intern. Symposium "AeroSense", Orlando, Florida, USA, April 5-9, 1999; 3710: 1248-1259. [3] Krot AM. Statistical description of gravitational field: a new approach. Proc. SPIE's 14th Annual Intern. Symposium "AeroSense", Orlando, Florida, USA, April 24-28, 2000; 4038: 1318-1329. [4] Krot AM. Gravidynamical equations for a weakly gravitating spheroidal body. Proc. SPIE's 15th Annual Intern. Symposium "AeroSense", Orlando, Florida, USA, April 16-20, 2001; 4394: 1271-1282. [5] Krot AM. Development of gravidynamical equations for a weakly gravitating body in the vicinity of absolute zero temperature. Proc. 53rd Intern. Astronautical Congress (IAC) - The 2nd World Space Congress-2002, Houston, Texas, USA, October 10-19, 2002; Preprint IAC-02-J.P.01: 1-11. [6] Krot AM. The statistical model of rotating and gravitating spheroidal body with the point of view of general relativity. Proc. 35th COSPAR Scientific Assembly, Paris, France, July 18-25, 2004; Abstract-Nr. COSPAR 04-A-00162. [7] Krot A. The statistical approach to exploring formation of Solar system. Proc. European Geosciences Union (EGU) General Assembly, Vienna, Austria, April 02-07, 2006; Geophysical Research Abstracts, vol. 8: EGU06-A-00216, SRef-ID: 1607-7962/gra/. [8] Krot AM. The statistical model of original and evolution planets of Solar system and planetary satellites. Proc. European Planetary Science Congress, Berlin, Germany, September 18-22, 2006; Planetary Research Abstracts, ESPC2006-A-00014. [9] Krot A. On the principal difficulties and ways to their solution in the theory of gravitational condensation of infinitely distributed dust substance. Proc. XXIV IUGG General Assembly, Perugia, Italy, July 2-13, 2007; GS002 Symposium "Gravity Field", Abstract GS002-3598: 143-144.

  16. Cure Models as a Useful Statistical Tool for Analyzing Survival

    PubMed Central

    Othus, Megan; Barlogie, Bart; LeBlanc, Michael L.; Crowley, John J.

    2013-01-01

    Cure models are a popular topic within statistical literature but are not as widely known in the clinical literature. Many patients with cancer can be long-term survivors of their disease, and cure models can be a useful tool to analyze and describe cancer survival data. The goal of this article is to review what a cure model is, explain when cure models can be used, and use cure models to describe multiple myeloma survival trends. Multiple myeloma is generally considered an incurable disease, and this article shows that by using cure models, rather than the standard Cox proportional hazards model, we can evaluate whether there is evidence that therapies at the University of Arkansas for Medical Sciences induce a proportion of patients to be long-term survivors. PMID:22675175
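
    The basic mixture cure model can be sketched in a few lines: overall survival is S(t) = pi + (1 - pi) * S_u(t), with cured fraction pi and latency survival S_u for the uncured. The sketch below assumes an exponential latency distribution and fits the two parameters by maximum likelihood on synthetic right-censored data; it is illustrative only and not the models used in the myeloma analysis.

        # Bare-bones mixture cure model fitted by maximum likelihood on synthetic data.
        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(7)
        n, true_pi, true_lam = 500, 0.3, 0.15
        cured = rng.random(n) < true_pi
        event_time = np.where(cured, np.inf, rng.exponential(1 / true_lam, size=n))
        censor_time = rng.uniform(5, 25, size=n)
        time = np.minimum(event_time, censor_time)
        event = (event_time <= censor_time).astype(float)      # 1 = event observed

        def negloglik(params):
            pi = 1 / (1 + np.exp(-params[0]))                   # logit-transformed cure fraction
            lam = np.exp(params[1])                             # log-transformed hazard
            dens = (1 - pi) * lam * np.exp(-lam * time)         # uncured event density
            surv = pi + (1 - pi) * np.exp(-lam * time)          # overall survival
            return -np.sum(event * np.log(dens) + (1 - event) * np.log(surv))

        fit = minimize(negloglik, x0=[0.0, np.log(0.1)], method="Nelder-Mead")
        pi_hat = 1 / (1 + np.exp(-fit.x[0]))
        print(f"estimated cure fraction = {pi_hat:.2f}, hazard = {np.exp(fit.x[1]):.3f}")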

  17. A neural network model of metaphor understanding with dynamic interaction based on a statistical language analysis: targeting a human-like model.

    PubMed

    Terai, Asuka; Nakagawa, Masanori

    2007-08-01

    The purpose of this paper is to construct a model that represents the human process of understanding metaphors, focusing specifically on similes of the form "an A like B". Generally speaking, human beings are able to generate and understand many sorts of metaphors. This study constructs the model based on a probabilistic knowledge structure for concepts, which is computed from a statistical analysis of a large-scale corpus. Consequently, this model is able to cover the many kinds of metaphors that human beings can generate. Moreover, the model implements the dynamic process of metaphor understanding by using a neural network with dynamic interactions. Finally, the validity of the model is confirmed by comparing model simulations with the results from a psychological experiment.

  18. Numerical and Qualitative Contrasts of Two Statistical Models ...

    EPA Pesticide Factsheets

    Two statistical approaches, weighted regression on time, discharge, and season and generalized additive models, have recently been used to evaluate water quality trends in estuaries. Both models have been used in similar contexts despite differences in statistical foundations and products. This study provided an empirical and qualitative comparison of both models using 29 years of data for two discrete time series of chlorophyll-a (chl-a) in the Patuxent River estuary. Empirical descriptions of each model were based on predictive performance against the observed data, ability to reproduce flow-normalized trends with simulated data, and comparisons of performance with validation datasets. Between-model differences were apparent but minor and both models had comparable abilities to remove flow effects from simulated time series. Both models similarly predicted observations for missing data with different characteristics. Trends from each model revealed distinct mainstem influences of the Chesapeake Bay with both models predicting a roughly 65% increase in chl-a over time in the lower estuary, whereas flow-normalized predictions for the upper estuary showed a more dynamic pattern, with a nearly 100% increase in chl-a in the last 10 years. Qualitative comparisons highlighted important differences in the statistical structure, available products, and characteristics of the data and desired analysis. This manuscript describes a quantitative comparison of two recently-

  19. Generalized memory associativity in a network model for the neuroses

    NASA Astrophysics Data System (ADS)

    Wedemann, Roseli S.; Donangelo, Raul; de Carvalho, Luís A. V.

    2009-03-01

    We review concepts introduced in earlier work, where a neural network mechanism describes some mental processes in neurotic pathology and psychoanalytic working-through, as associative memory functioning, according to the findings of Freud. We developed a complex network model, where modules corresponding to sensorial and symbolic memories interact, representing unconscious and conscious mental processes. The model illustrates Freud's idea that consciousness is related to symbolic and linguistic memory activity in the brain. We have introduced a generalization of the Boltzmann machine to model memory associativity. Model behavior is illustrated with simulations and some of its properties are analyzed with methods from statistical mechanics.

  20. Statistical limitations in functional neuroimaging. I. Non-inferential methods and statistical models.

    PubMed Central

    Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P

    1999-01-01

    Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview over some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149

  1. The development of ensemble theory. A new glimpse at the history of statistical mechanics

    NASA Astrophysics Data System (ADS)

    Inaba, Hajime

    2015-12-01

    This paper investigates the history of statistical mechanics from the viewpoint of the development of the ensemble theory from 1871 to 1902. In 1871, Ludwig Boltzmann introduced a prototype model of an ensemble that represents a polyatomic gas. In 1879, James Clerk Maxwell defined an ensemble as copies of systems of the same energy. Inspired by H.W. Watson, he called his approach "statistical". Boltzmann and Maxwell regarded the ensemble theory as a much more general approach than the kinetic theory. In the 1880s, influenced by Hermann von Helmholtz, Boltzmann made use of ensembles to establish thermodynamic relations. In Elementary Principles in Statistical Mechanics of 1902, Josiah Willard Gibbs tried to get his ensemble theory to mirror thermodynamics, including thermodynamic operations in its scope. Thermodynamics played the role of a "blind guide". His theory of ensembles can be characterized as more mathematically oriented than Einstein's theory proposed in the same year. Mechanical, empirical, and statistical approaches to foundations of statistical mechanics are presented. Although it was formulated in classical terms, the ensemble theory provided an infrastructure still valuable in quantum statistics because of its generality.

  2. The propagation of inventory-based positional errors into statistical landslide susceptibility models

    NASA Astrophysics Data System (ADS)

    Steger, Stefan; Brenning, Alexander; Bell, Rainer; Glade, Thomas

    2016-12-01

    There is unanimous agreement that a precise spatial representation of past landslide occurrences is a prerequisite to produce high quality statistical landslide susceptibility models. Even though perfectly accurate landslide inventories rarely exist, investigations of how landslide inventory-based errors propagate into subsequent statistical landslide susceptibility models are scarce. The main objective of this research was to systematically examine whether and how inventory-based positional inaccuracies of different magnitudes influence modelled relationships, validation results, variable importance and the visual appearance of landslide susceptibility maps. The study was conducted for a landslide-prone site located in the districts of Amstetten and Waidhofen an der Ybbs, eastern Austria, where an earth-slide point inventory was available. The methodological approach comprised an artificial introduction of inventory-based positional errors into the present landslide data set and an in-depth evaluation of subsequent modelling results. Positional errors were introduced by artificially changing the original landslide position by a mean distance of 5, 10, 20, 50 and 120 m. The resulting differently precise response variables were separately used to train logistic regression models. Odds ratios of predictor variables provided insights into modelled relationships. Cross-validation and spatial cross-validation enabled an assessment of predictive performances and permutation-based variable importance. All analyses were additionally carried out with synthetically generated data sets to further verify the findings under rather controlled conditions. The results revealed that an increasing positional inventory-based error was generally related to increasing distortions of modelling and validation results. However, the findings also highlighted that interdependencies between inventory-based spatial inaccuracies and statistical landslide susceptibility models are complex. The systematic comparisons of 12 models provided valuable evidence that the respective error-propagation was not only determined by the degree of positional inaccuracy inherent in the landslide data, but also by the spatial representation of landslides and the environment, landslide magnitude, the characteristics of the study area, the selected classification method and an interplay of predictors within multiple variable models. Based on the results, we deduced that a direct propagation of minor to moderate inventory-based positional errors into modelling results can be partly counteracted by adapting the modelling design (e.g. generalization of input data, opting for strongly generalizing classifiers). Since positional errors within landslide inventories are common and subsequent modelling and validation results are likely to be distorted, the potential existence of inventory-based positional inaccuracies should always be considered when assessing landslide susceptibility by means of empirical models.

  3. Initial Systematic Investigations of the Weakly Coupled Free Fermionic Heterotic String Landscape Statistics

    NASA Astrophysics Data System (ADS)

    Renner, Timothy

    2011-12-01

    A C++ framework was constructed with the explicit purpose of systematically generating string models using the Weakly Coupled Free Fermionic Heterotic String (WCFFHS) method. The software, optimized for speed, generality, and ease of use, has been used to conduct preliminary systematic investigations of WCFFHS vacua. Documentation for this framework is provided in the Appendix. After an introduction to theoretical and computational aspects of WCFFHS model building, a study of ten-dimensional WCFFHS models is presented. Degeneracies among equivalent expressions of each of the known models are investigated and classified. A study of more phenomenologically realistic four-dimensional models based on the well known "NAHE" set is then presented, with statistics being reported on gauge content, matter representations, and space-time supersymmetries. The final study is a parallel to the NAHE study in which a variation of the NAHE set is systematically extended and examined statistically. Special attention is paid to models with "mirroring"---identical observable and hidden sector gauge groups and matter representations.

  4. Generalized theory of semiflexible polymers.

    PubMed

    Wiggins, Paul A; Nelson, Philip C

    2006-03-01

    DNA bending on length scales shorter than a persistence length plays an integral role in the translation of genetic information from DNA to cellular function. Quantitative experimental studies of these biological systems have led to a renewed interest in the polymer mechanics relevant for describing the conformational free energy of DNA bending induced by protein-DNA complexes. Recent experimental results from DNA cyclization studies have cast doubt on the applicability of the canonical semiflexible polymer theory, the wormlike chain (WLC) model, to DNA bending on biologically relevant length scales. This paper develops a theory of the chain statistics of a class of generalized semiflexible polymer models. Our focus is on the theoretical development of these models and the calculation of experimental observables. To illustrate our methods, we focus on a specific, illustrative model of DNA bending. We show that the WLC model generically describes the long-length-scale chain statistics of semiflexible polymers, as predicted by renormalization group arguments. In particular, we show that either the WLC or our present model adequately describes force-extension, solution scattering, and long-contour-length cyclization experiments, regardless of the details of DNA bend elasticity. In contrast, experiments sensitive to short-length-scale chain behavior can in principle reveal dramatic departures from the linear elastic behavior assumed in the WLC model. We demonstrate this explicitly by showing that our toy model can reproduce the anomalously large short-contour-length cyclization factors recently measured by Cloutier and Widom. Finally, we discuss the applicability of these models to DNA chain statistics in the context of future experiments.

  5. Perturbation Selection and Local Influence Analysis for Nonlinear Structural Equation Model

    ERIC Educational Resources Information Center

    Chen, Fei; Zhu, Hong-Tu; Lee, Sik-Yum

    2009-01-01

    Local influence analysis is an important statistical method for studying the sensitivity of a proposed model to model inputs. One of its important issues is related to the appropriate choice of a perturbation vector. In this paper, we develop a general method to select an appropriate perturbation vector and a second-order local influence measure…

  6. A Multidimensional Partial Credit Model with Associated Item and Test Statistics: An Application to Mixed-Format Tests

    ERIC Educational Resources Information Center

    Yao, Lihua; Schwarz, Richard D.

    2006-01-01

    Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…

  7. Designing a Qualitative Data Collection Strategy (QDCS) for Africa - Phase 1: A Gap Analysis of Existing Models, Simulations, and Tools Relating to Africa

    DTIC Science & Technology

    2012-06-01

    generalized behavioral model characterized after the fictional Seldon equations (the ones elaborated upon by Isaac Asimov in the 1951 novel Foundation). Asimov described the Seldon equations as essentially statistical models with historical data of a sufficient size and variability that they

  8. Quantifying Local, Response Dependence between Two Polytomous Items Using the Rasch Model

    ERIC Educational Resources Information Center

    Andrich, David; Humphry, Stephen M.; Marais, Ida

    2012-01-01

    Models of modern test theory imply statistical independence among responses, generally referred to as "local independence." One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation as a process in the dichotomous Rasch model,…

  9. Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory

    ERIC Educational Resources Information Center

    Wells, Craig S.; Bolt, Daniel M.

    2008-01-01

    Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…

  10. EVALUATION OF THE REAL-TIME AIR-QUALITY MODEL USING THE RAPS (REGIONAL AIR POLLUTION STUDY) DATA BASE. VOLUME 3. PROGRAM USER'S GUIDE

    EPA Science Inventory

    The theory and programming of statistical tests for evaluating the Real-Time Air-Quality Model (RAM) using the Regional Air Pollution Study (RAPS) data base are fully documented in four volumes. Moreover, the tests are generally applicable to other model evaluation problems. Volu...

  11. EVALUATION OF THE REAL-TIME AIR-QUALITY MODEL USING THE RAPS (REGIONAL AIR POLLUTION STUDY) DATA BASE. VOLUME 4. EVALUATION GUIDE

    EPA Science Inventory

    The theory and programming of statistical tests for evaluating the Real-Time Air-Quality Model (RAM) using the Regional Air Pollution Study (RAPS) data base are fully documented in four volumes. Moreover, the tests are generally applicable to other model evaluation problems. Volu...

  12. Modeling Longitudinal Data with Generalized Additive Models: Applications to Single-Case Designs

    ERIC Educational Resources Information Center

    Sullivan, Kristynn J.; Shadish, William R.

    2013-01-01

    Single case designs (SCDs) are short time series that assess intervention effects by measuring units repeatedly over time both in the presence and absence of treatment. For a variety of reasons, interest in the statistical analysis and meta-analysis of these designs has been growing in recent years. This paper proposes modeling SCD data with…

  13. Theory-based Bayesian models of inductive learning and reasoning.

    PubMed

    Tenenbaum, Joshua B; Griffiths, Thomas L; Kemp, Charles

    2006-07-01

    Inductive inference allows humans to make powerful generalizations from sparse data when learning about word meanings, unobserved properties, causal relationships, and many other aspects of the world. Traditional accounts of induction emphasize either the power of statistical learning, or the importance of strong constraints from structured domain knowledge, intuitive theories or schemas. We argue that both components are necessary to explain the nature, use and acquisition of human knowledge, and we introduce a theory-based Bayesian framework for modeling inductive learning and reasoning as statistical inferences over structured knowledge representations.

  14. Reciprocity in directed networks

    NASA Astrophysics Data System (ADS)

    Yin, Mei; Zhu, Lingjiong

    2016-04-01

    Reciprocity is an important characteristic of directed networks and has been widely used in the modeling of World Wide Web, email, social, and other complex networks. In this paper, we take a statistical physics point of view and study the limiting entropy and free energy densities from the microcanonical ensemble, the canonical ensemble, and the grand canonical ensemble whose sufficient statistics are given by edge and reciprocal densities. The sparse case is also studied for the grand canonical ensemble. Extensions to more general reciprocal models including reciprocal triangle and star densities will likewise be discussed.

  15. Analysis models for the estimation of oceanic fields

    NASA Technical Reports Server (NTRS)

    Carter, E. F.; Robinson, A. R.

    1987-01-01

    A general model for statistically optimal estimates is presented for dealing with scalar, vector and multivariate datasets. The method deals with anisotropic fields and treats space and time dependence equivalently. Problems addressed include analysis, i.e. the production of synoptic time series of regularly gridded fields from irregular and gappy datasets, and the estimation of fields by compositing observations from several different instruments and sampling schemes. Technical issues are discussed, including the convergence of statistical estimates, the choice of representation of the correlations, the influential domain of an observation, and the efficiency of numerical computations.
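
    The statistically optimal (Gauss-Markov) estimate underlying such analysis schemes can be illustrated in one dimension: given scattered, noisy observations and an assumed covariance function, the gridded estimate weights the observations by cross-covariances and the inverse observation covariance. The covariance model, noise level, and field below are invented for illustration.

        # Toy objective-analysis (optimal interpolation) sketch in one dimension.
        import numpy as np

        rng = np.random.default_rng(8)
        truth = lambda x: np.sin(2 * np.pi * x)                  # underlying field
        obs_x = rng.uniform(0, 1, size=15)                       # irregular observation sites
        obs = truth(obs_x) + rng.normal(scale=0.1, size=obs_x.size)

        grid = np.linspace(0, 1, 101)
        L, noise_var = 0.15, 0.1**2                              # correlation length, obs error

        cov = lambda a, b: np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * L**2))
        C_oo = cov(obs_x, obs_x) + noise_var * np.eye(obs_x.size)
        C_go = cov(grid, obs_x)

        estimate = C_go @ np.linalg.solve(C_oo, obs)             # optimal linear estimate
        print(np.max(np.abs(estimate - truth(grid))))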

  16. On a Quantum Model of Brain Activities

    NASA Astrophysics Data System (ADS)

    Fichtner, K.-H.; Fichtner, L.; Freudenberg, W.; Ohya, M.

    2010-01-01

    One of the main activities of the brain is the recognition of signals. A first attempt to explain the process of recognition in terms of quantum statistics was given in [6]. Subsequently, details of the mathematical model were presented in a (still incomplete) series of papers (cf. [7, 2, 5, 10]). In the present note we want to give a general view of the principal ideas of this approach. We will introduce the basic spaces and justify the choice of spaces and operations. Further, we bring the model face to face with basic postulates any statistical model of the recognition process should fulfill. These postulates are in accordance with the opinion widely accepted in psychology and neurology.

  17. Measuring fit of sequence data to phylogenetic model: gain of power using marginal tests.

    PubMed

    Waddell, Peter J; Ota, Rissa; Penny, David

    2009-10-01

    Testing fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Such analyses discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (Unended quest: an intellectual autobiography. Fontana, London, 1976) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (Nature 297:197-200, 1982) to the present. We compare the general log-likelihood ratio statistic (the G or G² statistic) between the evolutionary tree model and the multinomial model with that of marginalized tests applied to an alignment (using placental mammal coding sequence data). It is seen that the most general test does not reject the fit of data to model (P approximately 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (P < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (P < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa; not necessarily those expected from examining other regions of the genome. By marginalizing the 4^t patterns of the i.i.d. model to observed and expected parsimony counts, that is, from constant sites, to singletons, to parsimony informative characters of a minimum possible length, the likelihood ratio test regains power, and it too rejects the evolutionary model with P < 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published trees may really be far larger than the analytical methods (e.g., bootstrap) report.
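
    The log-likelihood ratio statistic referred to above has a simple generic form, G² = 2 Σ O ln(O/E), compared against a chi-squared reference distribution. The sketch below shows that computation on invented category counts; the counts and degrees of freedom are placeholders, not the mammalian sequence results.

        # Generic G-squared (log-likelihood ratio) goodness-of-fit computation.
        import numpy as np
        from scipy.stats import chi2

        observed = np.array([420.0, 310.0, 180.0, 90.0])    # e.g., marginalized pattern counts
        expected = np.array([400.0, 330.0, 170.0, 100.0])   # model-predicted counts

        mask = observed > 0                                  # 0*ln(0) terms contribute nothing
        g2 = 2 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask]))
        df = len(observed) - 1                               # minus any extra fitted parameters
        print(f"G^2 = {g2:.2f}, p = {chi2.sf(g2, df):.3f}")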

  18. Assessing the sensitivity and robustness of prediction models for apple firmness using spectral scattering technique

    USDA-ARS?s Scientific Manuscript database

    Spectral scattering is useful for nondestructive sensing of fruit firmness. Prediction models, however, are typically built using multivariate statistical methods such as partial least squares regression (PLSR), whose performance generally depends on the characteristics of the data. The aim of this ...

  19. Fractional models of seismoacoustic and electromagnetic activity

    NASA Astrophysics Data System (ADS)

    Shevtsov, Boris; Sheremetyeva, Olga

    2017-10-01

    Statistical models of the seismoacoustic and electromagnetic activity caused by deformation disturbances are considered on the basis of the compound Poisson process and its fractional generalizations. Wave representations of these processes are also used. Five regimes of deformation activity are discussed, along with their role in understanding the nature of earthquake precursors.
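
    A minimal sketch of the compound Poisson building block mentioned above: events arrive as a Poisson process and each event contributes a random amplitude, with the cumulative sum giving the process. The rate and amplitude distribution are illustrative assumptions, and the fractional (heavy-tailed waiting time) generalization is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(1)
rate = 5.0            # mean number of events per unit time (assumed)
t_max = 100.0

n_events = rng.poisson(rate * t_max)
event_times = np.sort(rng.uniform(0.0, t_max, n_events))
amplitudes = rng.exponential(scale=1.0, size=n_events)   # assumed event-size distribution

def compound_poisson(t):
    """Value of the compound Poisson process at time t."""
    return amplitudes[event_times <= t].sum()

print(compound_poisson(50.0), compound_poisson(100.0))
```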

  20. Why environmental scientists are becoming Bayesians

    Treesearch

    James S. Clark

    2005-01-01

    Advances in computational statistics provide a general framework for the high dimensional models typically needed for ecological inference and prediction. Hierarchical Bayes (HB) represents a modelling structure with capacity to exploit diverse sources of information, to accommodate influences that are unknown (or unknowable), and to draw inference on large numbers of...

  1. INTERANNUAL VARIATION IN METEOROLOGICALLY ADJUSTED OZONE LEVELS IN THE EASTERN UNITED STATES: A COMPARISON OF TWO APPROACHES

    EPA Science Inventory

    Assessing the influence of abatement efforts and other human activities on ozone levels is complicated by the atmosphere's changeable nature. Two statistical methods, the dynamic linear model(DLM) and the generalized additive model (GAM), are used to estimate ozone trends in the...

  2. Generalizing Experimental Findings

    DTIC Science & Technology

    2015-06-01

    [Fragmented excerpt.] In graphical terms, these assumptions may require several d-separation tests on several sub-graphs. Figure 1 of the source depicts transportability models over variables such as Education, Skill, Salary, and Test, including (a) a transportability model in which a post-treatment variable Z is S-admissible. The excerpt also cites work on using observational studies to estimate population treatment effects (Journal of the Royal Statistical Society: Series A (Statistics in Society), forthcoming).

  3. An Item Fit Statistic Based on Pseudocounts from the Generalized Graded Unfolding Model: A Preliminary Report.

    ERIC Educational Resources Information Center

    Roberts, James S.

    Stone and colleagues (C. Stone, R. Ankenman, S. Lane, and M. Liu, 1993; C. Stone, R. Mislevy and J. Mazzeo, 1994; C. Stone, 2000) have proposed a fit index that explicitly accounts for the measurement error inherent in an estimated theta value, here denoted χ²ᵢ*. The elements of this statistic are natural…

  4. Ecological statistics of Gestalt laws for the perceptual organization of contours.

    PubMed

    Elder, James H; Goldberg, Richard M

    2002-01-01

    Although numerous studies have measured the strength of visual grouping cues for controlled psychophysical stimuli, little is known about the statistical utility of these various cues for natural images. In this study, we conducted experiments in which human participants trace perceived contours in natural images. These contours are automatically mapped to sequences of discrete tangent elements detected in the image. By examining relational properties between pairs of successive tangents on these traced curves, and between randomly selected pairs of tangents, we are able to estimate the likelihood distributions required to construct an optimal Bayesian model for contour grouping. We employed this novel methodology to investigate the inferential power of three classical Gestalt cues for contour grouping: proximity, good continuation, and luminance similarity. The study yielded a number of important results: (1) these cues, when appropriately defined, are approximately uncorrelated, suggesting a simple factorial model for statistical inference; (2) moderate image-to-image variation of the statistics indicates the utility of general probabilistic models for perceptual organization; (3) these cues differ greatly in their inferential power, proximity being by far the most powerful; and (4) statistical modeling of the proximity cue indicates a scale-invariant power law in close agreement with prior psychophysics.
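
    The finding that the grouping cues are approximately uncorrelated suggests a factorial (naive-Bayes) likelihood model, in which per-cue likelihood ratios multiply. The sketch below combines three such ratios into posterior odds for a grouping decision; the densities, cue values, and prior odds are hypothetical placeholders, not the empirical distributions estimated in the study.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Per-cue likelihoods under "same contour" vs "different contour" (assumed values).
cue_models = {
    "proximity":    {"same": (2.0, 1.0), "diff": (15.0, 8.0)},   # gap in pixels
    "continuation": {"same": (0.0, 0.2), "diff": (0.0, 1.0)},    # angle in radians
    "luminance":    {"same": (0.0, 5.0), "diff": (0.0, 20.0)},   # brightness difference
}
observed = {"proximity": 3.0, "continuation": 0.1, "luminance": 4.0}

# Because the cues are treated as independent, log-likelihood ratios simply add.
log_lr = sum(
    np.log(gaussian_pdf(observed[c], *m["same"]) / gaussian_pdf(observed[c], *m["diff"]))
    for c, m in cue_models.items()
)
prior_odds = 1.0 / 50.0          # assumed prior odds that two tangents are neighbours
posterior_odds = prior_odds * np.exp(log_lr)
print(f"log likelihood ratio = {log_lr:.2f}, posterior odds = {posterior_odds:.3f}")
```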

  5. TRACX2: a connectionist autoencoder using graded chunks to model infant visual statistical learning.

    PubMed

    Mareschal, Denis; French, Robert M

    2017-01-05

    Even newborn infants are able to extract structure from a stream of sensory inputs; yet how this is achieved remains largely a mystery. We present a connectionist autoencoder model, TRACX2, that learns to extract sequence structure by gradually constructing chunks, storing these chunks in a distributed manner across its synaptic weights and recognizing these chunks when they re-occur in the input stream. Chunks are graded rather than all-or-nothing in nature. As chunks are learnt their component parts become more and more tightly bound together. TRACX2 successfully models the data from five experiments from the infant visual statistical learning literature, including tasks involving forward and backward transitional probabilities, low-salience embedded chunk items, part-sequences and illusory items. The model also captures performance differences across ages through the tuning of a single learning-rate parameter. These results suggest that infant statistical learning is underpinned by the same domain-general learning mechanism that operates in auditory statistical learning and, potentially, in adult artificial grammar learning. This article is part of the themed issue 'New frontiers for statistical learning in the cognitive sciences'. © 2016 The Author(s).

  6. TRACX2: a connectionist autoencoder using graded chunks to model infant visual statistical learning

    PubMed Central

    French, Robert M.

    2017-01-01

    Even newborn infants are able to extract structure from a stream of sensory inputs; yet how this is achieved remains largely a mystery. We present a connectionist autoencoder model, TRACX2, that learns to extract sequence structure by gradually constructing chunks, storing these chunks in a distributed manner across its synaptic weights and recognizing these chunks when they re-occur in the input stream. Chunks are graded rather than all-or-nothing in nature. As chunks are learnt their component parts become more and more tightly bound together. TRACX2 successfully models the data from five experiments from the infant visual statistical learning literature, including tasks involving forward and backward transitional probabilities, low-salience embedded chunk items, part-sequences and illusory items. The model also captures performance differences across ages through the tuning of a single learning-rate parameter. These results suggest that infant statistical learning is underpinned by the same domain-general learning mechanism that operates in auditory statistical learning and, potentially, in adult artificial grammar learning. This article is part of the themed issue ‘New frontiers for statistical learning in the cognitive sciences'. PMID:27872375

  7. Statistical parameters of random heterogeneity estimated by analysing coda waves based on finite difference method

    NASA Astrophysics Data System (ADS)

    Emoto, K.; Saito, T.; Shiomi, K.

    2017-12-01

    Short-period (<1 s) seismograms are strongly affected by small-scale (<10 km) heterogeneities in the lithosphere. In general, short-period seismograms are analysed based on the statistical method by considering the interaction between seismic waves and randomly distributed small-scale heterogeneities. Statistical properties of the random heterogeneities have been estimated by analysing short-period seismograms. However, generally, the small-scale random heterogeneity is not taken into account for the modelling of long-period (>2 s) seismograms. We found that the energy of the coda of long-period seismograms shows a spatially flat distribution. This phenomenon is well known in short-period seismograms and results from the scattering by small-scale heterogeneities. We estimate the statistical parameters that characterize the small-scale random heterogeneity by modelling the spatiotemporal energy distribution of long-period seismograms. We analyse three moderate-size earthquakes that occurred in southwest Japan. We calculate the spatial distribution of the energy density recorded by a dense seismograph network in Japan at the period bands of 8-16 s, 4-8 s and 2-4 s and model them by using 3-D finite difference (FD) simulations. Compared to conventional methods based on statistical theories, we can calculate more realistic synthetics by using the FD simulation. It is not necessary to assume a uniform background velocity, body or surface waves and scattering properties considered in general scattering theories. By taking the ratio of the energy of the coda area to that of the entire area, we can separately estimate the scattering and the intrinsic absorption effects. Our result reveals the spectrum of the random inhomogeneity in a wide wavenumber range including the intensity around the corner wavenumber as P(m) = 8πε²a³/(1 + a²m²)², where ε = 0.05 and a = 3.1 km, even though past studies analysing higher-frequency records could not detect the corner. Finally, we estimate the intrinsic attenuation by modelling the decay rate of the energy. The method proposed in this study is suitable for quantifying the statistical properties of long-wavelength subsurface random inhomogeneity, which leads the way to characterizing a wider wavenumber range of spectra, including the corner wavenumber.

  8. Quantum theory of multiscale coarse-graining.

    PubMed

    Han, Yining; Jin, Jaehyeok; Wagner, Jacob W; Voth, Gregory A

    2018-03-14

    Coarse-grained (CG) models serve as a powerful tool to simulate molecular systems at much longer temporal and spatial scales. Previously, CG models and methods have been built upon classical statistical mechanics. The present paper develops a theory and numerical methodology for coarse-graining in quantum statistical mechanics, by generalizing the multiscale coarse-graining (MS-CG) method to quantum Boltzmann statistics. A rigorous derivation of the sufficient thermodynamic consistency condition is first presented via imaginary time Feynman path integrals. It identifies the optimal choice of CG action functional and effective quantum CG (qCG) force field to generate a quantum MS-CG (qMS-CG) description of the equilibrium system that is consistent with the quantum fine-grained model projected onto the CG variables. A variational principle then provides a class of algorithms for optimally approximating the qMS-CG force fields. Specifically, a variational method based on force matching, which was also adopted in the classical MS-CG theory, is generalized to quantum Boltzmann statistics. The qMS-CG numerical algorithms and practical issues in implementing this variational minimization procedure are also discussed. Then, two numerical examples are presented to demonstrate the method. Finally, as an alternative strategy, a quasi-classical approximation for the thermal density matrix expressed in the CG variables is derived. This approach provides an interesting physical picture for coarse-graining in quantum Boltzmann statistical mechanics in which the consistency with the quantum particle delocalization is obviously manifest, and it opens up an avenue for using path integral centroid-based effective classical force fields in a coarse-graining methodology.
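
    A hedged sketch of the variational force-matching idea underlying MS-CG: express the coarse-grained force as a linear combination of basis functions of the CG coordinates and fit the coefficients by least squares to reference forces mapped from the fine-grained model. The basis choice and synthetic reference data below are illustrative assumptions, and the quantum (path-integral) extension developed in the paper is not implemented.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_basis = 500, 4

r = rng.uniform(0.8, 2.5, n_samples)                               # pair distances from "configurations"
basis = np.vstack([r ** (-k) for k in range(1, n_basis + 1)]).T    # assumed 1/r^k basis functions
true_coeff = np.array([0.5, -1.2, 2.0, -0.7])
f_reference = basis @ true_coeff + 0.05 * rng.standard_normal(n_samples)  # mapped fine-grained forces

# Least-squares minimisation of the force-matching residual ||basis @ c - f_reference||^2.
coeff, *_ = np.linalg.lstsq(basis, f_reference, rcond=None)
print("fitted CG force-field coefficients:", np.round(coeff, 3))
```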

  9. Directional Statistics for Polarization Observations of Individual Pulses from Radio Pulsars

    NASA Astrophysics Data System (ADS)

    McKinnon, M. M.

    2010-10-01

    Radio polarimetry is a three-dimensional statistical problem. The three-dimensional aspect of the problem arises from the Stokes parameters Q, U, and V, which completely describe the polarization of electromagnetic radiation and conceptually define the orientation of a polarization vector in the Poincaré sphere. The statistical aspect of the problem arises from the random fluctuations in the source-intrinsic polarization and the instrumental noise. A simple model for the polarization of pulsar radio emission has been used to derive the three-dimensional statistics of radio polarimetry. The model is based upon the proposition that the observed polarization is due to the incoherent superposition of two, highly polarized, orthogonal modes. The directional statistics derived from the model follow the Bingham-Mardia and Fisher family of distributions. The model assumptions are supported by the qualitative agreement between the statistics derived from it and those measured with polarization observations of the individual pulses from pulsars. The orthogonal modes are thought to be the natural modes of radio wave propagation in the pulsar magnetosphere. The intensities of the modes become statistically independent when generalized Faraday rotation (GFR) in the magnetosphere causes the difference in their phases to be large. A stochastic version of GFR occurs when fluctuations in the phase difference are also large, and may be responsible for the more complicated polarization patterns observed in pulsar radio emission.

  10. Statistics for X-chromosome associations.

    PubMed

    Özbek, Umut; Lin, Hui-Min; Lin, Yan; Weeks, Daniel E; Chen, Wei; Shaffer, John R; Purcell, Shaun M; Feingold, Eleanor

    2018-06-13

    In a genome-wide association study (GWAS), association between genotype and phenotype at autosomal loci is generally tested by regression models. However, X-chromosome data are often excluded from published analyses of autosomes because of the difference between males and females in number of X chromosomes. Failure to analyze X-chromosome data at all is obviously less than ideal, and can lead to missed discoveries. Even when X-chromosome data are included, they are often analyzed with suboptimal statistics. Several mathematically sensible statistics for X-chromosome association have been proposed. The optimality of these statistics, however, is based on very specific simple genetic models. In addition, while previous simulation studies of these statistics have been informative, they have focused on single-marker tests and have not considered the types of error that occur even under the null hypothesis when the entire X chromosome is scanned. In this study, we comprehensively tested several X-chromosome association statistics using simulation studies that include the entire chromosome. We also considered a wide range of trait models for sex differences and phenotypic effects of X inactivation. We found that models that do not incorporate a sex effect can have large type I error in some cases. We also found that many of the best statistics perform well even when there are modest deviations, such as trait variance differences between the sexes or small sex differences in allele frequencies, from assumptions. © 2018 WILEY PERIODICALS, INC.
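
    A hedged sketch of one commonly used single-marker X-chromosome coding: female genotypes counted 0/1/2 and male genotypes 0/2 (treating a hemizygous male like a homozygous female under full X inactivation), with sex included as a covariate in a logistic regression. This is only one of several codings discussed in the literature, not a reproduction of the paper's simulations; the simulated trait model is invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
sex = rng.integers(0, 2, n)                      # 0 = female, 1 = male
maf = 0.3
geno = np.where(sex == 0,
                rng.binomial(2, maf, n),         # females: 0/1/2 copies
                2 * rng.binomial(1, maf, n))     # males: 0/2 under full X inactivation

logit_p = -1.0 + 0.25 * geno + 0.3 * sex         # simulated trait model (assumed)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p)))

X = sm.add_constant(np.column_stack([geno, sex]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.summary2().tables[1])                  # the genotype row gives the association test
```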

  11. An operational GLS model for hydrologic regression

    USGS Publications Warehouse

    Tasker, Gary D.; Stedinger, J.R.

    1989-01-01

    Recent Monte Carlo studies have documented the value of generalized least squares (GLS) procedures to estimate empirical relationships between streamflow statistics and physiographic basin characteristics. This paper presents a number of extensions of the GLS method that deal with realities and complexities of regional hydrologic data sets that were not addressed in the simulation studies. These extensions include: (1) a more realistic model of the underlying model errors; (2) smoothed estimates of cross correlation of flows; (3) procedures for including historical flow data; (4) diagnostic statistics describing leverage and influence for GLS regression; and (5) the formulation of a mathematical program for evaluating future gaging activities. © 1989.
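
    A minimal GLS sketch showing how a known error covariance matrix weights a regional regression of a streamflow statistic on a basin characteristic, beta = (X'Ω⁻¹X)⁻¹X'Ω⁻¹y. The design matrix, covariance structure, and data are invented for illustration and do not reproduce the paper's operational extensions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_sites = 30
drainage_area = rng.lognormal(mean=5.0, sigma=1.0, size=n_sites)
X = np.column_stack([np.ones(n_sites), np.log(drainage_area)])

# Assumed error covariance: equal variances plus cross-correlation of 0.3 between sites.
Omega = 0.04 * (0.3 * np.ones((n_sites, n_sites)) + 0.7 * np.eye(n_sites))
true_beta = np.array([1.0, 0.8])
y = X @ true_beta + rng.multivariate_normal(np.zeros(n_sites), Omega)

Omega_inv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
cov_beta = np.linalg.inv(X.T @ Omega_inv @ X)      # covariance of the GLS estimates
print("GLS estimates:", np.round(beta_gls, 3))
```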

  12. Mathematic model analysis of Gaussian beam propagation through an arbitrary thickness random phase screen.

    PubMed

    Tian, Yuzhen; Guo, Jin; Wang, Rui; Wang, Tingfeng

    2011-09-12

    In order to investigate the statistical properties of Gaussian beam propagation through a random phase screen of arbitrary thickness for adaptive optics and laser communication applications in the laboratory, we establish mathematical models of the statistical quantities involved in the propagation process, based on the Rytov method and the thin phase screen model. Analytic results are then developed for a phase screen of arbitrary thickness based on the Kolmogorov power spectrum. The comparison between the arbitrary-thickness phase screen and the thin phase screen shows that our results are more suitable for describing the generalized case, especially the scintillation index.

  13. Transfer Entropy as a Log-Likelihood Ratio

    NASA Astrophysics Data System (ADS)

    Barnett, Lionel; Bossomaier, Terry

    2012-09-01

    Transfer entropy, an information-theoretic measure of time-directed information transfer between joint processes, has steadily gained popularity in the analysis of complex stochastic dynamics in diverse fields, including the neurosciences, ecology, climatology, and econometrics. We show that for a broad class of predictive models, the log-likelihood ratio test statistic for the null hypothesis of zero transfer entropy is a consistent estimator for the transfer entropy itself. For finite Markov chains, furthermore, no explicit model is required. In the general case, an asymptotic χ2 distribution is established for the transfer entropy estimator. The result generalizes the equivalence in the Gaussian case of transfer entropy and Granger causality, a statistical notion of causal influence based on prediction via vector autoregression, and establishes a fundamental connection between directed information transfer and causality in the Wiener-Granger sense.

  14. Transfer entropy as a log-likelihood ratio.

    PubMed

    Barnett, Lionel; Bossomaier, Terry

    2012-09-28

    Transfer entropy, an information-theoretic measure of time-directed information transfer between joint processes, has steadily gained popularity in the analysis of complex stochastic dynamics in diverse fields, including the neurosciences, ecology, climatology, and econometrics. We show that for a broad class of predictive models, the log-likelihood ratio test statistic for the null hypothesis of zero transfer entropy is a consistent estimator for the transfer entropy itself. For finite Markov chains, furthermore, no explicit model is required. In the general case, an asymptotic χ2 distribution is established for the transfer entropy estimator. The result generalizes the equivalence in the Gaussian case of transfer entropy and Granger causality, a statistical notion of causal influence based on prediction via vector autoregression, and establishes a fundamental connection between directed information transfer and causality in the Wiener-Granger sense.
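
    A hedged sketch of the transfer-entropy / log-likelihood-ratio connection in the linear-Gaussian (Granger) case: fit an autoregression of the target on its own past with and without the past of the source; the estimated transfer entropy is half the log ratio of residual variances, and twice the sample size times this estimate is the likelihood-ratio statistic with an asymptotic chi-squared distribution. The lag order of 1 and the simulated coupling are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
N = 5000
x = np.zeros(N); y = np.zeros(N)
for t in range(1, N):                      # Y drives X with coefficient 0.4
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()
    x[t] = 0.5 * x[t - 1] + 0.4 * y[t - 1] + rng.standard_normal()

target, own_past, src_past = x[1:], x[:-1], y[:-1]

def residual_var(y_dep, predictor_columns):
    X = np.column_stack([np.ones(len(y_dep))] + predictor_columns)
    beta, *_ = np.linalg.lstsq(X, y_dep, rcond=None)
    return np.var(y_dep - X @ beta)

var_restricted = residual_var(target, [own_past])            # X's past only
var_full = residual_var(target, [own_past, src_past])        # X's and Y's past

te_estimate = 0.5 * np.log(var_restricted / var_full)        # nats
lr_statistic = 2.0 * len(target) * te_estimate
p_value = chi2.sf(lr_statistic, df=1)
print(f"TE(Y->X) ~ {te_estimate:.4f} nats, LR = {lr_statistic:.1f}, P = {p_value:.2g}")
```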

  15. Quantification of model uncertainty in aerosol optical thickness retrieval from Ozone Monitoring Instrument (OMI) measurements

    NASA Astrophysics Data System (ADS)

    Määttä, A.; Laine, M.; Tamminen, J.; Veefkind, J. P.

    2013-09-01

    We study uncertainty quantification in remote sensing of aerosols in the atmosphere with top of the atmosphere reflectance measurements from the nadir-viewing Ozone Monitoring Instrument (OMI). Focus is on the uncertainty in aerosol model selection of pre-calculated aerosol models and on the statistical modelling of the model inadequacies. The aim is to apply statistical methodologies that improve the uncertainty estimates of the aerosol optical thickness (AOT) retrieval by propagating model selection and model error related uncertainties more realistically. We utilise Bayesian model selection and model averaging methods for the model selection problem and use Gaussian processes to model the smooth systematic discrepancies from the modelled to observed reflectance. The systematic model error is learned from an ensemble of operational retrievals. The operational OMI multi-wavelength aerosol retrieval algorithm OMAERO is used for cloud free, over land pixels of the OMI instrument with the additional Bayesian model selection and model discrepancy techniques. The method is demonstrated with four examples with different aerosol properties: weakly absorbing aerosols, forest fires over Greece and Russia, and Sahara desert dust. The presented statistical methodology is general; it is not restricted to this particular satellite retrieval application.
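
    A hedged sketch of the model-discrepancy idea: learn the smooth systematic difference between modelled and observed reflectance as a Gaussian process over wavelength, whose mean and uncertainty can then be propagated into the retrieval. The wavelength grid, residuals, and kernel settings are placeholders, not OMAERO quantities.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(6)
wavelengths = np.linspace(340.0, 500.0, 20)[:, None]          # nm (assumed grid)
residuals = 0.01 * np.sin(wavelengths[:, 0] / 40.0) + 0.002 * rng.standard_normal(20)

kernel = 1.0 * RBF(length_scale=50.0) + WhiteKernel(noise_level=1e-5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(wavelengths, residuals)

query = np.linspace(340.0, 500.0, 100)[:, None]
discrepancy_mean, discrepancy_std = gp.predict(query, return_std=True)
# discrepancy_mean can be added to the forward model, and discrepancy_std
# propagated into the AOT retrieval uncertainty.
```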

  16. Statistical dielectronic recombination rates for multielectron ions in plasma

    NASA Astrophysics Data System (ADS)

    Demura, A. V.; Leont'iev, D. S.; Lisitsa, V. S.; Shurygin, V. A.

    2017-10-01

    We describe the general analytic derivation of the dielectronic recombination (DR) rate coefficient for multielectron ions in a plasma based on the statistical theory of an atom in terms of the spatial distribution of the atomic electron density. The dielectronic recombination rates for complex multielectron tungsten ions are calculated numerically in a wide range of variation of the plasma temperature, which is important for modern nuclear fusion studies. The results of statistical theory are compared with the data obtained using level-by-level codes ADPAK, FAC, HULLAC, and experimental results. We consider different statistical DR models based on the Thomas-Fermi distribution, viz., integral and differential with respect to the orbital angular momenta of the ion core and the trapped electron, as well as the Rost model, which is an analog of the Frank-Condon model as applied to atomic structures. In view of its universality and relative simplicity, the statistical approach can be used for obtaining express estimates of the dielectronic recombination rate coefficients in complex calculations of the parameters of the thermonuclear plasmas. The application of statistical methods also provides information for the dielectronic recombination rates with much smaller computer time expenditures as compared to available level-by-level codes.

  17. Frequency-selective fading statistics of shallow-water acoustic communication channel with a few multipaths

    NASA Astrophysics Data System (ADS)

    Bae, Minja; Park, Jihyun; Kim, Jongju; Xue, Dandan; Park, Kyu-Chil; Yoon, Jong Rak

    2016-07-01

    The bit error rate of an underwater acoustic communication system is related to multipath fading statistics, which determine the signal-to-noise ratio. The amplitude and delay of each path depend on sea surface roughness, propagation medium properties, and source-to-receiver range as a function of frequency. Therefore, received signals will show frequency-dependent fading. A shallow-water acoustic communication channel generally shows a few strong multipaths that interfere with each other and the resulting interference affects the fading statistics model. In this study, frequency-selective fading statistics are modeled on the basis of the phasor representation of the complex path amplitude. The fading statistics distribution is parameterized by the frequency-dependent constructive or destructive interference of multipaths. At a 16 m depth with a muddy bottom, a wave height of 0.2 m, and source-to-receiver ranges of 100 and 400 m, fading statistics tend to show a Rayleigh distribution at a destructive interference frequency, but a Rice distribution at a constructive interference frequency. The theoretical fading statistics well matched the experimental ones.
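
    A hedged sketch of the fading mechanism described above: the received phasor is the sum of a few fixed multipaths plus diffuse scatter, and depending on whether the fixed paths add constructively or destructively at the carrier frequency, the envelope statistics look Rice- or Rayleigh-like. Path amplitudes, delays, the diffuse level, and the two test frequencies are invented, not the experimental channel parameters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
amps = np.array([1.0, 0.8])                 # two dominant paths (assumed)
delays = np.array([0.0, 1.0e-3])            # relative delays in seconds (assumed)
diffuse = 0.3                               # diffuse (random) component level (assumed)
n_trials = 20000

def envelope_samples(freq_hz):
    steady = np.sum(amps * np.exp(-2j * np.pi * freq_hz * delays))
    scatter = diffuse * (rng.standard_normal(n_trials)
                         + 1j * rng.standard_normal(n_trials)) / np.sqrt(2)
    return np.abs(steady + scatter)

# With a 1 ms delay difference, 1000 Hz gives in-phase (constructive) paths
# and 1500 Hz gives anti-phase (destructive) paths.
for label, f in [("constructive", 1000.0), ("destructive", 1500.0)]:
    env = envelope_samples(f)
    b, loc, scale = stats.rice.fit(env, floc=0)      # Rice shape parameter b
    print(f"{label} frequency: mean envelope {env.mean():.2f}, Rice b = {b:.2f}")
```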

  18. General purpose simulation system of the data management system for Space Shuttle mission 18

    NASA Technical Reports Server (NTRS)

    Bengtson, N. M.; Mellichamp, J. M.; Smith, O. C.

    1976-01-01

    A simulation program for the flow of data through the Data Management System of Spacelab and Space Shuttle was presented. The science, engineering, command and guidance, navigation and control data were included. The programming language used was General Purpose Simulation System V (OS). The science and engineering data flow was modeled from its origin at the experiments and subsystems to transmission from Space Shuttle. Command data flow was modeled from the point of reception onboard and from the CDMS Control Panel to the experiments and subsystems. The GN&C data flow model handled data between the General Purpose Computer and the experiments and subsystems. Mission 18 was the particular flight chosen for simulation. The general structure of the program is presented, followed by a user's manual. Input data required to make runs are discussed followed by identification of the output statistics. The appendices contain a detailed model configuration, program listing and results.

  19. An Assessment of Phylogenetic Tools for Analyzing the Interplay Between Interspecific Interactions and Phenotypic Evolution.

    PubMed

    Drury, J P; Grether, G F; Garland, T; Morlon, H

    2018-05-01

    Much ecological and evolutionary theory predicts that interspecific interactions often drive phenotypic diversification and that species phenotypes in turn influence species interactions. Several phylogenetic comparative methods have been developed to assess the importance of such processes in nature; however, the statistical properties of these methods have gone largely untested. Focusing mainly on scenarios of competition between closely-related species, we assess the performance of available comparative approaches for analyzing the interplay between interspecific interactions and species phenotypes. We find that many currently used statistical methods often fail to detect the impact of interspecific interactions on trait evolution, that sister-taxa analyses are particularly unreliable in general, and that recently developed process-based models have more satisfactory statistical properties. Methods for detecting predictors of species interactions are generally more reliable than methods for detecting character displacement. In weighing the strengths and weaknesses of different approaches, we hope to provide a clear guide for empiricists testing hypotheses about the reciprocal effect of interspecific interactions and species phenotypes and to inspire further development of process-based models.

  20. Quantum description of light propagation in generalized media

    NASA Astrophysics Data System (ADS)

    Häyrynen, Teppo; Oksanen, Jani

    2016-02-01

    Linear quantum input-output relation based models are widely applied to describe the light propagation in a lossy medium. The details of the interaction and the associated added noise depend on whether the device is configured to operate as an amplifier or an attenuator. Using the traveling wave (TW) approach, we generalize the linear material model to simultaneously account for both the emission and absorption processes and to have point-wise defined noise field statistics and intensity dependent interaction strengths. Thus, our approach describes the quantum input-output relations of linear media with net attenuation, amplification or transparency without pre-selection of the operation point. The TW approach is then applied to investigate materials at thermal equilibrium, inverted materials, the transparency limit where losses are compensated, and the saturating amplifiers. We also apply the approach to investigate media in nonuniform states which can be e.g. consequences of a temperature gradient over the medium or a position dependent inversion of the amplifier. Furthermore, by using the generalized model we investigate devices with intensity dependent interactions and show how an initial thermal field transforms to a field having coherent statistics due to gain saturation.

  1. A person based formula for allocating commissioning funds to general practices in England: development of a statistical model

    PubMed Central

    Smith, Peter; Gravelle, Hugh; Martin, Steve; Bardsley, Martin; Rice, Nigel; Georghiou, Theo; Dusheiko, Mark; Billings, John; Lorenzo, Michael De; Sanderson, Colin

    2011-01-01

    Objectives: To develop a formula for allocating resources for commissioning hospital care to all general practices in England based on the health needs of the people registered in each practice. Design: Multivariate prospective statistical models were developed in which routinely collected electronic information from 2005-6 and 2006-7 on individuals and the areas in which they lived was used to predict their costs of hospital care in the next year, 2007-8. Data on individuals included all diagnoses recorded at any inpatient admission. Models were developed on a random sample of 5 million people and validated on a second random sample of 5 million people and a third sample of 5 million people drawn from a random sample of practices. Setting: All general practices in England as of 1 April 2007. All NHS inpatient admissions and outpatient attendances for individuals registered with a general practice on that date. Subjects: All individuals registered with a general practice in England at 1 April 2007. Main outcome measures: Power of the statistical models to predict the costs of the individual patient or each practice’s registered population for 2007-8, tested with a range of metrics (R² reported here). Comparisons of predicted costs in 2007-8 with actual costs incurred in the same year were calculated by individual and by practice. Results: Models including person level information (age, sex, and ICD-10 diagnostic codes recorded) and a range of area level information (such as socioeconomic deprivation and supply of health facilities) were most predictive of costs. After accounting for person level variables, area level variables added little explanatory power. The best models for resource allocation could predict upwards of 77% of the variation in costs at practice level, and about 12% at the person level. With these models, the predicted costs of about a third of practices would exceed or undershoot the actual costs by 10% or more. Smaller practices were more likely to be in these groups. Conclusions: A model was developed that performed well by international standards, and could be used for allocations to practices for commissioning. The best formulas, however, could predict only about 12% of the variation in next year’s costs of most inpatient and outpatient NHS care for each individual. Person-based diagnostic data significantly added to the predictive power of the models. PMID:22110252

  2. Comparison of the predictive validity of diagnosis-based risk adjusters for clinical outcomes.

    PubMed

    Petersen, Laura A; Pietz, Kenneth; Woodard, LeChauncy D; Byrne, Margaret

    2005-01-01

    Many possible methods of risk adjustment exist, but there is a dearth of comparative data on their performance. We compared the predictive validity of 2 widely used methods (Diagnostic Cost Groups [DCGs] and Adjusted Clinical Groups [ACGs]) for 2 clinical outcomes using a large national sample of patients. We studied all patients who used Veterans Health Administration (VA) medical services in fiscal year (FY) 2001 (n = 3,069,168) and assigned both a DCG and an ACG to each. We used logistic regression analyses to compare predictive ability for death or long-term care (LTC) hospitalization for age/gender models, DCG models, and ACG models. We also assessed the effect of adding age to the DCG and ACG models. Patients in the highest DCG categories, indicating higher severity of illness, were more likely to die or to require LTC hospitalization. Surprisingly, the age/gender model predicted death slightly more accurately than the ACG model (c-statistic of 0.710 versus 0.700, respectively). The addition of age to the ACG model improved the c-statistic to 0.768. The highest c-statistic for prediction of death was obtained with a DCG/age model (0.830). The lowest c-statistics were obtained for age/gender models for LTC hospitalization (c-statistic 0.593). The c-statistic for use of ACGs to predict LTC hospitalization was 0.783, and improved to 0.792 with the addition of age. The c-statistics for use of DCGs and DCG/age to predict LTC hospitalization were 0.885 and 0.890, respectively, indicating the best prediction. We found that risk adjusters based upon diagnoses predicted an increased likelihood of death or LTC hospitalization, exhibiting good predictive validity. In this comparative analysis using VA data, DCG models were generally superior to ACG models in predicting clinical outcomes, although ACG model performance was enhanced by the addition of age.
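
    A hedged sketch of comparing the predictive validity of risk adjusters by c-statistic (ROC AUC): fit logistic regressions for a binary outcome with different predictor sets, with and without age, and compare discrimination. The simulated "risk scores" are stand-ins, not actual DCG or ACG assignments, and the outcome model is invented.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(8)
n = 20000
age = rng.uniform(30, 90, n)
score_a = rng.gamma(2.0, 1.0, n)                  # stand-in for one risk adjuster
score_b = 0.6 * score_a + rng.gamma(1.0, 1.0, n)  # stand-in for another
logit = -6.0 + 0.05 * age + 0.5 * score_a         # assumed outcome model
death = rng.binomial(1, 1 / (1 + np.exp(-logit)))

def c_statistic(predictor_columns):
    X = sm.add_constant(np.column_stack(predictor_columns))
    p = sm.Logit(death, X).fit(disp=0).predict(X)
    return roc_auc_score(death, p)

print("score A only  :", round(c_statistic([score_a]), 3))
print("score B only  :", round(c_statistic([score_b]), 3))
print("score A + age :", round(c_statistic([score_a, age]), 3))
print("score B + age :", round(c_statistic([score_b, age]), 3))
```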

  3. Evaluating statistical consistency in the ocean model component of the Community Earth System Model (pyCECT v2.0)

    NASA Astrophysics Data System (ADS)

    Baker, Allison H.; Hu, Yong; Hammerling, Dorit M.; Tseng, Yu-heng; Xu, Haiying; Huang, Xiaomeng; Bryan, Frank O.; Yang, Guangwen

    2016-07-01

    The Parallel Ocean Program (POP), the ocean model component of the Community Earth System Model (CESM), is widely used in climate research. Most current work in CESM-POP focuses on improving the model's efficiency or accuracy, such as improving numerical methods, advancing parameterization, porting to new architectures, or increasing parallelism. Since ocean dynamics are chaotic in nature, achieving bit-for-bit (BFB) identical results in ocean solutions cannot be guaranteed for even tiny code modifications, and determining whether modifications are admissible (i.e., statistically consistent with the original results) is non-trivial. In recent work, an ensemble-based statistical approach was shown to work well for software verification (i.e., quality assurance) on atmospheric model data. The general idea of the ensemble-based statistical consistency testing is to use a qualitative measurement of the variability of the ensemble of simulations as a metric with which to compare future simulations and make a determination of statistical distinguishability. The capability to determine consistency without BFB results boosts model confidence and provides the flexibility needed, for example, for more aggressive code optimizations and the use of heterogeneous execution environments. Since ocean and atmosphere models have differing characteristics in terms of dynamics, spatial variability, and timescales, we present a new statistical method to evaluate ocean model simulation data that requires the evaluation of ensemble means and deviations in a spatial manner. In particular, the statistical distribution from an ensemble of CESM-POP simulations is used to determine the standard score of any new model solution at each grid point. Then the percentage of points that have scores greater than a specified threshold indicates whether the new model simulation is statistically distinguishable from the ensemble simulations. Both ensemble size and composition are important. Our experiments indicate that the new POP ensemble consistency test (POP-ECT) tool is capable of distinguishing cases that should be statistically consistent with the ensemble and those that should not, as well as providing a simple, objective and systematic way to detect errors in CESM-POP due to the hardware or software stack, positively contributing to quality assurance for the CESM-POP code.
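
    A hedged sketch of the per-grid-point standard-score test described above: form an ensemble mean and standard deviation at each grid point, z-score a new run against them, and flag the run if too many points exceed a score threshold. The synthetic ensemble, field, and both thresholds are placeholders, not the POP-ECT implementation.

```python
import numpy as np

rng = np.random.default_rng(9)
n_members, ny, nx = 30, 40, 60
ensemble = rng.normal(15.0, 0.5, (n_members, ny, nx))        # e.g. an SST-like field

ens_mean = ensemble.mean(axis=0)
ens_std = ensemble.std(axis=0, ddof=1)

def fraction_flagged(new_run, score_threshold=3.0):
    z = np.abs(new_run - ens_mean) / ens_std                 # standard score per grid point
    return float((z > score_threshold).mean())

consistent_run = rng.normal(15.0, 0.5, (ny, nx))
perturbed_run = consistent_run + 0.4                         # systematic shift (e.g. a code bug)

area_threshold = 0.01      # assumed: fail if more than 1% of points exceed the score threshold
for name, run in [("consistent", consistent_run), ("perturbed", perturbed_run)]:
    frac = fraction_flagged(run)
    verdict = "pass" if frac <= area_threshold else "fail"
    print(f"{name}: {100 * frac:.2f}% of points flagged -> {verdict}")
```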

  4. Hybrid regulatory models: a statistically tractable approach to model regulatory network dynamics.

    PubMed

    Ocone, Andrea; Millar, Andrew J; Sanguinetti, Guido

    2013-04-01

    Computational modelling of the dynamics of gene regulatory networks is a central task of systems biology. For networks of small/medium scale, the dominant paradigm is represented by systems of coupled non-linear ordinary differential equations (ODEs). ODEs afford great mechanistic detail and flexibility, but calibrating these models to data is often an extremely difficult statistical problem. Here, we develop a general statistical inference framework for stochastic transcription-translation networks. We use a coarse-grained approach, which represents the system as a network of stochastic (binary) promoter and (continuous) protein variables. We derive an exact inference algorithm and an efficient variational approximation that allows scalable inference and learning of the model parameters. We demonstrate the power of the approach on two biological case studies, showing that the method allows a high degree of flexibility and is capable of testable novel biological predictions. http://homepages.inf.ed.ac.uk/gsanguin/software.html. Supplementary data are available at Bioinformatics online.

  5. A General Model for Testing Mediation and Moderation Effects

    PubMed Central

    MacKinnon, David P.

    2010-01-01

    This paper describes methods for testing mediation and moderation effects in a dataset, both together and separately. Investigations of this kind are especially valuable in prevention research to obtain information on the process by which a program achieves its effects and whether the program is effective for subgroups of individuals. A general model that simultaneously estimates mediation and moderation effects is presented, and the utility of combining the effects into a single model is described. Possible effects of interest in the model are explained, as are statistical methods to assess these effects. The methods are further illustrated in a hypothetical prevention program example. PMID:19003535
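
    A hedged sketch of a single-mediator model with a moderator, estimated with two regressions: the mediator on treatment gives path a; the outcome on treatment, mediator, moderator, and a treatment-by-moderator interaction gives path b, the direct effect, and the moderation effect, with the mediated effect taken as the product a*b. The simulated data are placeholders, not the hypothetical prevention-program example in the paper.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 1000
x = rng.binomial(1, 0.5, n).astype(float)        # program vs control
w = rng.standard_normal(n)                       # moderator (e.g. baseline risk)
m = 0.5 * x + rng.standard_normal(n)             # mediator
y = 0.4 * m + 0.2 * x + 0.3 * x * w + rng.standard_normal(n)

fit_a = sm.OLS(m, sm.add_constant(x)).fit()                       # path a: X -> M
design_y = sm.add_constant(np.column_stack([x, m, w, x * w]))
fit_b = sm.OLS(y, design_y).fit()                                 # paths b, direct, moderation

a = fit_a.params[1]
b = fit_b.params[2]
print("mediated effect a*b :", round(a * b, 3))
print("direct effect of X  :", round(fit_b.params[1], 3))
print("moderation (X x W)  :", round(fit_b.params[4], 3))
```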

  6. Analyzing Dyadic Sequence Data—Research Questions and Implied Statistical Models

    PubMed Central

    Fuchs, Peter; Nussbeck, Fridtjof W.; Meuwly, Nathalie; Bodenmann, Guy

    2017-01-01

    The analysis of observational data is often seen as a key approach to understanding dynamics in romantic relationships but also in dyadic systems in general. Statistical models for the analysis of dyadic observational data are not commonly known or applied. In this contribution, selected approaches to dyadic sequence data will be presented with a focus on models that can be applied when sample sizes are of medium size (N = 100 couples or less). Each of the statistical models is motivated by an underlying potential research question, the most important model results are presented and linked to the research question. The following research questions and models are compared with respect to their applicability using a hands on approach: (I) Is there an association between a particular behavior by one and the reaction by the other partner? (Pearson Correlation); (II) Does the behavior of one member trigger an immediate reaction by the other? (aggregated logit models; multi-level approach; basic Markov model); (III) Is there an underlying dyadic process, which might account for the observed behavior? (hidden Markov model); and (IV) Are there latent groups of dyads, which might account for observing different reaction patterns? (mixture Markov; optimal matching). Finally, recommendations for researchers to choose among the different models, issues of data handling, and advises to apply the statistical models in empirical research properly are given (e.g., in a new r-package “DySeq”). PMID:28443037

  7. Building out a Measurement Model to Incorporate Complexities of Testing in the Language Domain

    ERIC Educational Resources Information Center

    Wilson, Mark; Moore, Stephen

    2011-01-01

    This paper provides a summary of a novel and integrated way to think about the item response models (most often used in measurement applications in social science areas such as psychology, education, and especially testing of various kinds) from the viewpoint of the statistical theory of generalized linear and nonlinear mixed models. In addition,…

  8. Statistical distributions of extreme dry spell in Peninsular Malaysia

    NASA Astrophysics Data System (ADS)

    Zin, Wan Zawiah Wan; Jemain, Abdul Aziz

    2010-11-01

    Statistical distributions of annual extreme (AE) series and partial duration (PD) series for dry-spell event are analyzed for a database of daily rainfall records of 50 rain-gauge stations in Peninsular Malaysia, with recording period extending from 1975 to 2004. The three-parameter generalized extreme value (GEV) and generalized Pareto (GP) distributions are considered to model both series. In both cases, the parameters of these two distributions are fitted by means of the L-moments method, which provides a robust estimation of them. The goodness-of-fit (GOF) between empirical data and theoretical distributions are then evaluated by means of the L-moment ratio diagram and several goodness-of-fit tests for each of the 50 stations. It is found that for the majority of stations, the AE and PD series are well fitted by the GEV and GP models, respectively. Based on the models that have been identified, we can reasonably predict the risks associated with extreme dry spells for various return periods.
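
    A hedged sketch of fitting a GEV distribution to an annual-extreme series of dry-spell lengths and computing return levels. Note that the paper fits parameters by the L-moments method; scipy's maximum-likelihood fit is used here only as a readily available stand-in, and the data are invented.

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic 30-year annual-maximum dry-spell series (days), for illustration only.
annual_max_dry_spell = genextreme.rvs(c=-0.1, loc=20.0, scale=5.0, size=30, random_state=11)

shape, loc, scale = genextreme.fit(annual_max_dry_spell)   # MLE stand-in for L-moments

for T in (10, 50, 100):                                     # return periods in years
    level = genextreme.ppf(1.0 - 1.0 / T, shape, loc=loc, scale=scale)
    print(f"{T}-year return level: {level:.1f} days")
```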

  9. Data-driven non-Markovian closure models

    NASA Astrophysics Data System (ADS)

    Kondrashov, Dmitri; Chekroun, Mickaël D.; Ghil, Michael

    2015-03-01

    This paper has two interrelated foci: (i) obtaining stable and efficient data-driven closure models by using a multivariate time series of partial observations from a large-dimensional system; and (ii) comparing these closure models with the optimal closures predicted by the Mori-Zwanzig (MZ) formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a generalization and a time-continuous limit of existing multilevel, regression-based approaches to closure in a data-driven setting; these approaches include empirical model reduction (EMR), as well as more recent multi-layer modeling. It is shown that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the MZ formalism. A simple correlation-based stopping criterion for an EMR-MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are derived on the structure of the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a broad class of MSM applications, a class that includes non-polynomial predictors and nonlinearities that do not necessarily preserve quadratic energy invariants. The EMR-MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. It is shown that the resulting closure model with energy-conserving nonlinearities efficiently captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka-Volterra model of population dynamics in its chaotic regime. The challenges here include the rarity of strange attractors in the model's parameter space and the existence of multiple attractor basins with fractal boundaries. The positivity constraint on the solutions' components replaces here the quadratic-energy-preserving constraint of fluid-flow problems and it successfully prevents blow-up.

  10. Avalanches and generalized memory associativity in a network model for conscious and unconscious mental functioning

    NASA Astrophysics Data System (ADS)

    Siddiqui, Maheen; Wedemann, Roseli S.; Jensen, Henrik Jeldtoft

    2018-01-01

    We explore statistical characteristics of avalanches associated with the dynamics of a complex-network model, where two modules corresponding to sensorial and symbolic memories interact, representing unconscious and conscious mental processes. The model illustrates Freud's ideas regarding the neuroses and that consciousness is related with symbolic and linguistic memory activity in the brain. It incorporates the Stariolo-Tsallis generalization of the Boltzmann Machine in order to model memory retrieval and associativity. In the present work, we define and measure avalanche size distributions during memory retrieval, in order to gain insight regarding basic aspects of the functioning of these complex networks. The avalanche sizes defined for our model should be related to the time consumed and also to the size of the neuronal region which is activated, during memory retrieval. This allows the qualitative comparison of the behaviour of the distribution of cluster sizes, obtained during fMRI measurements of the propagation of signals in the brain, with the distribution of avalanche sizes obtained in our simulation experiments. This comparison corroborates the indication that the Nonextensive Statistical Mechanics formalism may indeed be more well suited to model the complex networks which constitute brain and mental structure.

  11. Drivers willingness to pay progressive rate for street parking.

    DOT National Transportation Integrated Search

    2015-01-01

    This study finds willingness to pay and price elasticity for on-street parking demand using stated : preference data obtained from 238 respondents. Descriptive, statistical and economic analyses including : regression, generalized linear model, and f...

  12. Assessment of Automated Measurement and Verification (M&V) Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Granderson, Jessica; Touzani, Samir; Custodio, Claudine

    This report documents the application of a general statistical methodology to assess the accuracy of baseline energy models, focusing on its application to Measurement and Verification (M&V) of whole-building energy savings.

  13. Atmospheric Tracer Inverse Modeling Using Markov Chain Monte Carlo (MCMC)

    NASA Astrophysics Data System (ADS)

    Kasibhatla, P.

    2004-12-01

    In recent years, there has been an increasing emphasis on the use of Bayesian statistical estimation techniques to characterize the temporal and spatial variability of atmospheric trace gas sources and sinks. The applications have been varied in terms of the particular species of interest, as well as in terms of the spatial and temporal resolution of the estimated fluxes. However, one common characteristic has been the use of relatively simple statistical models for describing the measurement and chemical transport model error statistics and prior source statistics. For example, multivariate normal probability distribution functions (pdfs) are commonly used to model these quantities and inverse source estimates are derived for fixed values of pdf parameters. While the advantage of this approach is that closed form analytical solutions for the a posteriori pdfs of interest are available, it is worth exploring Bayesian analysis approaches which allow for a more general treatment of error and prior source statistics. Here, we present an application of the Markov Chain Monte Carlo (MCMC) methodology to an atmospheric tracer inversion problem to demonstrate how more general statistical models for errors can be incorporated into the analysis in a relatively straightforward manner. The MCMC approach to Bayesian analysis, which has found wide application in a variety of fields, is a statistical simulation approach that involves computing moments of interest of the a posteriori pdf by efficiently sampling this pdf. The specific inverse problem that we focus on is the annual mean CO2 source/sink estimation problem considered by the TransCom3 project. TransCom3 was a collaborative effort involving various modeling groups and followed a common modeling and analysis protocol. As such, this problem provides a convenient case study to demonstrate the applicability of the MCMC methodology to atmospheric tracer source/sink estimation problems.
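
    A hedged sketch of a random-walk Metropolis (MCMC) sampler for a toy linear tracer-inversion problem: observations y = H s + noise, a Gaussian prior on the source vector s, and a posterior that is sampled rather than solved analytically, so non-Gaussian error models could be swapped into the log-posterior. All dimensions, matrices, and hyperparameters are illustrative, not TransCom3 quantities.

```python
import numpy as np

rng = np.random.default_rng(12)
n_obs, n_src = 40, 3
H = rng.uniform(0.0, 1.0, (n_obs, n_src))          # toy transport operator
s_true = np.array([2.0, -1.0, 0.5])
y = H @ s_true + 0.2 * rng.standard_normal(n_obs)

obs_var, prior_var = 0.2 ** 2, 4.0

def log_posterior(s):
    log_like = -0.5 * np.sum((y - H @ s) ** 2) / obs_var
    log_prior = -0.5 * np.sum(s ** 2) / prior_var
    return log_like + log_prior

n_iter, step = 20000, 0.05
samples = np.empty((n_iter, n_src))
current = np.zeros(n_src)
current_lp = log_posterior(current)
for i in range(n_iter):
    proposal = current + step * rng.standard_normal(n_src)
    proposal_lp = log_posterior(proposal)
    if np.log(rng.uniform()) < proposal_lp - current_lp:   # Metropolis acceptance rule
        current, current_lp = proposal, proposal_lp
    samples[i] = current

burned = samples[n_iter // 2:]                             # discard burn-in
print("posterior means:", np.round(burned.mean(axis=0), 2))
```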

  14. Scale Dependence of Statistics of Spatially Averaged Rain Rate Seen in TOGA COARE Comparison with Predictions from a Stochastic Model

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Bell, T. L.; Lau, William K. M. (Technical Monitor)

    2002-01-01

    A characteristic feature of rainfall statistics is that they in general depend on the space and time scales over which rain data are averaged. As a part of an earlier effort to determine the sampling error of satellite rain averages, a space-time model of rainfall statistics was developed to describe the statistics of gridded rain observed in GATE. The model allows one to compute the second moment statistics of space- and time-averaged rain rate which can be fitted to satellite or rain gauge data to determine the four model parameters appearing in the precipitation spectrum - an overall strength parameter, a characteristic length separating the long and short wavelength regimes and a characteristic relaxation time for decay of the autocorrelation of the instantaneous local rain rate and a certain 'fractal' power law exponent. For area-averaged instantaneous rain rate, this exponent governs the power law dependence of these statistics on the averaging length scale L predicted by the model in the limit of small L. In particular, the variance of rain rate averaged over an L × L area exhibits a power law singularity as L → 0. In the present work the model is used to investigate how the statistics of area-averaged rain rate over the tropical Western Pacific measured with ship borne radar during TOGA COARE (Tropical Ocean Global Atmosphere Coupled Ocean Atmospheric Response Experiment) and gridded on a 2 km grid depends on the size of the spatial averaging scale. Good agreement is found between the data and predictions from the model over a wide range of averaging length scales.

  15. General Aviation Avionics Statistics : 1975

    DOT National Transportation Integrated Search

    1978-06-01

    This report presents avionics statistics for the 1975 general aviation (GA) aircraft fleet and updates a previous publication, General Aviation Avionics Statistics: 1974. The statistics are presented in a capability group framework which enables one ...

  16. Economic Statistical Design of Integrated X-bar-S Control Chart with Preventive Maintenance and General Failure Distribution

    PubMed Central

    Caballero Morales, Santiago Omar

    2013-01-01

    Preventive Maintenance (PM) and Statistical Process Control (SPC) are important practices to achieve high product quality, a small frequency of failures, and cost reduction in a production process. However, there are some points that have not been explored in depth about their joint application. First, most SPC is performed with the X-bar control chart, which does not fully consider the variability of the production process. Second, many studies of design of control charts consider just the economic aspect while statistical restrictions must be considered to achieve charts with low probabilities of false detection of failures. Third, the effect of PM on processes with different failure probability distributions has not been studied. Hence, this paper covers these points, presenting the Economic Statistical Design (ESD) of joint X-bar-S control charts with a cost model that integrates PM with general failure distribution. Experiments showed statistically significant reductions in costs when PM is performed on processes with high failure rates and reductions in the sampling frequency of units for testing under SPC.
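
    A hedged sketch of computing joint X-bar and S control-chart limits from subgroup data using the standard c4 bias-correction constant. The subgroup size, the conventional 3-sigma limits, and the simulated measurements are generic textbook choices, not the economically and statistically optimised design derived in the paper.

```python
import numpy as np
from math import gamma, sqrt

rng = np.random.default_rng(13)
n = 5                                            # subgroup size (assumed)
subgroups = rng.normal(10.0, 0.2, (50, n))       # 50 subgroups of simulated measurements

xbar = subgroups.mean(axis=1)
s = subgroups.std(axis=1, ddof=1)
xbarbar, sbar = xbar.mean(), s.mean()

# c4 unbiasing constant for the sample standard deviation.
c4 = sqrt(2.0 / (n - 1)) * gamma(n / 2.0) / gamma((n - 1) / 2.0)

# Conventional 3-sigma limits for the X-bar chart and the S chart.
xbar_ucl = xbarbar + 3.0 * sbar / (c4 * sqrt(n))
xbar_lcl = xbarbar - 3.0 * sbar / (c4 * sqrt(n))
s_ucl = sbar + 3.0 * sbar * sqrt(1.0 - c4 ** 2) / c4
s_lcl = max(0.0, sbar - 3.0 * sbar * sqrt(1.0 - c4 ** 2) / c4)

print(f"X-bar chart: LCL={xbar_lcl:.3f}, CL={xbarbar:.3f}, UCL={xbar_ucl:.3f}")
print(f"S chart    : LCL={s_lcl:.3f}, CL={sbar:.3f}, UCL={s_ucl:.3f}")
```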

  17. Sex-specific developmental models for Creophilus maxillosus (L.) (Coleoptera: Staphylinidae): searching for larger accuracy of insect age estimates.

    PubMed

    Frątczak-Łagiewska, Katarzyna; Matuszewski, Szymon

    2018-05-01

    Differences in size between males and females, called the sexual size dimorphism, are common in insects. These differences may be followed by differences in the duration of development. Accordingly, it is believed that insect sex may be used to increase the accuracy of insect age estimates in forensic entomology. Here, the sex-specific differences in the development of Creophilus maxillosus were studied at seven constant temperatures. We have also created separate developmental models for males and females of C. maxillosus and tested them in a validation study to answer a question whether sex-specific developmental models improve the accuracy of insect age estimates. Results demonstrate that males of C. maxillosus developed significantly longer than females. The sex-specific and general models for the total immature development had the same optimal temperature range and similar developmental threshold but different thermal constant K, which was the largest in the case of the male-specific model and the smallest in the case of the female-specific model. Despite these differences, validation study revealed just minimal and statistically insignificant differences in the accuracy of age estimates using sex-specific and general thermal summation models. This finding indicates that in spite of statistically significant differences in the duration of immature development between females and males of C. maxillosus, there is no increase in the accuracy of insect age estimates while using the sex-specific thermal summation models compared to the general model. Accordingly, this study does not support the use of sex-specific developmental data for the estimation of insect age in forensic entomology.

  18. In defence of model-based inference in phylogeography

    PubMed Central

    Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent

    2017-01-01

    Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, whether used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924
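
    A hedged sketch of rejection-based Approximate Bayesian Computation (ABC): draw parameters from the prior, simulate data, and keep the draws whose summary statistic lands close to the observed one. The toy model (Poisson counts with an exponential prior on the rate) and the tolerance are illustrative only; real phylogeographic applications simulate under coalescent models instead.

```python
import numpy as np

rng = np.random.default_rng(14)
observed = rng.poisson(4.0, size=50)             # pretend field data
obs_summary = observed.mean()

n_draws, tolerance = 100000, 0.1
theta = rng.exponential(scale=5.0, size=n_draws)                  # prior draws of the rate
sim_summary = rng.poisson(theta[:, None], size=(n_draws, 50)).mean(axis=1)

# Keep draws whose simulated summary is within the tolerance of the observed one.
accepted = theta[np.abs(sim_summary - obs_summary) < tolerance]
print(f"accepted {accepted.size} draws; "
      f"posterior mean ~ {accepted.mean():.2f}, 95% interval ~ "
      f"({np.quantile(accepted, 0.025):.2f}, {np.quantile(accepted, 0.975):.2f})")
```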

  19. The Role of Simulation Approaches in Statistics

    ERIC Educational Resources Information Center

    Wood, Michael

    2005-01-01

    This article explores the uses of a simulation model (the two bucket story)--implemented by a stand-alone computer program, or an Excel workbook (both on the web)--that can be used for deriving bootstrap confidence intervals, and simulating various probability distributions. The strengths of the model are its generality, the fact that it provides…
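
    A hedged sketch of the percentile bootstrap confidence interval, the kind of resampling-based interval the simulation model described above is used to derive. The sample data are invented, and 2000 resamples with a 95% level are conventional choices rather than anything prescribed by the article.

```python
import numpy as np

rng = np.random.default_rng(15)
sample = np.array([12.1, 9.8, 14.3, 11.0, 10.5, 13.7, 9.2, 12.9, 11.8, 10.1])

n_resamples = 2000
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()   # resample with replacement
    for _ in range(n_resamples)
])

lower, upper = np.quantile(boot_means, [0.025, 0.975])          # percentile interval
print(f"sample mean = {sample.mean():.2f}, 95% bootstrap CI = ({lower:.2f}, {upper:.2f})")
```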

  20. Conceptualizations of Personality Disorders with the Five Factor Model-Count and Empathy Traits

    ERIC Educational Resources Information Center

    Kajonius, Petri J.; Dåderman, Anna M.

    2017-01-01

    Previous research has long advocated that emotional and behavioral disorders are related to general personality traits, such as the Five Factor Model (FFM). The addition of section III in the latest "Diagnostic and Statistical Manual of Mental Disorders" (DSM) recommends that extremity in personality traits together with maladaptive…

  1. The Covering Law Model in Communication Inquiry.

    ERIC Educational Resources Information Center

    Berger, Charles R.

    The first section of this paper defines covering law explanation as a theory which maintains that explanation may be achieved, and may be achieved, by subsuming what is to be explained under a general law. The model is examined in light of the deductive-nomological explanation, the deductive-statistical explanation, and the inductive-statistical…

  2. Statistical shear lag model - unraveling the size effect in hierarchical composites.

    PubMed

    Wei, Xiaoding; Filleter, Tobin; Espinosa, Horacio D

    2015-05-01

    Numerous experimental and computational studies have established that the hierarchical structures encountered in natural materials, such as the brick-and-mortar structure observed in sea shells, are essential for achieving defect tolerance. Due to this hierarchy, the mechanical properties of natural materials have a different size dependence compared to that of typical engineered materials. This study aimed to explore size effects on the strength of bio-inspired staggered hierarchical composites and to define the influence of the geometry of constituents in their outstanding defect tolerance capability. A statistical shear lag model is derived by extending the classical shear lag model to account for the statistics of the constituents' strength. A general solution emerges from rigorous mathematical derivations, unifying the various empirical formulations for the fundamental link length used in previous statistical models. The model shows that the staggered arrangement of constituents grants composites a unique size effect on mechanical strength in contrast to homogenous continuous materials. The model is applied to hierarchical yarns consisting of double-walled carbon nanotube bundles to assess its predictive capabilities for novel synthetic materials. Interestingly, the model predicts that yarn gauge length does not significantly influence the yarn strength, in close agreement with experimental observations. Copyright © 2015 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.

  3. Statistical downscaling of GCM simulations to streamflow using relevance vector machine

    NASA Astrophysics Data System (ADS)

    Ghosh, Subimal; Mujumdar, P. P.

    2008-01-01

    General circulation models (GCMs), the climate models often used in assessing the impact of climate change, operate on a coarse scale and thus the simulation results obtained from GCMs are not particularly useful in a comparatively smaller river basin scale hydrology. The article presents a methodology of statistical downscaling based on sparse Bayesian learning and Relevance Vector Machine (RVM) to model streamflow at river basin scale for monsoon period (June, July, August, September) using GCM simulated climatic variables. NCEP/NCAR reanalysis data have been used for training the model to establish a statistical relationship between streamflow and climatic variables. The relationship thus obtained is used to project the future streamflow from GCM simulations. The statistical methodology involves principal component analysis, fuzzy clustering and RVM. Different kernel functions are used for comparison purpose. The model is applied to Mahanadi river basin in India. The results obtained using RVM are compared with those of state-of-the-art Support Vector Machine (SVM) to present the advantages of RVMs over SVMs. A decreasing trend is observed for monsoon streamflow of Mahanadi due to high surface warming in future, with the CCSR/NIES GCM and B2 scenario.

  4. Generalized Models for Rock Joint Surface Shapes

    PubMed Central

    Du, Shigui; Hu, Yunjin; Hu, Xiaofei

    2014-01-01

    Generalized models of joint surface shapes are the foundation for mechanism studies on the mechanical effects of rock joint surface shapes. Based on extensive field investigations of rock joint surface shapes, generalized models for three level shapes named macroscopic outline, surface undulating shape, and microcosmic roughness were established through statistical analyses of 20,078 rock joint surface profiles. The relative amplitude of profile curves was used as a borderline for the division of different level shapes. The study results show that the macroscopic outline has three basic features such as planar, arc-shaped, and stepped; the surface undulating shape has three basic features such as planar, undulating, and stepped; and the microcosmic roughness has two basic features such as smooth and rough. PMID:25152901

  5. Comparing estimates of climate change impacts from process-based and statistical crop models

    NASA Astrophysics Data System (ADS)

    Lobell, David B.; Asseng, Senthold

    2017-01-01

    The potential impacts of climate change on crop productivity are of widespread interest to those concerned with addressing climate change and improving global food security. Two common approaches to assess these impacts are process-based simulation models, which attempt to represent key dynamic processes affecting crop yields, and statistical models, which estimate functional relationships between historical observations of weather and yields. Examples of both approaches are increasingly found in the scientific literature, although often published in different disciplinary journals. Here we compare published sensitivities to changes in temperature, precipitation, carbon dioxide (CO2), and ozone from each approach for the subset of crops, locations, and climate scenarios for which both have been applied. Despite a common perception that statistical models are more pessimistic, we find no systematic differences between the predicted sensitivities to warming from process-based and statistical models up to +2 °C, with limited evidence at higher levels of warming. For precipitation, there are many reasons why estimates could be expected to differ, but few estimates exist to develop robust comparisons, and precipitation changes are rarely the dominant factor for predicting impacts given the prominent role of temperature, CO2, and ozone changes. A common difference between process-based and statistical studies is that the former tend to include the effects of CO2 increases that accompany warming, whereas statistical models typically do not. Major needs moving forward include incorporating CO2 effects into statistical studies, improving both approaches’ treatment of ozone, and increasing the use of both methods within the same study. At the same time, those who fund or use crop model projections should understand that in the short-term, both approaches when done well are likely to provide similar estimates of warming impacts, with statistical models generally requiring fewer resources to produce robust estimates, especially when applied to crops beyond the major grains.

  6. Using generalized additive (mixed) models to analyze single case designs.

    PubMed

    Shadish, William R; Zuur, Alain F; Sullivan, Kristynn J

    2014-04-01

    This article shows how to apply generalized additive models and generalized additive mixed models to single-case design data. These models excel at detecting the functional form between two variables (often called trend), that is, whether trend exists, and if it does, what its shape is (e.g., linear and nonlinear). In many respects, however, these models are also an ideal vehicle for analyzing single-case designs because they can consider level, trend, variability, overlap, immediacy of effect, and phase consistency that single-case design researchers examine when interpreting a functional relation. We show how these models can be implemented in a wide variety of ways to test whether treatment is effective, whether cases differ from each other, whether treatment effects vary over cases, and whether trend varies over cases. We illustrate diagnostic statistics and graphs, and we discuss overdispersion of data in detail, with examples of quasibinomial models for overdispersed data, including how to compute dispersion and quasi-AIC fit indices in generalized additive models. We show how generalized additive mixed models can be used to estimate autoregressive models and random effects and discuss the limitations of the mixed models compared to generalized additive models. We provide extensive annotated syntax for doing all these analyses in the free computer program R. Copyright © 2013 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.

  7. A Non-Gaussian Stock Price Model: Options, Credit and a Multi-Timescale Memory

    NASA Astrophysics Data System (ADS)

    Borland, L.

    We review a recently proposed model of stock prices, based on astatistical feedback model that results in a non-Gaussian distribution of price changes. Applications to option pricing and the pricing of debt is discussed. A generalization to account for feedback effects over multiple timescales is also presented. This model reproduces most of the stylized facts (ie statistical anomalies) observed in real financial markets.

  8. General Blending Models for Data From Mixture Experiments

    PubMed Central

    Brown, L.; Donev, A. N.; Bissett, A. C.

    2015-01-01

    We propose a new class of models providing a powerful unification and extension of existing statistical methodology for analysis of data obtained in mixture experiments. These models, which integrate models proposed by Scheffé and Becker, extend considerably the range of mixture component effects that may be described. They become complex when the studied phenomenon requires it, but remain simple whenever possible. This article has supplementary material online. PMID:26681812

  9. Context-invariant quasi hidden variable (qHV) modelling of all joint von Neumann measurements for an arbitrary Hilbert space

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loubenets, Elena R.

    We prove the existence for each Hilbert space of the two new quasi hidden variable (qHV) models, statistically noncontextual and context-invariant, reproducing all the von Neumann joint probabilities via non-negative values of real-valued measures and all the quantum product expectations—via the qHV (classical-like) average of the product of the corresponding random variables. In a context-invariant model, a quantum observable X can be represented by a variety of random variables satisfying the functional condition required in quantum foundations but each of these random variables equivalently models X under all joint von Neumann measurements, regardless of their contexts. The proved existence ofmore » this model negates the general opinion that, in terms of random variables, the Hilbert space description of all the joint von Neumann measurements for dimH≥3 can be reproduced only contextually. The existence of a statistically noncontextual qHV model, in particular, implies that every N-partite quantum state admits a local quasi hidden variable model introduced in Loubenets [J. Math. Phys. 53, 022201 (2012)]. The new results of the present paper point also to the generality of the quasi-classical probability model proposed in Loubenets [J. Phys. A: Math. Theor. 45, 185306 (2012)].« less

  10. A generalized right truncated bivariate Poisson regression model with applications to health data.

    PubMed

    Islam, M Ataharul; Chowdhury, Rafiqul I

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over or under dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.

  11. A generalized right truncated bivariate Poisson regression model with applications to health data

    PubMed Central

    Islam, M. Ataharul; Chowdhury, Rafiqul I.

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over or under dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model. PMID:28586344

  12. General Aviation Avionics Statistics : 1976

    DOT National Transportation Integrated Search

    1979-11-01

    This report presents avionics statistics for the 1976 general aviation (GA) aircraft fleet and is the third in a series titled "General Aviation Avionics Statistics." The statistics are presented in a capability group framework which enables one to r...

  13. General Aviation Avionics Statistics : 1978 Data

    DOT National Transportation Integrated Search

    1980-12-01

    The report presents avionics statistics for the 1978 general aviation (GA) aircraft fleet and is the fifth in a series titled "General Aviation Statistics." The statistics are presented in a capability group framework which enables one to relate airb...

  14. General Aviation Avionics Statistics : 1979 Data

    DOT National Transportation Integrated Search

    1981-04-01

    This report presents avionics statistics for the 1979 general aviation (GA) aircraft fleet and is the sixth in a series titled General Aviation Avionics Statistics. The statistics preseneted in a capability group framework which enables one to relate...

  15. Statistical image reconstruction from correlated data with applications to PET

    PubMed Central

    Alessio, Adam; Sauer, Ken; Kinahan, Paul

    2008-01-01

    Most statistical reconstruction methods for emission tomography are designed for data modeled as conditionally independent Poisson variates. In reality, due to scanner detectors, electronics and data processing, correlations are introduced into the data resulting in dependent variates. In general, these correlations are ignored because they are difficult to measure and lead to computationally challenging statistical reconstruction algorithms. This work addresses the second concern, seeking to simplify the reconstruction of correlated data and provide a more precise image estimate than the conventional independent methods. In general, correlated variates have a large non-diagonal covariance matrix that is computationally challenging to use as a weighting term in a reconstruction algorithm. This work proposes two methods to simplify the use of a non-diagonal covariance matrix as the weighting term by (a) limiting the number of dimensions in which the correlations are modeled and (b) adopting flexible, yet computationally tractable, models for correlation structure. We apply and test these methods with simple simulated PET data and data processed with the Fourier rebinning algorithm which include the one-dimensional correlations in the axial direction and the two-dimensional correlations in the transaxial directions. The methods are incorporated into a penalized weighted least-squares 2D reconstruction and compared with a conventional maximum a posteriori approach. PMID:17921576

  16. Detecting temporal change in freshwater fisheries surveys: statistical power and the important linkages between management questions and monitoring objectives

    USGS Publications Warehouse

    Wagner, Tyler; Irwin, Brian J.; James R. Bence,; Daniel B. Hayes,

    2016-01-01

    Monitoring to detect temporal trends in biological and habitat indices is a critical component of fisheries management. Thus, it is important that management objectives are linked to monitoring objectives. This linkage requires a definition of what constitutes a management-relevant “temporal trend.” It is also important to develop expectations for the amount of time required to detect a trend (i.e., statistical power) and for choosing an appropriate statistical model for analysis. We provide an overview of temporal trends commonly encountered in fisheries management, review published studies that evaluated statistical power of long-term trend detection, and illustrate dynamic linear models in a Bayesian context, as an additional analytical approach focused on shorter term change. We show that monitoring programs generally have low statistical power for detecting linear temporal trends and argue that often management should be focused on different definitions of trends, some of which can be better addressed by alternative analytical approaches.

  17. Computational statistics using the Bayesian Inference Engine

    NASA Astrophysics Data System (ADS)

    Weinberg, Martin D.

    2013-09-01

    This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.

  18. 2-Point microstructure archetypes for improved elastic properties

    NASA Astrophysics Data System (ADS)

    Adams, Brent L.; Gao, Xiang

    2004-01-01

    Rectangular models of material microstructure are described by their 1- and 2-point (spatial) correlation statistics of placement of local state. In the procedure described here the local state space is described in discrete form; and the focus is on placement of local state within a finite number of cells comprising rectangular models. It is illustrated that effective elastic properties (generalized Hashin Shtrikman bounds) can be obtained that are linear in components of the correlation statistics. Within this framework the concept of an eigen-microstructure within the microstructure hull is useful. Given the practical innumerability of the microstructure hull, however, we introduce a method for generating a sequence of archetypes of eigen-microstructure, from the 2-point correlation statistics of local state, assuming that the 1-point statistics are stationary. The method is illustrated by obtaining an archetype for an imaginary two-phase material where the objective is to maximize the combination C_{xxxx}^{*} + C_{xyxy}^{*}

  19. Statistical downscaling for winter streamflow in Douro River

    NASA Astrophysics Data System (ADS)

    Jesús Esteban Parra, María; Hidalgo Muñoz, José Manuel; García-Valdecasas-Ojeda, Matilde; Raquel Gámiz Fortis, Sonia; Castro Díez, Yolanda

    2015-04-01

    In this paper we have obtained climate change projections for winter flow of the Douro River in the period 2071-2100 by applying the technique of Partial Regression and various General Circulation Models of CMIP5. The streamflow data base used has been provided by the Center for Studies and Experimentation of Public Works, CEDEX. Series from gauing stations and reservoirs with less than 10% of missing data (filled by regression with well correlated neighboring stations) have been considered. The homogeneity of these series has been evaluated through the Pettit test and degree of human alteration by the Common Area Index. The application of these criteria led to the selection of 42 streamflow time series homogeneously distributed over the basin, covering the period 1951-2011. For these streamflow data, winter seasonal values were obtained by averaging the monthly values from January to March. Statistical downscaling models for the streamflow have been fitted using as predictors the main atmospheric modes of variability over the North Atlantic region. These modes have been obtained using winter sea level pressure data of the NCEP reanalysis, averaged for the months from December to February. Period 1951-1995 was used for calibration, while 1996-2011 period was used in validating the adjusted models. In general, these models are able to reproduce about 70% of the variability of the winter streamflow of the Douro River. Finally, the obtained statistical models have been applied to obtain projections for 2071-2100 period, using outputs from different CMIP5 models under the RPC8.5 scenario. The results for the end of the century show modest declines of winter streamflow in this river for most of the models. Keywords: Statistical downscaling, streamflow, Douro River, climate change. ACKNOWLEDGEMENTS This work has been financed by the projects P11-RNM-7941 (Junta de Andalucía-Spain) and CGL2013-48539-R (MINECO-Spain, FEDER).

  20. GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences.

    PubMed

    Yu, Ning; Guo, Xuan; Zelikovsky, Alexander; Pan, Yi

    2017-05-24

    As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nuclesome retention. The generally accepted criteria of CGI rely on: (a) %G+C content is ≥ 50%, (b) the ratio of the observed CpG content and the expected CpG content is ≥ 0.6, and (c) the general length of CGI is greater than 200 nucleotides. Most existing computational methods for the prediction of CpG island are programmed on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases %G+C is < 50%, CpG obs /CpG exp varies, and the length of CGI ranges from eight nucleotides to a few thousand of nucleotides. It implies that CGI detection is not just a straightly statistical task and some unrevealed rules probably are hidden. A novel Gaussian model, GaussianCpG, is developed for detection of CpG islands on human genome. We analyze the energy distribution over genomic primary structure for each CpG site and adopt the parameters from statistics of Human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing both sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG can achieve better performance in CGI detection. Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by the linear statistical method but by the Gaussian energy distribution and accumulation. The parameters of Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. By using the pseudopotential analysis on CpG islands, the novel model is validated on both the real and artificial data sets.

  1. The Influence of Vacuum Circuit Breakers and Different Motor Models on Switching Overvoltages in Motor Circuits

    NASA Astrophysics Data System (ADS)

    Wong, Cat S. M.; Snider, L. A.; Lo, Edward W. C.; Chung, T. S.

    Switching of induction motors with vacuum circuit breakers continues to be a concern. In this paper the influence on statistical overvoltages of the stochastic characteristics of vacuum circuit breakers, high frequency models of motors and transformers, and network characteristics, including cable lengths and network topology are evaluated and a general view of the overvoltages phenomena is presented. Finally, a real case study on the statistical voltage levels and risk-of-failure resulting from switching of a vacuum circuit breaker in an industrial installation in Hong Kong is presented.

  2. Extending local canonical correlation analysis to handle general linear contrasts for FMRI data.

    PubMed

    Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar

    2012-01-01

    Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic.

  3. Extending Local Canonical Correlation Analysis to Handle General Linear Contrasts for fMRI Data

    PubMed Central

    Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar

    2012-01-01

    Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic. PMID:22461786

  4. Statistical mapping of count survey data

    USGS Publications Warehouse

    Royle, J. Andrew; Link, W.A.; Sauer, J.R.; Scott, J. Michael; Heglund, Patricia J.; Morrison, Michael L.; Haufler, Jonathan B.; Wall, William A.

    2002-01-01

    We apply a Poisson mixed model to the problem of mapping (or predicting) bird relative abundance from counts collected from the North American Breeding Bird Survey (BBS). The model expresses the logarithm of the Poisson mean as a sum of a fixed term (which may depend on habitat variables) and a random effect which accounts for remaining unexplained variation. The random effect is assumed to be spatially correlated, thus providing a more general model than the traditional Poisson regression approach. Consequently, the model is capable of improved prediction when data are autocorrelated. Moreover, formulation of the mapping problem in terms of a statistical model facilitates a wide variety of inference problems which are cumbersome or even impossible using standard methods of mapping. For example, assessment of prediction uncertainty, including the formal comparison of predictions at different locations, or through time, using the model-based prediction variance is straightforward under the Poisson model (not so with many nominally model-free methods). Also, ecologists may generally be interested in quantifying the response of a species to particular habitat covariates or other landscape attributes. Proper accounting for the uncertainty in these estimated effects is crucially dependent on specification of a meaningful statistical model. Finally, the model may be used to aid in sampling design, by modifying the existing sampling plan in a manner which minimizes some variance-based criterion. Model fitting under this model is carried out using a simulation technique known as Markov Chain Monte Carlo. Application of the model is illustrated using Mourning Dove (Zenaida macroura) counts from Pennsylvania BBS routes. We produce both a model-based map depicting relative abundance, and the corresponding map of prediction uncertainty. We briefly address the issue of spatial sampling design under this model. Finally, we close with some discussion of mapping in relation to habitat structure. Although our models were fit in the absence of habitat information, the resulting predictions show a strong inverse relation with a map of forest cover in the state, as expected. Consequently, the results suggest that the correlated random effect in the model is broadly representing ecological variation, and that BBS data may be generally useful for studying bird-habitat relationships, even in the presence of observer errors and other widely recognized deficiencies of the BBS.

  5. An MDI Model and an Algorithm for Composite Hypotheses Testing and Estimation in Marketing

    DTIC Science & Technology

    1981-09-01

    Other, more general, developments in statistics and mathematical programming (duality) theories and methods are also briefly discussed for their possible bearing on further uses in marketing research and management. (Author)

  6. Massive parallelization of serial inference algorithms for a complex generalized linear model

    PubMed Central

    Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David

    2014-01-01

    Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363

  7. Multi-region statistical shape model for cochlear implantation

    NASA Astrophysics Data System (ADS)

    Romera, Jordi; Kjer, H. Martin; Piella, Gemma; Ceresa, Mario; González Ballester, Miguel A.

    2016-03-01

    Statistical shape models are commonly used to analyze the variability between similar anatomical structures and their use is established as a tool for analysis and segmentation of medical images. However, using a global model to capture the variability of complex structures is not enough to achieve the best results. The complexity of a proper global model increases even more when the amount of data available is limited to a small number of datasets. Typically, the anatomical variability between structures is associated to the variability of their physiological regions. In this paper, a complete pipeline is proposed for building a multi-region statistical shape model to study the entire variability from locally identified physiological regions of the inner ear. The proposed model, which is based on an extension of the Point Distribution Model (PDM), is built for a training set of 17 high-resolution images (24.5 μm voxels) of the inner ear. The model is evaluated according to its generalization ability and specificity. The results are compared with the ones of a global model built directly using the standard PDM approach. The evaluation results suggest that better accuracy can be achieved using a regional modeling of the inner ear.

  8. Modeling of the reactant conversion rate in a turbulent shear flow

    NASA Technical Reports Server (NTRS)

    Frankel, S. H.; Madnia, C. K.; Givi, P.

    1992-01-01

    Results are presented of direct numerical simulations (DNS) of spatially developing shear flows under the influence of infinitely fast chemical reactions of the type A + B yields Products. The simulation results are used to construct the compositional structure of the scalar field in a statistical manner. The results of this statistical analysis indicate that the use of a Beta density for the probability density function (PDF) of an appropriate Shvab-Zeldovich mixture fraction provides a very good estimate of the limiting bounds of the reactant conversion rate within the shear layer. This provides a strong justification for the implementation of this density in practical modeling of non-homogeneous turbulent reacting flows. However, the validity of the model cannot be generalized for predictions of higher order statistical quantities. A closed form analytical expression is presented for predicting the maximum rate of reactant conversion in non-homogeneous reacting turbulence.

  9. Statistical mechanics of soft-boson phase transitions

    NASA Technical Reports Server (NTRS)

    Gupta, Arun K.; Hill, Christopher T.; Holman, Richard; Kolb, Edward W.

    1991-01-01

    The existence of structure on large (100 Mpc) scales, and limits to anisotropies in the cosmic microwave background radiation (CMBR), have imperiled models of structure formation based solely upon the standard cold dark matter scenario. Novel scenarios, which may be compatible with large scale structure and small CMBR anisotropies, invoke nonlinear fluctuations in the density appearing after recombination, accomplished via the use of late time phase transitions involving ultralow mass scalar bosons. Herein, the statistical mechanics are studied of such phase transitions in several models involving naturally ultralow mass pseudo-Nambu-Goldstone bosons (pNGB's). These models can exhibit several interesting effects at high temperature, which is believed to be the most general possibilities for pNGB's.

  10. Is There a Critical Distance for Fickian Transport? - a Statistical Approach to Sub-Fickian Transport Modelling in Porous Media

    NASA Astrophysics Data System (ADS)

    Most, S.; Nowak, W.; Bijeljic, B.

    2014-12-01

    Transport processes in porous media are frequently simulated as particle movement. This process can be formulated as a stochastic process of particle position increments. At the pore scale, the geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Recent experimental data suggest that we have not yet reached the end of the need to generalize, because particle increments show statistical dependency beyond linear correlation and over many time steps. The goal of this work is to better understand the validity regions of commonly made assumptions. We are investigating after what transport distances can we observe: A statistical dependence between increments, that can be modelled as an order-k Markov process, boils down to order 1. This would be the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks would start. A bivariate statistical dependence that simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW). Complete absence of statistical dependence (validity of classical PTRW/CTRW). The approach is to derive a statistical model for pore-scale transport from a powerful experimental data set via copula analysis. The model is formulated as a non-Gaussian, mutually dependent Markov process of higher order, which allows us to investigate the validity ranges of simpler models.

  11. Walking through the statistical black boxes of plant breeding.

    PubMed

    Xavier, Alencar; Muir, William M; Craig, Bruce; Rainey, Katy Martin

    2016-10-01

    The main statistical procedures in plant breeding are based on Gaussian process and can be computed through mixed linear models. Intelligent decision making relies on our ability to extract useful information from data to help us achieve our goals more efficiently. Many plant breeders and geneticists perform statistical analyses without understanding the underlying assumptions of the methods or their strengths and pitfalls. In other words, they treat these statistical methods (software and programs) like black boxes. Black boxes represent complex pieces of machinery with contents that are not fully understood by the user. The user sees the inputs and outputs without knowing how the outputs are generated. By providing a general background on statistical methodologies, this review aims (1) to introduce basic concepts of machine learning and its applications to plant breeding; (2) to link classical selection theory to current statistical approaches; (3) to show how to solve mixed models and extend their application to pedigree-based and genomic-based prediction; and (4) to clarify how the algorithms of genome-wide association studies work, including their assumptions and limitations.

  12. The microcomputer scientific software series 3: general linear model--analysis of variance.

    Treesearch

    Harold M. Rauscher

    1985-01-01

    A BASIC language set of programs, designed for use on microcomputers, is presented. This set of programs will perform the analysis of variance for any statistical model describing either balanced or unbalanced designs. The program computes and displays the degrees of freedom, Type I sum of squares, and the mean square for the overall model, the error, and each factor...

  13. Removing an intersubject variance component in a general linear model improves multiway factoring of event-related spectral perturbations in group EEG studies.

    PubMed

    Spence, Jeffrey S; Brier, Matthew R; Hart, John; Ferree, Thomas C

    2013-03-01

    Linear statistical models are used very effectively to assess task-related differences in EEG power spectral analyses. Mixed models, in particular, accommodate more than one variance component in a multisubject study, where many trials of each condition of interest are measured on each subject. Generally, intra- and intersubject variances are both important to determine correct standard errors for inference on functions of model parameters, but it is often assumed that intersubject variance is the most important consideration in a group study. In this article, we show that, under common assumptions, estimates of some functions of model parameters, including estimates of task-related differences, are properly tested relative to the intrasubject variance component only. A substantial gain in statistical power can arise from the proper separation of variance components when there is more than one source of variability. We first develop this result analytically, then show how it benefits a multiway factoring of spectral, spatial, and temporal components from EEG data acquired in a group of healthy subjects performing a well-studied response inhibition task. Copyright © 2011 Wiley Periodicals, Inc.

  14. Generalized functional linear models for gene-based case-control association studies.

    PubMed

    Fan, Ruzong; Wang, Yifan; Mills, James L; Carter, Tonia C; Lobach, Iryna; Wilson, Alexander F; Bailey-Wilson, Joan E; Weeks, Daniel E; Xiong, Momiao

    2014-11-01

    By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that the Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene region are disease related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease datasets. The Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses. © 2014 WILEY PERIODICALS, INC.

  15. Generalized Functional Linear Models for Gene-based Case-Control Association Studies

    PubMed Central

    Mills, James L.; Carter, Tonia C.; Lobach, Iryna; Wilson, Alexander F.; Bailey-Wilson, Joan E.; Weeks, Daniel E.; Xiong, Momiao

    2014-01-01

    By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that the Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene are disease-related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease data sets. The Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses. PMID:25203683

  16. Potential pitfalls when denoising resting state fMRI data using nuisance regression.

    PubMed

    Bright, Molly G; Tench, Christopher R; Murphy, Kevin

    2017-07-01

    In resting state fMRI, it is necessary to remove signal variance associated with noise sources, leaving cleaned fMRI time-series that more accurately reflect the underlying intrinsic brain fluctuations of interest. This is commonly achieved through nuisance regression, in which the fit is calculated of a noise model of head motion and physiological processes to the fMRI data in a General Linear Model, and the "cleaned" residuals of this fit are used in further analysis. We examine the statistical assumptions and requirements of the General Linear Model, and whether these are met during nuisance regression of resting state fMRI data. Using toy examples and real data we show how pre-whitening, temporal filtering and temporal shifting of regressors impact model fit. Based on our own observations, existing literature, and statistical theory, we make the following recommendations when employing nuisance regression: pre-whitening should be applied to achieve valid statistical inference of the noise model fit parameters; temporal filtering should be incorporated into the noise model to best account for changes in degrees of freedom; temporal shifting of regressors, although merited, should be achieved via optimisation and validation of a single temporal shift. We encourage all readers to make simple, practical changes to their fMRI denoising pipeline, and to regularly assess the appropriateness of the noise model used. By negotiating the potential pitfalls described in this paper, and by clearly reporting the details of nuisance regression in future manuscripts, we hope that the field will achieve more accurate and precise noise models for cleaning the resting state fMRI time-series. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  17. PET image reconstruction: a robust state space approach.

    PubMed

    Liu, Huafeng; Tian, Yi; Shi, Pengcheng

    2005-01-01

    Statistical iterative reconstruction algorithms have shown improved image quality over conventional nonstatistical methods in PET by using accurate system response models and measurement noise models. Strictly speaking, however, PET measurements, pre-corrected for accidental coincidences, are neither Poisson nor Gaussian distributed and thus do not meet basic assumptions of these algorithms. In addition, the difficulty in determining the proper system response model also greatly affects the quality of the reconstructed images. In this paper, we explore the usage of state space principles for the estimation of activity map in tomographic PET imaging. The proposed strategy formulates the organ activity distribution through tracer kinetics models, and the photon-counting measurements through observation equations, thus makes it possible to unify the dynamic reconstruction problem and static reconstruction problem into a general framework. Further, it coherently treats the uncertainties of the statistical model of the imaging system and the noisy nature of measurement data. Since H(infinity) filter seeks minimummaximum-error estimates without any assumptions on the system and data noise statistics, it is particular suited for PET image reconstruction where the statistical properties of measurement data and the system model are very complicated. The performance of the proposed framework is evaluated using Shepp-Logan simulated phantom data and real phantom data with favorable results.

  18. Statistical Systems with Z

    NASA Astrophysics Data System (ADS)

    William, Peter

    In this dissertation several two dimensional statistical systems exhibiting discrete Z(n) symmetries are studied. For this purpose a newly developed algorithm to compute the partition function of these models exactly is utilized. The zeros of the partition function are examined in order to obtain information about the observable quantities at the critical point. This occurs in the form of critical exponents of the order parameters which characterize phenomena at the critical point. The correlation length exponent is found to agree very well with those computed from strong coupling expansions for the mass gap and with Monte Carlo results. In Feynman's path integral formalism the partition function of a statistical system can be related to the vacuum expectation value of the time ordered product of the observable quantities of the corresponding field theoretic model. Hence a generalization of ordinary scale invariance in the form of conformal invariance is focussed upon. This principle is very suitably applicable, in the case of two dimensional statistical models undergoing second order phase transitions at criticality. The conformal anomaly specifies the universality class to which these models belong. From an evaluation of the partition function, the free energy at criticality is computed, to determine the conformal anomaly of these models. The conformal anomaly for all the models considered here are in good agreement with the predicted values.

  19. Application of statistical downscaling technique for the production of wine grapes (Vitis vinifera L.) in Spain

    NASA Astrophysics Data System (ADS)

    Gaitán Fernández, E.; García Moreno, R.; Pino Otín, M. R.; Ribalaygua Batalla, J.

    2012-04-01

    Climate and soil are two of the most important limiting factors for agricultural production. Nowadays climate change has been documented in many geographical locations affecting different cropping systems. The General Circulation Models (GCM) has become important tools to simulate the more relevant aspects of the climate expected for the XXI century in the frame of climatic change. These models are able to reproduce the general features of the atmospheric dynamic but their low resolution (about 200 Km) avoids a proper simulation of lower scale meteorological effects. Downscaling techniques allow overcoming this problem by adapting the model outcomes to local scale. In this context, FIC (Fundación para la Investigación del Clima) has developed a statistical downscaling technique based on a two step analogue methods. This methodology has been broadly tested on national and international environments leading to excellent results on future climate models. In a collaboration project, this statistical downscaling technique was applied to predict future scenarios for the grape growing systems in Spain. The application of such model is very important to predict expected climate for the different growing crops, mainly for grape, where the success of different varieties are highly related to climate and soil. The model allowed the implementation of agricultural conservation practices in the crop production, detecting highly sensible areas to negative impacts produced by any modification of climate in the different regions, mainly those protected with protected designation of origin, and the definition of new production areas with optimal edaphoclimatic conditions for the different varieties.

  20. Automatic liver segmentation in computed tomography using general-purpose shape modeling methods.

    PubMed

    Spinczyk, Dominik; Krasoń, Agata

    2018-05-29

    Liver segmentation in computed tomography is required in many clinical applications. The segmentation methods used can be classified according to a number of criteria. One important criterion for method selection is the shape representation of the segmented organ. The aim of the work is automatic liver segmentation using general purpose shape modeling methods. As part of the research, methods based on shape information at various levels of advancement were used. The single atlas based segmentation method was used as the simplest shape-based method. This method is derived from a single atlas using the deformable free-form deformation of the control point curves. Subsequently, the classic and modified Active Shape Model (ASM) was used, using medium body shape models. As the most advanced and main method generalized statistical shape models, Gaussian Process Morphable Models was used, which are based on multi-dimensional Gaussian distributions of the shape deformation field. Mutual information and sum os square distance were used as similarity measures. The poorest results were obtained for the single atlas method. For the ASM method in 10 analyzed cases for seven test images, the Dice coefficient was above 55[Formula: see text], of which for three of them the coefficient was over 70[Formula: see text], which placed the method in second place. The best results were obtained for the method of generalized statistical distribution of the deformation field. The DICE coefficient for this method was 88.5[Formula: see text] CONCLUSIONS: This value of 88.5 [Formula: see text] Dice coefficient can be explained by the use of general-purpose shape modeling methods with a large variance of the shape of the modeled object-the liver and limitations on the size of our training data set, which was limited to 10 cases. The obtained results in presented fully automatic method are comparable with dedicated methods for liver segmentation. In addition, the deforamtion features of the model can be modeled mathematically by using various kernel functions, which allows to segment the liver on a comparable level using a smaller learning set.

  1. Multiplicative point process as a model of trading activity

    NASA Astrophysics Data System (ADS)

    Gontis, V.; Kaulakys, B.

    2004-11-01

    Signals consisting of a sequence of pulses show that inherent origin of the 1/ f noise is a Brownian fluctuation of the average interevent time between subsequent pulses of the pulse sequence. In this paper, we generalize the model of interevent time to reproduce a variety of self-affine time series exhibiting power spectral density S( f) scaling as a power of the frequency f. Furthermore, we analyze the relation between the power-law correlations and the origin of the power-law probability distribution of the signal intensity. We introduce a stochastic multiplicative model for the time intervals between point events and analyze the statistical properties of the signal analytically and numerically. Such model system exhibits power-law spectral density S( f)∼1/ fβ for various values of β, including β= {1}/{2}, 1 and {3}/{2}. Explicit expressions for the power spectra in the low-frequency limit and for the distribution density of the interevent time are obtained. The counting statistics of the events is analyzed analytically and numerically, as well. The specific interest of our analysis is related with the financial markets, where long-range correlations of price fluctuations largely depend on the number of transactions. We analyze the spectral density and counting statistics of the number of transactions. The model reproduces spectral properties of the real markets and explains the mechanism of power-law distribution of trading activity. The study provides evidence that the statistical properties of the financial markets are enclosed in the statistics of the time interval between trades. A multiplicative point process serves as a consistent model generating this statistics.

  2. Using Bayes' theorem for free energy calculations

    NASA Astrophysics Data System (ADS)

    Rogers, David M.

    Statistical mechanics is fundamentally based on calculating the probabilities of molecular-scale events. Although Bayes' theorem has generally been recognized as providing key guiding principles for the setup and analysis of statistical experiments [83], classical frequentist models still predominate in the world of computational experimentation. As a starting point for widespread application of Bayesian methods in statistical mechanics, we investigate the central quantity of free energies from this perspective. This dissertation thus reviews the basics of Bayes' view of probability theory, and the maximum entropy formulation of statistical mechanics before providing examples of its application to several advanced research areas. We first apply Bayes' theorem to a multinomial counting problem in order to determine inner shell and hard sphere solvation free energy components of Quasi-Chemical Theory [140]. We proceed to consider the general problem of free energy calculations from samples of interaction energy distributions. From there, we turn to spline-based estimation of the potential of mean force [142], and empirical modeling of observed dynamics using integrator matching. The results of this research are expected to advance the state of the art in coarse-graining methods, as they allow a systematic connection from high-resolution (atomic) to low-resolution (coarse) structure and dynamics. In total, our work on these problems constitutes a critical starting point for further application of Bayes' theorem in all areas of statistical mechanics. It is hoped that the understanding so gained will allow for improvements in comparisons between theory and experiment.
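
    As a concrete illustration of the multinomial counting ingredient, the following hedged sketch uses a Dirichlet-multinomial posterior over inner-shell occupancy probabilities to estimate the quasi-chemical "packing" free energy component -kT ln p0; the occupancy counts, prior strength, and temperature are illustrative assumptions, not values from the dissertation.

        import numpy as np

        kT = 0.596  # kcal/mol at roughly 300 K

        def packing_free_energy_posterior(occupancy_counts, alpha=0.5, n_draws=10_000, seed=1):
            """Posterior samples of -kT*ln(p0), where p0 is the probability of zero
            occupancy, under a symmetric Dirichlet(alpha) prior on occupancy probabilities."""
            rng = np.random.default_rng(seed)
            counts = np.asarray(occupancy_counts, dtype=float)
            post = rng.dirichlet(counts + alpha, size=n_draws)   # posterior over p_0, p_1, ...
            return -kT * np.log(post[:, 0])

        # hypothetical counts of observing 0, 1, 2, ... solvent molecules in the shell
        samples = packing_free_energy_posterior([120, 640, 830, 310, 60])
        print(samples.mean(), samples.std())   # posterior mean and spread of the component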

  3. Digital morphogenesis via Schelling segregation

    NASA Astrophysics Data System (ADS)

    Barmpalias, George; Elwes, Richard; Lewis-Pye, Andrew

    2018-04-01

    Schelling’s model of segregation looks to explain the way in which particles or agents of two types may come to arrange themselves spatially into configurations consisting of large homogeneous clusters, i.e. connected regions consisting of only one type. As one of the earliest agent-based models studied by economists and perhaps the most famous model of self-organising behaviour, it also has direct links to areas at the interface between computer science and statistical mechanics, such as the Ising model and the study of contagion and cascading phenomena in networks. While the model has been extensively studied, it has largely resisted rigorous analysis, with prior results from the literature generally pertaining to variants of the model which are tweaked so as to be amenable to standard techniques from statistical mechanics or stochastic evolutionary game theory. Brandt et al (2012, Proc. 44th Annual ACM Symp. on Theory of Computing) provided the first rigorous analysis of the unperturbed model, for a specific set of input parameters. Here we provide a rigorous analysis of the model’s behaviour much more generally and establish some surprising forms of threshold behaviour, notably the existence of situations where an increased level of intolerance for neighbouring agents of opposite type leads almost certainly to decreased segregation.
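
    For readers unfamiliar with the unperturbed dynamics, a minimal two-dimensional Schelling-style sketch is given below (illustrative only, not the specific variant analysed in the paper): agents of two types occupy a grid with some empty cells, an agent is unhappy if the fraction of like-typed neighbours falls below a tolerance, and unhappy agents relocate to random empty cells. Grid size, vacancy fraction, and tolerance are assumptions for the demo.

        import numpy as np

        def step(grid, tolerance, rng):
            """One sweep of the dynamics: each agent counts like-typed Moore neighbours
            (periodic boundaries) and, if the like fraction is below `tolerance`,
            relocates to a randomly chosen empty cell."""
            n = grid.shape[0]
            empties = list(zip(*np.where(grid == 0)))
            if not empties:
                return grid
            occupied = list(zip(*np.where(grid != 0)))
            for idx in rng.permutation(len(occupied)):
                i, j = occupied[idx]
                me = grid[i, j]
                neigh = [grid[(i + di) % n, (j + dj) % n]
                         for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
                neigh = [v for v in neigh if v != 0]
                if neigh and sum(v == me for v in neigh) / len(neigh) < tolerance:
                    k = rng.integers(len(empties))
                    ei, ej = empties[k]
                    grid[ei, ej], grid[i, j] = me, 0
                    empties[k] = (i, j)
            return grid

        rng = np.random.default_rng(0)
        grid = rng.choice([0, 1, -1], size=(50, 50), p=[0.1, 0.45, 0.45])
        for _ in range(50):
            grid = step(grid, tolerance=0.5, rng=rng)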

  4. Consistent integration of experimental and ab initio data into molecular and coarse-grained models

    NASA Astrophysics Data System (ADS)

    Vlcek, Lukas

    As computer simulations are increasingly used to complement or replace experiments, highly accurate descriptions of physical systems at different time and length scales are required to achieve realistic predictions. The questions of how to objectively measure model quality in relation to reference experimental or ab initio data, and how to transition seamlessly between different levels of resolution are therefore of prime interest. To address these issues, we use the concept of statistical distance to define a measure of similarity between statistical mechanical systems, i.e., a model and its target, and show that its minimization leads to general convergence of the systems' measurable properties. Through systematic coarse-graining, we arrive at appropriate expressions for optimization loss functions consistently incorporating microscopic ab initio data as well as macroscopic experimental data. The design of coarse-grained and multiscale models is then based on factoring the model system partition function into terms describing the system at different resolution levels. The optimization algorithm takes advantage of thermodynamic perturbation expressions for fast exploration of the model parameter space, enabling us to scan millions of parameter combinations per hour on a single CPU. The robustness and generality of the new model optimization framework and its efficient implementation are illustrated on selected examples including aqueous solutions, magnetic systems, and metal alloys.
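
    One common definition of the statistical distance between two discrete distributions is the arccosine of the Bhattacharyya coefficient; the hedged sketch below applies it to histograms of an observable sampled from a model and from reference data. The histograms are synthetic stand-ins, not the paper's data, and the distance used in the paper may differ in detail.

        import numpy as np

        def statistical_distance(p, q):
            """arccos of the Bhattacharyya coefficient between two normalized histograms."""
            p = np.asarray(p, float); q = np.asarray(q, float)
            p, q = p / p.sum(), q / q.sum()
            return np.arccos(np.clip(np.sum(np.sqrt(p * q)), -1.0, 1.0))

        # compare model vs. reference histograms of some observable (synthetic values)
        model_hist = [5, 30, 120, 260, 310, 190, 70, 15]
        target_hist = [8, 40, 140, 250, 290, 180, 75, 17]
        print(statistical_distance(model_hist, target_hist))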

  5. Accurate landmarking of three-dimensional facial data in the presence of facial expressions and occlusions using a three-dimensional statistical facial feature model.

    PubMed

    Zhao, Xi; Dellandréa, Emmanuel; Chen, Liming; Kakadiaris, Ioannis A

    2011-10-01

    Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.

  6. Statistical model with two order parameters for ductile and soft fiber bundles in nanoscience and biomaterials.

    PubMed

    Rinaldi, Antonio

    2011-04-01

    Traditional fiber bundle models (FBMs) have been an effective tool to understand brittle heterogeneous systems. However, fiber bundles in modern nano- and bioapplications demand a new generation of FBMs capturing more complex deformation processes in addition to damage. In the context of loose bundle systems, and with reference to time-independent plasticity and soft biomaterials, we formulate a generalized statistical model for ductile fracture and nonlinear elastic problems capable of handling multiple simultaneous deformation mechanisms by means of two order parameters (as opposed to one). As the first rational FBM for coupled damage problems, it may be the cornerstone for advanced statistical models of heterogeneous systems in nanoscience and materials design, especially for exploring hierarchical and bio-inspired concepts in the arena of nanobiotechnology. Finally, illustrative examples are provided, discussing issues in inverse analysis (i.e., a nonlinear elastic polymer fiber and arrays of ductile Cu submicron bars) and direct design (i.e., strength prediction).
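
    For orientation, the sketch below implements the classic equal-load-sharing fiber bundle baseline that the abstract generalizes (a single order parameter, brittle fibers): N fibers with random strength thresholds fail quasi-statically, and the bundle strength is the maximum sustainable stress. Parameter choices are illustrative.

        import numpy as np

        def els_fbm_strength(n_fibers=10_000, seed=0):
            """Strength of an equal-load-sharing bundle with uniform(0, 1) thresholds."""
            rng = np.random.default_rng(seed)
            thresholds = np.sort(rng.uniform(0.0, 1.0, n_fibers))
            # after the k weakest fibers fail, the bundle carries sigma_k = x_k * (1 - k/N);
            # the bundle strength is the maximum over k
            k = np.arange(n_fibers)
            return np.max(thresholds * (1.0 - k / n_fibers))

        print(els_fbm_strength())   # close to the theoretical value 1/4 for uniform thresholds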

  7. Seasonal Drought Prediction: Advances, Challenges, and Future Prospects

    NASA Astrophysics Data System (ADS)

    Hao, Zengchao; Singh, Vijay P.; Xia, Youlong

    2018-03-01

    Drought prediction is of critical importance to early warning for drought management. This review provides a synthesis of drought prediction based on statistical, dynamical, and hybrid methods. Statistical drought prediction is achieved by modeling the relationship between drought indices of interest and a suite of potential predictors, including large-scale climate indices, local climate variables, and land initial conditions. Dynamical meteorological drought prediction relies on seasonal climate forecasts from general circulation models (GCMs), which can be employed to drive hydrological models for agricultural and hydrological drought prediction, with the predictability determined by both climate forcings and initial conditions. Challenges still exist in drought prediction at long lead time and under a changing environment resulting from natural and anthropogenic factors. Future research prospects to improve drought prediction include, but are not limited to, high-quality data assimilation, improved model development with key processes related to drought occurrence, optimal ensemble forecasting to select or weight ensembles, and hybrid drought prediction to merge statistical and dynamical forecasts.
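
    The statistical route described above can be illustrated with a hedged sketch that regresses a drought index on a lagged large-scale climate predictor; the variable names (spi, nino34), the lag, and the synthetic series are assumptions for illustration, not results from the review.

        import numpy as np

        def fit_lagged_regression(drought_index, climate_index, lag=3):
            """Least-squares fit of drought_index[t] on climate_index[t - lag]."""
            y = np.asarray(drought_index[lag:], float)
            x = np.asarray(climate_index[:-lag], float)
            A = np.column_stack([np.ones_like(x), x])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            return coef   # intercept, slope

        rng = np.random.default_rng(0)
        nino34 = rng.standard_normal(240)                                  # synthetic predictor series
        spi = 0.4 * np.roll(nino34, 3) + 0.5 * rng.standard_normal(240)    # synthetic drought index
        print(fit_lagged_regression(spi, nino34, lag=3))                   # slope recovered near 0.4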

  8. External validation of the Probability of repeated admission (Pra) risk prediction tool in older community-dwelling people attending general practice: a prospective cohort study.

    PubMed

    Wallace, Emma; McDowell, Ronald; Bennett, Kathleen; Fahey, Tom; Smith, Susan M

    2016-11-14

    Emergency admission is associated with the potential for adverse events in older people, and risk prediction models are available to identify those at highest risk of admission. The aim of this study was to externally validate and compare the performance of the Probability of repeated admission (Pra) risk model and a modified version (incorporating a multimorbidity measure) in predicting emergency admission in older community-dwelling people. The setting was 15 general practices (GPs) in the Republic of Ireland. Participants were n=862 community-dwelling people aged ≥70 years, prospectively followed up for 2 years (2010-2012). The Pra risk model (original and modified) was calculated for the baseline year, with ≥0.5 denoting high risk of future emergency admission (patient questionnaire, GP medical record review). The outcome was emergency admission over 1 year (GP medical record review). Analysis comprised descriptive statistics, model discrimination (c-statistic) and calibration (Hosmer-Lemeshow statistic). Of 862 patients, a total of 154 (18%) had ≥1 emergency admission(s) in the follow-up year. 63 patients (7%) were classified as high risk by the original Pra, and of these 26 (41%) were admitted. The modified Pra classified 391 (45%) patients as high risk, and 103 (26%) were subsequently admitted. Both models demonstrated only poor discrimination (original Pra: c-statistic 0.65 (95% CI 0.61 to 0.70); modified Pra: c-statistic 0.67 (95% CI 0.62 to 0.72)). When categorised according to risk-category model, specificity was highest for the original Pra at a cut-point of ≥0.5 denoting high risk (95%), and for the modified Pra at a cut-point of ≥0.7 (95%). Both models overestimated the number of admissions across all risk strata. While the original Pra model demonstrated poor discrimination, model specificity was high and a small number of patients were identified as high risk. Future validation studies should examine higher cut-points denoting high risk for the modified Pra, which has practical advantages in terms of application in general practice. The original Pra tool may have a role in identifying higher-risk community-dwelling older people for inclusion in future trials aiming to reduce emergency admissions. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
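
    As a reminder of the discrimination measure reported above, the hedged sketch below computes a c-statistic for a risk score against observed admissions; the risk scores and outcomes are synthetic illustrations, not the study's records.

        import numpy as np

        def c_statistic(risk, outcome):
            """Probability that a randomly chosen admitted patient has a higher risk
            score than a randomly chosen non-admitted patient (ties count one half)."""
            risk, outcome = np.asarray(risk, float), np.asarray(outcome, int)
            pos, neg = risk[outcome == 1], risk[outcome == 0]
            greater = (pos[:, None] > neg[None, :]).sum()
            ties = (pos[:, None] == neg[None, :]).sum()
            return (greater + 0.5 * ties) / (len(pos) * len(neg))

        rng = np.random.default_rng(0)
        risk = rng.uniform(0, 1, 862)
        outcome = rng.binomial(1, 0.1 + 0.15 * risk)   # weak association, c-statistic roughly 0.6
        print(round(c_statistic(risk, outcome), 2))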

  9. Statistical properties of superimposed stationary spike trains.

    PubMed

    Deger, Moritz; Helias, Moritz; Boucsein, Clemens; Rotter, Stefan

    2012-06-01

    The Poisson process is an often employed model for the activity of neuronal populations. It is known, though, that superpositions of realistic, non-Poisson spike trains are not in general Poisson processes, not even for large numbers of superimposed processes. Here we construct superimposed spike trains from intracellular in vivo recordings from rat neocortex neurons and compare their statistics to specific point process models. The constructed superimposed spike trains reveal strong deviations from the Poisson model. We find that superpositions of model spike trains that take the effective refractoriness of the neurons into account yield a much better description. A minimal model of this kind is the Poisson process with dead-time (PPD). For this process, and for superpositions thereof, we obtain analytical expressions for some second-order statistical quantities, such as the count variability, inter-spike interval (ISI) variability and ISI correlations, and demonstrate the match with the in vivo data. We conclude that effective refractoriness is the key property that shapes the statistical properties of the superimposed spike trains. We present new, efficient algorithms to generate superpositions of PPDs and of gamma processes that can be used to provide more realistic background input in simulations of networks of spiking neurons. Using these generators, we show in simulations that neurons which receive superimposed spike trains as input are highly sensitive to the statistical effects induced by neuronal refractoriness.
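
    A hedged sketch of the PPD construction follows: after each spike an absolute dead time is added to an exponential waiting time whose rate is chosen so that the requested mean firing rate is preserved, and pooling many such trains gives the superposition discussed above. Rates, dead time, and durations are illustrative assumptions.

        import numpy as np

        def ppd_spike_train(rate_hz, dead_time_s, duration_s, rng):
            """Poisson process with absolute dead time: after each spike, wait the dead
            time plus an exponential interval; the exponential rate is chosen so that
            the overall mean rate equals rate_hz (requires rate_hz * dead_time_s < 1)."""
            exp_rate = rate_hz / (1.0 - rate_hz * dead_time_s)
            times, t = [], 0.0
            while True:
                t += dead_time_s + rng.exponential(1.0 / exp_rate)
                if t > duration_s:
                    return np.array(times)
                times.append(t)

        rng = np.random.default_rng(0)
        trains = [ppd_spike_train(30.0, 0.002, 10.0, rng) for _ in range(100)]
        superposition = np.sort(np.concatenate(trains))   # pooled input spike train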

  10. Comments of statistical issue in numerical modeling for underground nuclear test monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nicholson, W.L.; Anderson, K.K.

    1993-03-01

    The Symposium concluded with prepared summaries by four experts in the involved disciplines. These experts made no mention of statistics and/or the statistical content of issues. The first author contributed an extemporaneous statement at the Symposium because there are important issues associated with conducting and evaluating numerical modeling that are familiar to statisticians and often treated successfully by them. This note expands upon these extemporaneous remarks. Statistical ideas may be helpful in resolving some numerical modeling issues. Specifically, we comment first on the role of statistical design/analysis in the quantification process to answer the question "what do we know about the numerical modeling of underground nuclear tests?" and second on the peculiar nature of uncertainty analysis for situations involving numerical modeling. The simulations described in the workshop, though associated with topic areas, were basically sets of examples. Each simulation was tuned towards agreeing with either empirical evidence or an expert's opinion of what empirical evidence would be. While the discussions were reasonable, whether the embellishments were correct or a forced fitting of reality is unclear and illustrates that "simulation is easy." We also suggest that these examples of simulation are typical and the questions concerning the legitimacy and the role of knowing the reality are fair, in general, with respect to simulation. The answers will help us understand why "prediction is difficult."

  11. Segmenting lung fields in serial chest radiographs using both population-based and patient-specific shape statistics.

    PubMed

    Shi, Y; Qi, F; Xue, Z; Chen, L; Ito, K; Matsuo, H; Shen, D

    2008-04-01

    This paper presents a new deformable model using both population-based and patient-specific shape statistics to segment lung fields from serial chest radiographs. There are two novelties in the proposed deformable model. First, a modified scale invariant feature transform (SIFT) local descriptor, which is more distinctive than general intensity and gradient features, is used to characterize the image features in the vicinity of each pixel. Second, the deformable contour is constrained by both population-based and patient-specific shape statistics, which yields more robust and accurate segmentation of lung fields for serial chest radiographs. In particular, for segmenting the initial time-point images, the population-based shape statistics is used to constrain the deformable contour; as more subsequent images of the same patient are acquired, the patient-specific shape statistics, collected online from the previous segmentation results, gradually takes on a greater role. Thus, the patient-specific shape statistics is updated each time a new segmentation result is obtained, and it is further used to refine the segmentation results of all the available time-point images. Experimental results show that the proposed method is more robust and accurate than other active shape models in segmenting the lung fields from serial chest radiographs.
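
    The gradual hand-over from the population prior to the patient-specific prior can be sketched as a weighted blend of the two sets of shape statistics, with the patient weight growing as more time points are segmented. The blending rule and weight schedule below are illustrative assumptions, not the authors' exact formulation.

        import numpy as np

        def blended_shape_prior(pop_mean, pop_cov, patient_shapes, max_weight=0.9):
            """Blend population shape statistics with statistics of previously segmented
            shapes of the same patient; the patient weight grows with the number of
            available time points (assumed saturating schedule)."""
            n = len(patient_shapes)
            if n == 0:
                return pop_mean, pop_cov
            w = min(max_weight, n / (n + 3.0))
            shapes = np.asarray(patient_shapes, float)
            pat_mean = shapes.mean(axis=0)
            pat_cov = np.cov(shapes, rowvar=False) if n > 1 else pop_cov
            return (1 - w) * pop_mean + w * pat_mean, (1 - w) * pop_cov + w * pat_cov

        pop_mean, pop_cov = np.zeros(6), np.eye(6)        # toy 6-parameter shape model
        mean, cov = blended_shape_prior(pop_mean, pop_cov, [np.full(6, 0.5), np.full(6, 0.7)])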

  12. Statistical fluctuations in pedestrian evacuation times and the effect of social contagion

    NASA Astrophysics Data System (ADS)

    Nicolas, Alexandre; Bouzat, Sebastián; Kuperman, Marcelo N.

    2016-08-01

    Mathematical models of pedestrian evacuation and the associated simulation software have become essential tools for the assessment of the safety of public facilities and buildings. While a variety of models is now available, their calibration and testing against empirical data are generally restricted to global averaged quantities; the statistics compiled from the time series of individual escapes ("microscopic" statistics) measured in recent experiments are thus overlooked. In the same spirit, much research has primarily focused on the average global evacuation time, whereas the whole distribution of evacuation times over some set of realizations should matter. In the present paper we propose and discuss the validity of a simple relation between this distribution and the microscopic statistics, which is theoretically valid in the absence of correlations. To this purpose, we develop a minimal cellular automaton, with features that afford a semiquantitative reproduction of the experimental microscopic statistics. We then introduce a process of social contagion of impatient behavior in the model and show that the simple relation under test may dramatically fail at high contagion strengths, the latter being responsible for the emergence of strong correlations in the system. We conclude with comments on the potential practical relevance for safety science of calculations based on microscopic statistics.
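
    The "simple relation" under test can be sketched as follows: if the time lapses between successive escapes are independent draws from the microscopic distribution, the global evacuation time is just their sum, and its distribution can be built by Monte Carlo resampling. The lapse distribution and sizes below are assumptions for illustration.

        import numpy as np

        def evacuation_time_distribution(lapse_samples, n_pedestrians, n_realizations=10_000, seed=0):
            """Total evacuation times obtained by summing independently resampled
            escape-to-escape time lapses (i.e., assuming no correlations)."""
            rng = np.random.default_rng(seed)
            draws = rng.choice(lapse_samples, size=(n_realizations, n_pedestrians), replace=True)
            return draws.sum(axis=1)

        rng = np.random.default_rng(1)
        lapses = rng.lognormal(mean=-0.5, sigma=0.6, size=5_000)   # synthetic microscopic lapses (s)
        totals = evacuation_time_distribution(lapses, n_pedestrians=80)
        print(totals.mean(), totals.std())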

  13. Functional annotation of regulatory pathways.

    PubMed

    Pandey, Jayesh; Koyutürk, Mehmet; Kim, Yohan; Szpankowski, Wojciech; Subramaniam, Shankar; Grama, Ananth

    2007-07-01

    Standardized annotations of biomolecules in interaction networks (e.g. Gene Ontology) provide comprehensive understanding of the function of individual molecules. Extending such annotations to pathways is a critical component of functional characterization of cellular signaling at the systems level. We propose a framework for projecting gene regulatory networks onto the space of functional attributes using multigraph models, with the objective of deriving statistically significant pathway annotations. We first demonstrate that annotations of pairwise interactions do not generalize to indirect relationships between processes. Motivated by this result, we formalize the problem of identifying statistically overrepresented pathways of functional attributes. We establish the hardness of this problem by demonstrating the non-monotonicity of common statistical significance measures. We propose a statistical model that emphasizes the modularity of a pathway, evaluating its significance based on the coupling of its building blocks. We complement the statistical model by an efficient algorithm and software, Narada, for computing significant pathways in large regulatory networks. Comprehensive results from our methods applied to the Escherichia coli transcription network demonstrate that our approach is effective in identifying known, as well as novel biological pathway annotations. Narada is implemented in Java and is available at http://www.cs.purdue.edu/homes/jpandey/narada/.
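
    A hedged sketch of the projection step: each regulatory edge between genes induces edges between all pairs of their functional annotations, and multiplicities accumulate to form a multigraph over attributes. Gene and term names below are hypothetical, and this is not the Narada implementation.

        from collections import Counter
        from itertools import product

        regulatory_edges = [("geneA", "geneB"), ("geneA", "geneC"), ("geneB", "geneC")]
        annotations = {
            "geneA": {"GO:carbohydrate_transport"},
            "geneB": {"GO:transcription", "GO:stress_response"},
            "geneC": {"GO:stress_response"},
        }

        projected = Counter()
        for src, dst in regulatory_edges:
            for term_pair in product(annotations[src], annotations[dst]):
                projected[term_pair] += 1   # multigraph: parallel edges accumulate

        for (a, b), multiplicity in projected.items():
            print(f"{a} -> {b}: {multiplicity}")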

  14. Maximum entropy models as a tool for building precise neural controls.

    PubMed

    Savin, Cristina; Tkačik, Gašper

    2017-10-01

    Neural responses are highly structured, with population activity restricted to a small subset of the astronomical range of possible activity patterns. Characterizing these statistical regularities is important for understanding circuit computation, but challenging in practice. Here we review recent approaches based on the maximum entropy principle used for quantifying collective behavior in neural activity. We highlight recent models that capture population-level statistics of neural data, yielding insights into the organization of the neural code and its biological substrate. Furthermore, the MaxEnt framework provides a general recipe for constructing surrogate ensembles that preserve aspects of the data, but are otherwise maximally unstructured. This idea can be used to generate a hierarchy of controls against which rigorous statistical tests are possible. Copyright © 2017 Elsevier Ltd. All rights reserved.
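
    The simplest control of this kind is a surrogate ensemble that preserves each neuron's firing rate but is otherwise maximally unstructured, i.e., independent Bernoulli spikes; the sketch below generates such surrogates from a synthetic raster. Data and sizes are illustrative assumptions.

        import numpy as np

        def rate_matched_surrogates(spikes, n_surrogates=100, seed=0):
            """spikes: binary array (n_bins, n_neurons). Returns surrogates with the same
            per-neuron firing probabilities and no correlations (independent MaxEnt model)."""
            rng = np.random.default_rng(seed)
            rates = spikes.mean(axis=0)                      # the preserved statistic
            shape = (n_surrogates,) + spikes.shape
            return (rng.random(shape) < rates).astype(int)

        rng = np.random.default_rng(1)
        data = (rng.random((5_000, 40)) < 0.05).astype(int)  # hypothetical spike raster
        surrogates = rate_matched_surrogates(data)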

  15. VizieR Online Data Catalog: Supernova matter EOS (Buyukcizmeci+, 2014)

    NASA Astrophysics Data System (ADS)

    Buyukcizmeci, N.; Botvina, A. S.; Mishustin, I. N.

    2017-03-01

    The Statistical Model for Supernova Matter (SMSM) was developed in Botvina & Mishustin (2004, PhLB, 584, 233; 2010, NuPhA, 843, 98) as a direct generalization of the Statistical Multifragmentation Model (SMM; Bondorf et al. 1995, PhR, 257, 133). We treat supernova matter as a mixture of nuclear species, electrons, and photons in statistical equilibrium. The SMSM EOS tables cover the following ranges of control parameters: 1. Temperature: T = 0.2-25 MeV, for 35 T values. 2. Electron fraction Ye = 0.02-0.56, on a linear mesh with step 0.02, giving 28 Ye values; it is equal to the total proton fraction Xp, due to charge neutrality. 3. Baryon number density relative to normal nuclear density, ρ/ρ0 = 10^-8 to 0.32, giving 31 ρ/ρ0 values. (2 data files).

  16. Variable selection for marginal longitudinal generalized linear models.

    PubMed

    Cantoni, Eva; Flemming, Joanna Mills; Ronchetti, Elvezio

    2005-06-01

    Variable selection is an essential part of any statistical analysis and yet has been somewhat neglected in the context of longitudinal data analysis. In this article, we propose a generalized version of Mallows's Cp (GCp) suitable for use with both parametric and nonparametric models. GCp provides an estimate of a measure of a model's adequacy for prediction. We examine its performance with popular marginal longitudinal models (fitted using GEE) and contrast results with what is typically done in practice: variable selection based on Wald-type or score-type tests. An application to real data further demonstrates the merits of our approach while at the same time emphasizing some important robust features inherent to GCp.
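
    For context, the sketch below computes the classical Mallows's Cp that GCp generalizes, for candidate ordinary least squares models on synthetic data; it is not the GEE-based GCp of the article.

        import numpy as np

        def mallows_cp(X, y, subset, sigma2_full):
            """Cp = RSS_subset / sigma2_full - n + 2p, with p parameters including the intercept."""
            n = len(y)
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            return rss / sigma2_full - n + 2 * Xs.shape[1]

        rng = np.random.default_rng(0)
        n, p = 200, 5
        X = rng.standard_normal((n, p))
        y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.standard_normal(n)
        Xf = np.column_stack([np.ones(n), X])                 # full model for the sigma^2 estimate
        res = y - Xf @ np.linalg.lstsq(Xf, y, rcond=None)[0]
        sigma2 = res @ res / (n - p - 1)
        print(mallows_cp(X, y, [0, 2], sigma2))               # a good subset gives Cp near its parameter count (3)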

  17. Dimensional Reduction for the General Markov Model on Phylogenetic Trees.

    PubMed

    Sumner, Jeremy G

    2017-03-01

    We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.

  18. A comparison of hydrologic models for ecological flows and water availability

    USGS Publications Warehouse

    Caldwell, Peter V; Kennen, Jonathan G.; Sun, Ge; Kiang, Julie E.; Butcher, John B; Eddy, Michelle C; Hay, Lauren E.; LaFontaine, Jacob H.; Hain, Ernie F.; Nelson, Stacy C; McNulty, Steve G

    2015-01-01

    Robust hydrologic models are needed to help manage water resources for healthy aquatic ecosystems and reliable water supplies for people, but there is a lack of comprehensive model comparison studies that quantify differences in streamflow predictions among model applications developed to answer management questions. We assessed differences in daily streamflow predictions by four fine-scale models and two regional-scale monthly time step models by comparing model fit statistics and bias in ecologically relevant flow statistics (ERFSs) at five sites in the Southeastern USA. Models were calibrated to different extents, including uncalibrated (level A), calibrated to a downstream site (level B), calibrated specifically for the site (level C) and calibrated for the site with adjusted precipitation and temperature inputs (level D). All models generally captured the magnitude and variability of observed streamflows at the five study sites, and increasing level of model calibration generally improved performance. All models had at least 1 of 14 ERFSs falling outside a ±30% range of hydrologic uncertainty at every site, and ERFSs related to low flows were frequently over-predicted. Our results do not indicate that any specific hydrologic model is superior to the others evaluated at all sites and for all measures of model performance. Instead, we provide evidence that (1) model performance is as likely to be related to calibration strategy as it is to model structure and (2) simple, regional-scale models have comparable performance to the more complex, fine-scale models at a monthly time step.

  19. Extracting Spurious Latent Classes in Growth Mixture Modeling with Nonnormal Errors

    ERIC Educational Resources Information Center

    Guerra-Peña, Kiero; Steinley, Douglas

    2016-01-01

    Growth mixture modeling is generally used for two purposes: (1) to identify mixtures of normal subgroups and (2) to approximate oddly shaped distributions by a mixture of normal components. Often in applied research this methodology is applied to both of these situations indistinctly: using the same fit statistics and likelihood ratio tests. This…

  20. Statistical Forecasting of Bankruptcy of Defense Contractors. Problems and Prospects

    DTIC Science & Technology

    1994-01-01

    The treatment of investors is along the lines of the Capital Asset Pricing Model (CAPM); in portfolio theory generally, investors demand an expected-return premium for... [The remainder of the indexed snippet consists of front-matter fragments: acknowledgments (Ellen Pint, Rachel Schmidt, and Dennis Smallwood of RAND), an acronym list, and table-of-contents entries for Bond Yields, Bond Model Performance, and Extensions and Limitations.]

  1. Statistical Time Series Models of Pilot Control with Applications to Instrument Discrimination

    NASA Technical Reports Server (NTRS)

    Altschul, R. E.; Nagel, P. M.; Oliver, F.

    1984-01-01

    This report contains a general description of the methodology used in obtaining the transfer function models and verifying model fidelity, frequency-domain plots of the modeled transfer functions, numerical results from an analysis of the poles and zeros obtained from z-plane to s-plane conversions of the transfer functions, and the results of a study on the sequential introduction of other variables, both exogenous and endogenous, into the loop.
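
    As a hedged illustration of the pole-zero conversion step, the sketch below maps discrete-time (z-plane) roots of an identified transfer function to continuous-time (s-plane) equivalents via s = ln(z)/T for sampling period T; the root values and sampling rate are illustrative assumptions, not results from the report.

        import numpy as np

        def z_to_s(roots_z, T):
            """Map z-plane roots to s-plane roots using s = ln(z) / T."""
            return np.log(np.asarray(roots_z, dtype=complex)) / T

        T = 0.05                                  # hypothetical 20 Hz sampling period
        poles_z = [0.9 + 0.1j, 0.9 - 0.1j, 0.7]
        zeros_z = [0.95]
        print("s-plane poles:", z_to_s(poles_z, T))
        print("s-plane zeros:", z_to_s(zeros_z, T))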

  2. Statistical-Dynamical Seasonal Forecasts of Central-Southwest Asian Winter Precipitation.

    NASA Astrophysics Data System (ADS)

    Tippett, Michael K.; Goddard, Lisa; Barnston, Anthony G.

    2005-06-01

    Interannual precipitation variability in central-southwest (CSW) Asia has been associated with East Asian jet stream variability and western Pacific tropical convection. However, atmospheric general circulation models (AGCMs) forced by observed sea surface temperature (SST) poorly simulate the region's interannual precipitation variability. The statistical-dynamical approach uses statistical methods to correct systematic deficiencies in the response of AGCMs to SST forcing. Statistical correction methods linking model-simulated Indo-west Pacific precipitation and observed CSW Asia precipitation result in modest, but statistically significant, cross-validated simulation skill in the northeast part of the domain for the period from 1951 to 1998. The statistical-dynamical method is also applied to recent (winter 1998/99 to 2002/03) multimodel, two-tier December-March precipitation forecasts initiated in October. This period includes 4 yr (winter of 1998/99 to 2001/02) of severe drought. Tercile probability forecasts are produced using ensemble-mean forecasts and forecast error estimates. The statistical-dynamical forecasts show enhanced probability of below-normal precipitation for the four drought years and capture the return to normal conditions in part of the region during the winter of 2002/03. "May Kabul be without gold, but not without snow." (Traditional Afghan proverb)
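
    A hedged sketch of the statistical correction step: regress observed regional precipitation on an AGCM-simulated remote precipitation index over a training period, then apply the fit to new model forecasts. The single-predictor form and the synthetic series are simplifying assumptions, not the paper's full statistical method.

        import numpy as np

        def fit_correction(model_index, observed):
            """Least-squares fit of observed regional precipitation on a model-simulated index."""
            A = np.column_stack([np.ones_like(model_index), model_index])
            coef, *_ = np.linalg.lstsq(A, observed, rcond=None)
            return coef

        def apply_correction(coef, model_index_new):
            return coef[0] + coef[1] * model_index_new

        rng = np.random.default_rng(0)
        sim = rng.standard_normal(48)                         # model-simulated remote precipitation index
        obs = 0.6 * sim + 0.8 * rng.standard_normal(48)       # observed regional precipitation anomaly
        coef = fit_correction(sim, obs)
        print(apply_correction(coef, np.array([1.2, -0.5])))  # corrected forecasts for two new winters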

  3. Estimation and model selection of semiparametric multivariate survival functions under general censorship.

    PubMed

    Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang

    2010-07-01

    We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided.

  4. Estimation and model selection of semiparametric multivariate survival functions under general censorship

    PubMed Central

    Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang

    2013-01-01

    We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided. PMID:24790286

  5. Adopting adequate leaching requirement for practical response models of basil to salinity

    NASA Astrophysics Data System (ADS)

    Babazadeh, Hossein; Tabrizi, Mahdi Sarai; Darvishi, Hossein Hassanpour

    2016-07-01

    Several mathematical models are used for assessing plant response to salinity of the root zone. The objectives of this study included quantifying the yield salinity threshold value of basil plants with respect to irrigation water salinity, and investigating the possibility of using irrigation water salinity instead of saturation extract salinity in the available mathematical models for estimating yield. To achieve these objectives, an extensive greenhouse experiment was conducted with 13 irrigation water salinity levels, namely 1.175 dS m-1 (control treatment) and 1.8 to 10 dS m-1. The results indicated that, among these models, the modified discount model (one of the most widely used root water uptake models, which is based on statistics) produced more accurate results in simulating the basil yield reduction function using irrigation water salinities. Overall, the statistical model of Steppuhn et al. based on the modified discount model and the math-empirical model of van Genuchten and Hoffman provided the best results. In general, all of the statistical models produced very similar results, and their results were better than those of the math-empirical models. It was also concluded that if enough leaching was present, there was no significant difference between models based on soil saturation extract salinity and models using irrigation water salinity.
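
    A hedged sketch of fitting a salinity-response curve of the van Genuchten-Hoffman / discount form, relative yield = 1/(1 + (EC/EC50)^p), to relative-yield observations against irrigation water salinity; the data points are synthetic illustrations, not the greenhouse measurements.

        import numpy as np
        from scipy.optimize import curve_fit

        def discount_response(ec, ec50, p):
            """Relative yield as a function of irrigation water salinity EC (dS/m)."""
            return 1.0 / (1.0 + (ec / ec50) ** p)

        ec = np.array([1.175, 1.8, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
        yr = np.array([1.00, 0.97, 0.90, 0.82, 0.70, 0.58, 0.47, 0.38, 0.30, 0.24])

        (ec50, p), _ = curve_fit(discount_response, ec, yr, p0=[5.0, 2.0])
        print(f"EC50 = {ec50:.2f} dS/m, p = {p:.2f}")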

  6. Evaluating and implementing temporal, spatial, and spatio-temporal methods for outbreak detection in a local syndromic surveillance system

    PubMed Central

    Lall, Ramona; Levin-Rector, Alison; Sell, Jessica; Paladini, Marc; Konty, Kevin J.; Olson, Don; Weiss, Don

    2017-01-01

    The New York City Department of Health and Mental Hygiene has operated an emergency department syndromic surveillance system since 2001, using temporal and spatial scan statistics run on a daily basis for cluster detection. Since the system was originally implemented, a number of new methods have been proposed for use in cluster detection. We evaluated six temporal and four spatial/spatio-temporal detection methods using syndromic surveillance data spiked with simulated injections. The algorithms were compared on several metrics, including sensitivity, specificity, positive predictive value, coherence, and timeliness. We also evaluated each method’s implementation, programming time, run time, and the ease of use. Among the temporal methods, at a set specificity of 95%, a Holt-Winters exponential smoother performed the best, detecting 19% of the simulated injects across all shapes and sizes, followed by an autoregressive moving average model (16%), a generalized linear model (15%), a modified version of the Early Aberration Reporting System’s C2 algorithm (13%), a temporal scan statistic (11%), and a cumulative sum control chart (<2%). Of the spatial/spatio-temporal methods we tested, a spatial scan statistic detected 3% of all injects, a Bayes regression found 2%, and a generalized linear mixed model and a space-time permutation scan statistic detected none at a specificity of 95%. Positive predictive value was low (<7%) for all methods. Overall, the detection methods we tested did not perform well in identifying the temporal and spatial clusters of cases in the inject dataset. The spatial scan statistic, our current method for spatial cluster detection, performed slightly better than the other tested methods across different inject magnitudes and types. Furthermore, we found the scan statistics, as applied in the SaTScan software package, to be the easiest to program and implement for daily data analysis. PMID:28886112

  7. Evaluating and implementing temporal, spatial, and spatio-temporal methods for outbreak detection in a local syndromic surveillance system.

    PubMed

    Mathes, Robert W; Lall, Ramona; Levin-Rector, Alison; Sell, Jessica; Paladini, Marc; Konty, Kevin J; Olson, Don; Weiss, Don

    2017-01-01

    The New York City Department of Health and Mental Hygiene has operated an emergency department syndromic surveillance system since 2001, using temporal and spatial scan statistics run on a daily basis for cluster detection. Since the system was originally implemented, a number of new methods have been proposed for use in cluster detection. We evaluated six temporal and four spatial/spatio-temporal detection methods using syndromic surveillance data spiked with simulated injections. The algorithms were compared on several metrics, including sensitivity, specificity, positive predictive value, coherence, and timeliness. We also evaluated each method's implementation, programming time, run time, and the ease of use. Among the temporal methods, at a set specificity of 95%, a Holt-Winters exponential smoother performed the best, detecting 19% of the simulated injects across all shapes and sizes, followed by an autoregressive moving average model (16%), a generalized linear model (15%), a modified version of the Early Aberration Reporting System's C2 algorithm (13%), a temporal scan statistic (11%), and a cumulative sum control chart (<2%). Of the spatial/spatio-temporal methods we tested, a spatial scan statistic detected 3% of all injects, a Bayes regression found 2%, and a generalized linear mixed model and a space-time permutation scan statistic detected none at a specificity of 95%. Positive predictive value was low (<7%) for all methods. Overall, the detection methods we tested did not perform well in identifying the temporal and spatial clusters of cases in the inject dataset. The spatial scan statistic, our current method for spatial cluster detection, performed slightly better than the other tested methods across different inject magnitudes and types. Furthermore, we found the scan statistics, as applied in the SaTScan software package, to be the easiest to program and implement for daily data analysis.
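
    For illustration of the best-performing temporal method above, the hedged sketch below implements a simple additive Holt-Winters-style detector for a daily syndromic count series, flagging days whose observed count exceeds the one-step-ahead forecast by more than k forecast standard errors; smoothing constants, season length, and the threshold are assumptions, not the evaluated system's settings.

        import numpy as np

        def holt_winters_alarms(counts, season=7, alpha=0.3, beta=0.05, gamma=0.2, k=3.0):
            """Flag days whose count exceeds the one-step-ahead additive Holt-Winters
            forecast by more than k forecast standard errors."""
            counts = np.asarray(counts, float)
            level, trend = counts[:season].mean(), 0.0
            seasonal = list(counts[:season] - counts[:season].mean())
            alarms, residuals = [], []
            for t in range(season, len(counts)):
                forecast = level + trend + seasonal[t % season]
                resid = counts[t] - forecast
                if len(residuals) > 10 and resid > k * np.std(residuals):
                    alarms.append(t)
                residuals.append(resid)
                new_level = alpha * (counts[t] - seasonal[t % season]) + (1 - alpha) * (level + trend)
                trend = beta * (new_level - level) + (1 - beta) * trend
                seasonal[t % season] = gamma * (counts[t] - new_level) + (1 - gamma) * seasonal[t % season]
                level = new_level
            return alarms

        rng = np.random.default_rng(0)
        baseline = 50 + 10 * np.sin(2 * np.pi * np.arange(120) / 7)
        series = rng.poisson(baseline)
        series[100:104] += 60                      # injected outbreak
        print(holt_winters_alarms(series))         # should flag days around the injection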

  8. Differential gene expression detection and sample classification using penalized linear regression models.

    PubMed

    Wu, Baolin

    2006-02-15

    Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and have proven useful in empirical studies. Recently Wu proposed penalized t/F-statistics with shrinkage by formally using L1 penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using L1 penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
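
    The core shrinkage operation behind the shrunken centroid and penalized statistics is soft thresholding, as induced by an L1 penalty; the hedged sketch below applies it to per-gene standardized mean differences from a synthetic example.

        import numpy as np

        def soft_threshold(d, delta):
            """Shrink each statistic toward zero by delta; values inside the band become 0."""
            return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

        rng = np.random.default_rng(0)
        d = rng.standard_normal(1000)              # mostly null genes
        d[:20] += 4.0                              # a few differentially expressed genes
        d_shrunk = soft_threshold(d, delta=2.0)
        print("genes surviving shrinkage:", int(np.count_nonzero(d_shrunk)))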

  9. The impact of statistical adjustment on conditional standard errors of measurement in the assessment of physician communication skills.

    PubMed

    Raymond, Mark R; Clauser, Brian E; Furman, Gail E

    2010-10-01

    The use of standardized patients to assess communication skills is now an essential part of assessing a physician's readiness for practice. To improve the reliability of communication scores, it has become increasingly common in recent years to use statistical models to adjust ratings provided by standardized patients. This study employed ordinary least squares regression to adjust ratings, and then used generalizability theory to evaluate the impact of these adjustments on score reliability and the overall standard error of measurement. In addition, conditional standard errors of measurement were computed for both observed and adjusted scores to determine whether the improvements in measurement precision were uniform across the score distribution. Results indicated that measurement was generally less precise for communication ratings toward the lower end of the score distribution; and the improvement in measurement precision afforded by statistical modeling varied slightly across the score distribution such that the most improvement occurred in the upper-middle range of the score scale. Possible reasons for these patterns in measurement precision are discussed, as are the limitations of the statistical models used for adjusting performance ratings.
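
    A hedged sketch of the rating-adjustment idea: estimate each standardized patient's leniency or stringency as their mean deviation from the grand mean and remove it from the ratings they provided (the balanced-data equivalent of an OLS fit with rater indicators). The data layout and values are illustrative assumptions, not the study's scoring model.

        import numpy as np

        def adjust_for_raters(ratings, rater_ids):
            """Subtract each rater's estimated leniency/stringency (their mean deviation
            from the grand mean) from the ratings they provided."""
            ratings, rater_ids = np.asarray(ratings, float), np.asarray(rater_ids)
            grand = ratings.mean()
            adjusted = ratings.copy()
            for r in np.unique(rater_ids):
                mask = rater_ids == r
                adjusted[mask] -= ratings[mask].mean() - grand
            return adjusted

        ratings = [6.0, 7.5, 5.0, 8.0, 4.5, 6.5]
        raters = ["sp1", "sp1", "sp2", "sp2", "sp3", "sp3"]
        print(adjust_for_raters(ratings, raters))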

  10. Resolving Structural Variability in Network Models and the Brain

    PubMed Central

    Klimm, Florian; Bassett, Danielle S.; Carlson, Jean M.; Mucha, Peter J.

    2014-01-01

    Large-scale white matter pathways crisscrossing the cortex create a complex pattern of connectivity that underlies human cognitive function. Generative mechanisms for this architecture have been difficult to identify in part because little is known in general about mechanistic drivers of structured networks. Here we contrast network properties derived from diffusion spectrum imaging data of the human brain with 13 synthetic network models chosen to probe the roles of physical network embedding and temporal network growth. We characterize both the empirical and synthetic networks using familiar graph metrics, but presented here in a more complete statistical form, as scatter plots and distributions, to reveal the full range of variability of each measure across scales in the network. We focus specifically on the degree distribution, degree assortativity, hierarchy, topological Rentian scaling, and topological fractal scaling—in addition to several summary statistics, including the mean clustering coefficient, the shortest path-length, and the network diameter. The models are investigated in a progressive, branching sequence, aimed at capturing different elements thought to be important in the brain, and range from simple random and regular networks, to models that incorporate specific growth rules and constraints. We find that synthetic models that constrain the network nodes to be physically embedded in anatomical brain regions tend to produce distributions that are most similar to the corresponding measurements for the brain. We also find that network models hardcoded to display one network property (e.g., assortativity) do not in general simultaneously display a second (e.g., hierarchy). This relative independence of network properties suggests that multiple neurobiological mechanisms might be at play in the development of human brain network architecture. Together, the network models that we develop and employ provide a potentially useful starting point for the statistical inference of brain network structure from neuroimaging data. PMID:24675546
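
    Several of the summary statistics named above can be computed for a synthetic network model with networkx, as in the hedged sketch below (a stand-in for the paper's own analysis pipeline; the model and its parameters are illustrative).

        import networkx as nx

        G = nx.connected_watts_strogatz_graph(n=200, k=8, p=0.1, seed=0)   # one synthetic model

        stats = {
            "mean clustering coefficient": nx.average_clustering(G),
            "characteristic path length": nx.average_shortest_path_length(G),
            "diameter": nx.diameter(G),
            "degree assortativity": nx.degree_assortativity_coefficient(G),
        }
        for name, value in stats.items():
            print(f"{name}: {value:.3f}")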

  11. Exploring the squeezed three-point galaxy correlation function with generalized halo occupation distribution models

    NASA Astrophysics Data System (ADS)

    Yuan, Sihan; Eisenstein, Daniel J.; Garrison, Lehman H.

    2018-04-01

    We present the GeneRalized ANd Differentiable Halo Occupation Distribution (GRAND-HOD) routine that generalizes the standard 5 parameter halo occupation distribution model (HOD) with various halo-scale physics and assembly bias. We describe the methodology of 4 different generalizations: satellite distribution generalization, velocity bias, closest approach distance generalization, and assembly bias. We showcase the signatures of these generalizations in the 2-point correlation function (2PCF) and the squeezed 3-point correlation function (squeezed 3PCF). We identify generalized HOD prescriptions that are nearly degenerate in the projected 2PCF and demonstrate that these degeneracies are broken in the redshift-space anisotropic 2PCF and the squeezed 3PCF. We also discuss the possibility of identifying degeneracies in the anisotropic 2PCF and further demonstrate the extra constraining power of the squeezed 3PCF on galaxy-halo connection models. We find that within our current HOD framework, the anisotropic 2PCF can predict the squeezed 3PCF better than its statistical error. This implies that a discordant squeezed 3PCF measurement could falsify the particular HOD model space. Alternatively, it is possible that further generalizations of the HOD model would open opportunities for the squeezed 3PCF to provide novel parameter measurements. The GRAND-HOD Python package is publicly available at https://github.com/SandyYuan/GRAND-HOD.
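
    For reference, the hedged sketch below writes out a standard 5-parameter HOD of the Zheng-type form that GRAND-HOD generalizes: mean central and satellite occupation as a function of halo mass. Parameter values are illustrative assumptions, not fits from the paper.

        import numpy as np
        from scipy.special import erf

        def mean_centrals(log_m, log_m_min=13.0, sigma_log_m=0.4):
            """Mean number of central galaxies as a function of log10 halo mass."""
            return 0.5 * (1.0 + erf((np.asarray(log_m) - log_m_min) / sigma_log_m))

        def mean_satellites(log_m, log_m0=13.2, log_m1=14.2, alpha=1.0,
                            log_m_min=13.0, sigma_log_m=0.4):
            """Mean number of satellites, modulated by the central occupation."""
            m, m0, m1 = 10.0 ** np.asarray(log_m), 10.0 ** log_m0, 10.0 ** log_m1
            n_sat = (np.clip(m - m0, 0.0, None) / m1) ** alpha
            return mean_centrals(log_m, log_m_min, sigma_log_m) * n_sat

        log_mass = np.linspace(12.0, 15.0, 7)        # log10(M_halo / M_sun), illustrative grid
        print(mean_centrals(log_mass))
        print(mean_satellites(log_mass))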

  12. A female black bear denning habitat model using a geographic information system

    USGS Publications Warehouse

    Clark, J.D.; Hayes, S.G.; Pledger, J.M.

    1998-01-01

    We used the Mahalanobis distance statistic and a raster geographic information system (GIS) to model potential black bear (Ursus americanus) denning habitat in the Ouachita Mountains of Arkansas. The Mahalanobis distance statistic was used to represent the standard squared distance between sample variates in the GIS database (forest cover type, elevation, slope, aspect, distance to streams, distance to roads, and forest cover richness) and variates at known bear dens. Two models were developed: a generalized model for all den locations and another specific to dens in rock cavities. Differences between habitat at den sites and habitat across the study area were represented in 2 new GIS themes as Mahalanobis distance values. Cells similar to the mean vector derived from the known dens had low Mahalanobis distance values, and dissimilar cells had high values. The reliability of the predictive model was tested by overlaying den locations collected subsequent to original model development on the resultant den habitat themes. Although the generalized model demonstrated poor reliability, the model specific to rock dens had good reliability. Bears were more likely to choose rock den locations with low Mahalanobis distance values and less likely to choose those with high values. The model can be used to plan the timing and extent of management actions (e.g., road building, prescribed fire, timber harvest) most appropriate for those sites with high or low denning potential. 
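
    The scoring step can be sketched as follows: compute the Mahalanobis distance between each raster cell's habitat covariates and the mean vector of known den sites, so that lower scores indicate habitat more similar to used dens. The covariates and values below are synthetic illustrations of the GIS layers described above.

        import numpy as np

        def mahalanobis_scores(cells, den_sites):
            """cells: (n_cells, n_vars); den_sites: (n_dens, n_vars). Lower scores mean
            habitat more similar to the den-site mean vector."""
            mu = den_sites.mean(axis=0)
            cov_inv = np.linalg.inv(np.cov(den_sites, rowvar=False))
            diff = cells - mu
            return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

        rng = np.random.default_rng(0)
        dens = rng.normal([700, 25, 180, 300], [50, 5, 40, 80], size=(40, 4))       # e.g. elevation, slope, ...
        cells = rng.normal([650, 20, 200, 400], [120, 10, 90, 200], size=(1000, 4))
        scores = mahalanobis_scores(cells, dens)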

  13. Demographic factors and traffic crashes. Part 1, descriptive statistics and models

    DOT National Transportation Integrated Search

    1998-08-01

    This research analyzes the Department of Highway Safety and Motor Vehicle's (DHSMV) 1993 to 1995 crash data. There are four demographic variables investigated throughout the research, which are age, gender, race, and residency. To show general trends...

  14. Effects of seatbelt laws on highway fatalities

    DOT National Transportation Integrated Search

    1989-11-01

    The statistical models used in this update indicate that states which have a seatbelt law have experienced on average a 7.7 percent reduction in front-seat occupant fatalities in vehicles generally covered by laws. That is, on average in any law...

  15. Markov modulated Poisson process models incorporating covariates for rainfall intensity.

    PubMed

    Thayakaran, R; Ramesh, N I

    2013-01-01

    Time series of rainfall bucket tip times at the Beaufort Park station, Bracknell, in the UK are modelled by a class of Markov modulated Poisson processes (MMPP) which may be thought of as a generalization of the Poisson process. Our main focus in this paper is to investigate the effects of including covariate information into the MMPP model framework on statistical properties. In particular, we look at three types of time-varying covariates namely temperature, sea level pressure, and relative humidity that are thought to be affecting the rainfall arrival process. Maximum likelihood estimation is used to obtain the parameter estimates, and likelihood ratio tests are employed in model comparison. Simulated data from the fitted model are used to make statistical inferences about the accumulated rainfall in the discrete time interval. Variability of the daily Poisson arrival rates is studied.
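
    A hedged sketch of the basic construction the paper builds on: a two-state Markov modulated Poisson process in which a hidden state switches at exponential times and events arrive at a state-dependent Poisson rate. Switching rates and intensities are illustrative assumptions, not fitted values.

        import numpy as np

        def simulate_mmpp(duration, rates=(0.2, 5.0), switch=(0.05, 0.5), seed=0):
            """rates: Poisson intensity in each hidden state; switch: rate of leaving each
            state. Returns event (bucket-tip) times over [0, duration]."""
            rng = np.random.default_rng(seed)
            t, state, events = 0.0, 0, []
            while t < duration:
                dwell = rng.exponential(1.0 / switch[state])       # time before the next switch
                end = min(t + dwell, duration)
                n = rng.poisson(rates[state] * (end - t))          # events in this dwell period
                events.extend(np.sort(rng.uniform(t, end, n)))
                t, state = end, 1 - state
            return np.array(events)

        tips = simulate_mmpp(duration=500.0)
        print(len(tips), "simulated tips")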

  16. Statistical and simulation analysis of hydraulic-conductivity data for Bear Creek and Melton Valleys, Oak Ridge Reservation, Tennessee

    USGS Publications Warehouse

    Connell, J.F.; Bailey, Z.C.

    1989-01-01

    A total of 338 single-well aquifer tests from Bear Creek and Melton Valley, Tennessee were statistically grouped to estimate hydraulic conductivities for the geologic formations in the valleys. A cross-sectional simulation model linked to a regression model was used to further refine the statistical estimates for each of the formations and to improve understanding of ground-water flow in Bear Creek Valley. Median hydraulic-conductivity values were used as initial values in the model. Model-calculated estimates of hydraulic conductivity were generally lower than the statistical estimates. Simulations indicate that (1) the Pumpkin Valley Shale controls groundwater flow between Pine Ridge and Bear Creek; (2) all the recharge on Chestnut Ridge discharges to the Maynardville Limestone; (3) the formations having smaller hydraulic gradients may have a greater tendency for flow along strike; (4) local hydraulic conditions in the Maynardville Limestone cause inaccurate model-calculated estimates of hydraulic conductivity; and (5) the conductivity of deep bedrock neither affects the results of the model nor does it add information on the flow system. Improved model performance would require: (1) more water level data for the Copper Ridge Dolomite; (2) improved estimates of hydraulic conductivity in the Copper Ridge Dolomite and Maynardville Limestone; and (3) more water level data and aquifer tests in deep bedrock. (USGS)

  17. A statistical parts-based appearance model of inter-subject variability.

    PubMed

    Toews, Matthew; Collins, D Louis; Arbel, Tal

    2006-01-01

    In this article, we present a general statistical parts-based model for representing the appearance of an image set, applied to the problem of inter-subject MR brain image matching. In contrast with global image representations such as active appearance models, the parts-based model consists of a collection of localized image parts whose appearance, geometry and occurrence frequency are quantified statistically. The parts-based approach explicitly addresses the case where one-to-one correspondence does not exist between subjects due to anatomical differences, as parts are not expected to occur in all subjects. The model can be learned automatically, discovering structures that appear with statistical regularity in a large set of subject images, and can be robustly fit to new images, all in the presence of significant inter-subject variability. As parts are derived from generic scale-invariant features, the framework can be applied in a wide variety of image contexts, in order to study the commonality of anatomical parts or to group subjects according to the parts they share. Experimentation shows that a parts-based model can be learned from a large set of MR brain images, and used to determine parts that are common within the group of subjects. Preliminary results indicate that the model can be used to automatically identify distinctive features for inter-subject image registration despite large changes in appearance.

  18. Building integral projection models: a user's guide

    PubMed Central

    Rees, Mark; Childs, Dylan Z; Ellner, Stephen P; Coulson, Tim

    2014-01-01

    In order to understand how changes in individual performance (growth, survival or reproduction) influence population dynamics and evolution, ecologists are increasingly using parameterized mathematical models. For continuously structured populations, where some continuous measure of individual state influences growth, survival or reproduction, integral projection models (IPMs) are commonly used. We provide a detailed description of the steps involved in constructing an IPM, explaining how to: (i) translate your study system into an IPM; (ii) implement your IPM; and (iii) diagnose potential problems with your IPM. We emphasize how the study organism's life cycle, and the timing of censuses, together determine the structure of the IPM kernel and important aspects of the statistical analysis used to parameterize an IPM using data on marked individuals. An IPM based on population studies of Soay sheep is used to illustrate the complete process of constructing, implementing and evaluating an IPM fitted to sample data. We then look at very general approaches to parameterizing an IPM, using a wide range of statistical techniques (e.g. maximum likelihood methods, generalized additive models, nonparametric kernel density estimators). Methods for selecting models for parameterizing IPMs are briefly discussed. We conclude with key recommendations and a brief overview of applications that extend the basic model. The online Supporting Information provides commented R code for all our analyses. PMID:24219157
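
    A minimal sketch of the IPM machinery described above: discretize the size domain, build the kernel from assumed growth, survival, and reproduction functions by the midpoint rule, and obtain the asymptotic population growth rate as the kernel's dominant eigenvalue. All vital-rate functions and parameters below are illustrative assumptions, not the Soay sheep fits.

        import numpy as np

        def survival(z):                     # probability of surviving at size z (logistic)
            return 1.0 / (1.0 + np.exp(-(z - 2.0)))

        def growth_pdf(z_new, z):            # size next year: normal around a linear mean
            mu = 1.0 + 0.7 * z
            return np.exp(-0.5 * ((z_new - mu) / 0.4) ** 2) / (0.4 * np.sqrt(2 * np.pi))

        def fecundity(z_new, z):             # offspring number times offspring-size density
            return 0.5 * np.exp(0.3 * z) * np.exp(-0.5 * ((z_new - 1.0) / 0.3) ** 2) / (0.3 * np.sqrt(2 * np.pi))

        n, lo, hi = 100, 0.0, 6.0            # midpoint-rule discretization of the size domain
        h = (hi - lo) / n
        mesh = lo + h * (np.arange(n) + 0.5)
        Z_new, Z = np.meshgrid(mesh, mesh, indexing="ij")
        K = h * (survival(Z) * growth_pdf(Z_new, Z) + fecundity(Z_new, Z))

        lam = np.max(np.abs(np.linalg.eigvals(K)))
        print("asymptotic growth rate lambda =", round(lam, 3))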

  19. A weighted generalized score statistic for comparison of predictive values of diagnostic tests

    PubMed Central

    Kosinski, Andrzej S.

    2013-01-01

    Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting. PMID:22912343
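
    As a reminder of the quantities being compared, the hedged sketch below computes positive and negative predictive values for two tests applied to the same patients (the paired design above); the data are synthetic illustrations, and the sketch does not implement the proposed WGS test itself.

        import numpy as np

        def predictive_values(test, disease):
            """PPV = P(disease | test positive); NPV = P(no disease | test negative)."""
            test, disease = np.asarray(test), np.asarray(disease)
            ppv = disease[test == 1].mean()
            npv = (1 - disease[test == 0]).mean()
            return ppv, npv

        rng = np.random.default_rng(0)
        disease = rng.binomial(1, 0.3, 500)
        test_a = np.where(disease == 1, rng.binomial(1, 0.85, 500), rng.binomial(1, 0.15, 500))
        test_b = np.where(disease == 1, rng.binomial(1, 0.80, 500), rng.binomial(1, 0.10, 500))
        print("Test A PPV/NPV:", predictive_values(test_a, disease))
        print("Test B PPV/NPV:", predictive_values(test_b, disease))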

  20. Causal Models and Exploratory Analysis in Heterogeneous Information Fusion for Detecting Potential Terrorists

    DTIC Science & Technology

    2015-11-01

    The indexed snippet notes that prior work addressed related issues and made some of the same distinctions (Walker, Lempert, and Kwakkel; Bankes, Lempert, and Popper, 2005), but that the present study appeared to have more than... [The remainder of the snippet consists of partial reference-list entries, including: British Journal of Mathematical and Statistical Psychology, 66, pp. 8-38; Jaynes, Edwin T., and G. Larry Bretthorst (ed.) (2003), Probability...; Lempert, Robert J., David G. Groves, Steven W. Popper, and Steven C. Bankes (2006), "A General Analytic Method for Generating Robust...]

  1. Cluster detection methods applied to the Upper Cape Cod cancer data.

    PubMed

    Ozonoff, Al; Webster, Thomas; Vieira, Veronica; Weinberg, Janice; Ozonoff, David; Aschengrau, Ann

    2005-09-15

    A variety of statistical methods have been suggested to assess the degree and/or the location of spatial clustering of disease cases. However, there is relatively little in the literature devoted to comparison and critique of different methods. Most of the available comparative studies rely on simulated data rather than real data sets. We have chosen three methods currently used for examining spatial disease patterns: the M-statistic of Bonetti and Pagano; the Generalized Additive Model (GAM) method as applied by Webster; and Kulldorff's spatial scan statistic. We apply these statistics to analyze breast cancer data from the Upper Cape Cancer Incidence Study using three different latency assumptions. The three different latency assumptions produced three different spatial patterns of cases and controls. For 20 year latency, all three methods generally concur. However, for 15 year latency and no latency assumptions, the methods produce different results when testing for global clustering. Comparative analyses of real data sets by different statistical methods provide insight into directions for further research. We suggest a research program designed around examining real data sets to guide focused investigation of relevant features using simulated data, for the purpose of understanding how to interpret statistical methods applied to epidemiological data with a spatial component.

  2. Interference in the classical probabilistic model and its representation in complex Hilbert space

    NASA Astrophysics Data System (ADS)

    Khrennikov, Andrei Yu.

    2005-10-01

    The notion of a context (complex of physical conditions, that is to say: specification of the measurement setup) is basic in this paper. We show that the main structures of quantum theory (interference of probabilities, Born's rule, complex probabilistic amplitudes, Hilbert state space, representation of observables by operators) are present already in a latent form in the classical Kolmogorov probability model. However, this model should be considered as a calculus of contextual probabilities. In our approach it is forbidden to consider abstract context-independent probabilities: “first context and only then probability”. We construct the representation of the general contextual probabilistic dynamics in the complex Hilbert space. Thus dynamics of the wave function (in particular, Schrödinger's dynamics) can be considered as Hilbert space projections of a realistic dynamics in a “prespace”. The basic condition for representing the prespace dynamics is the law of statistical conservation of energy (conservation of probabilities). In general the Hilbert space projection of the “prespace” dynamics can be nonlinear and even irreversible (but it is always unitary). Methods developed in this paper can be applied not only to quantum mechanics, but also to classical statistical mechanics. The main quantum-like structures (e.g., interference of probabilities) might be found in some models of classical statistical mechanics. Quantum-like probabilistic behavior can be demonstrated by biological systems. In particular, it was recently found in some psychological experiments.
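
    For reference, the contextual formula of total probability with an interference term, written here in generic notation for a dichotomous variable b with values b_1, b_2 (a sketch of the standard quantum-like form, not necessarily the author's exact notation), is

        P(a \mid C) \;=\; \sum_{i=1,2} P(b_i \mid C)\, P(a \mid b_i, C)
                     \;+\; 2\cos\theta \,\sqrt{\prod_{i=1,2} P(b_i \mid C)\, P(a \mid b_i, C)} ,

    where the angle θ measures the deviation from the classical formula of total probability; the right-hand side equals |√(P(b_1|C)P(a|b_1,C)) + e^{iθ}√(P(b_2|C)P(a|b_2,C))|², which is Born's rule applied to a complex probability amplitude.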

  3. Using structural equation modeling for network meta-analysis.

    PubMed

    Tu, Yu-Kang; Wu, Yun-Chun

    2017-07-14

    Network meta-analysis overcomes the limitations of traditional pair-wise meta-analysis by incorporating all available evidence into a general statistical framework for simultaneous comparisons of several treatments. Currently, network meta-analyses are undertaken either within Bayesian hierarchical linear models or frequentist generalized linear mixed models. Structural equation modeling (SEM) is a statistical method originally developed for modeling causal relations among observed and latent variables. As the random effect is explicitly modeled as a latent variable in SEM, it is very flexible for analysts to specify complex random effect structures and to impose linear and nonlinear constraints on parameters. The aim of this article is to show how to undertake a network meta-analysis within the statistical framework of SEM. We used an example dataset to demonstrate that the standard fixed and random effect network meta-analysis models can be easily implemented in SEM. It contains results of 26 studies that directly compared three treatment groups A, B and C for prevention of first bleeding in patients with liver cirrhosis. We also showed that a new approach to network meta-analysis based on the unrestricted weighted least squares (UWLS) method can be undertaken using SEM. For both the fixed and random effect network meta-analysis, SEM yielded similar coefficients and confidence intervals to those reported in the previous literature. The point estimates of the two UWLS models were identical to those in the fixed effect model but the confidence intervals were wider. This is consistent with results from the traditional pairwise meta-analyses. Compared to the UWLS model with a common variance adjusted factor, the UWLS model with a unique variance adjusted factor has wider confidence intervals when the heterogeneity is larger in the pairwise comparison. The UWLS model with a unique variance adjusted factor reflects the difference in heterogeneity within each comparison. SEM provides a very flexible framework for univariate and multivariate meta-analysis, and its potential as a powerful tool for advanced meta-analysis is still to be explored.

  4. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    NASA Astrophysics Data System (ADS)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multivariate stepwise regression is proposed for the General Expression of the Nonlinear Autoregressive (GNAR) model; it converts the model-order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear terms in the GNAR model. The result is set as the initial model, and the nonlinear terms are then introduced gradually. Statistics are chosen to assess the improvement contributed by both the newly introduced and the previously included variables, and these statistics determine which model variables to retain or eliminate. The optimal model is thus obtained through goodness-of-fit measurement or significance testing. Results from simulation and classic time-series data experiments show that the proposed method is simple, reliable and applicable to practical engineering.

  5. Human-modified temperatures induce species changes: Joint attribution.

    PubMed

    Root, Terry L; MacMynowski, Dena P; Mastrandrea, Michael D; Schneider, Stephen H

    2005-05-24

    Average global surface-air temperature is increasing. Contention exists over relative contributions by natural and anthropogenic forcings. Ecological studies attribute plant and animal changes to observed warming. Until now, temperature-species connections have not been statistically attributed directly to anthropogenic climatic change. Using modeled climatic variables and observed species data, which are independent of thermometer records and paleoclimatic proxies, we demonstrate statistically significant "joint attribution," a two-step linkage: human activities contribute significantly to temperature changes and human-changed temperatures are associated with discernible changes in plant and animal traits. Additionally, our analyses provide independent testing of grid-box-scale temperature projections from a general circulation model (HadCM3).

  6. MWASTools: an R/bioconductor package for metabolome-wide association studies.

    PubMed

    Rodriguez-Martinez, Andrea; Posma, Joram M; Ayala, Rafael; Neves, Ana L; Anwar, Maryam; Petretto, Enrico; Emanueli, Costanza; Gauguier, Dominique; Nicholson, Jeremy K; Dumas, Marc-Emmanuel

    2018-03-01

    MWASTools is an R package designed to provide an integrated pipeline to analyse metabonomic data in large-scale epidemiological studies. Key functionalities of our package include: quality control analysis; metabolome-wide association analysis using various models (partial correlations, generalized linear models); visualization of statistical outcomes; metabolite assignment using statistical total correlation spectroscopy (STOCSY); and biological interpretation of metabolome-wide association studies results. The MWASTools R package is implemented in R (version >= 3.4) and is available from Bioconductor: https://bioconductor.org/packages/MWASTools/. m.dumas@imperial.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  7. A new in silico classification model for ready biodegradability, based on molecular fragments.

    PubMed

    Lombardo, Anna; Pizzo, Fabiola; Benfenati, Emilio; Manganaro, Alberto; Ferrari, Thomas; Gini, Giuseppina

    2014-08-01

    Regulations such as the European REACH (Registration, Evaluation, Authorization and restriction of Chemicals) often require chemicals to be evaluated for ready biodegradability, to assess the potential risk for environmental and human health. Because not all chemicals can be tested, there is an increasing demand for tools for quick and inexpensive biodegradability screening, such as computer-based (in silico) theoretical models. We developed an in silico model starting from a dataset of 728 chemicals with ready biodegradability data (MITI test, Ministry of International Trade and Industry). We used the novel software SARpy to automatically extract, through a structural fragmentation process, a set of substructures statistically related to ready biodegradability. Then, we analysed these substructures in order to build some general rules. The model consists of a rule-set made up of the combination of the statistically relevant fragments and of the expert-based rules. The model gives good statistical performance with 92%, 82% and 76% accuracy on the training, test and external set respectively. These results are comparable with those of other in silico models such as BIOWIN, developed by the United States Environmental Protection Agency (EPA); moreover this new model includes an easily understandable explanation. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Mean energy of some interacting bosonic systems derived by virtue of the generalized Hellmann-Feynman theorem

    NASA Astrophysics Data System (ADS)

    Fan, Hong-yi; Xu, Xue-xiang

    2009-06-01

    By virtue of the generalized Hellmann-Feynman theorem [H. Y. Fan and B. Z. Chen, Phys. Lett. A 203, 95 (1995)], we derive the mean energy of some interacting bosonic systems for some Hamiltonian models without diagonalizing the Hamiltonians. Our work extends the field of applications of the Hellmann-Feynman theorem and may enrich the theory of quantum statistics.
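
    For context, the finite-temperature (ensemble) form of the Hellmann-Feynman theorem that underlies such calculations can be sketched as follows; the authors' specific generalization may differ in detail. With F(λ) = -β⁻¹ ln Tr e^{-βH(λ)} and ⟨·⟩ the canonical ensemble average,

        \frac{\partial F(\lambda)}{\partial \lambda} \;=\; \Bigl\langle \frac{\partial H(\lambda)}{\partial \lambda} \Bigr\rangle ,

    so ensemble averages of the individual pieces of a parameterized Hamiltonian, and hence the mean energy, can be assembled by differentiating with respect to the Hamiltonian's parameters instead of diagonalizing it.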

  9. Probabilistic regional climate projection in Japan using a regression model with CMIP5 multi-model ensemble experiments

    NASA Astrophysics Data System (ADS)

    Ishizaki, N. N.; Dairaku, K.; Ueno, G.

    2016-12-01

    We have developed a statistical downscaling method for estimating probabilistic climate projections using multiple CMIP5 general circulation models (GCMs). A regression model was established so that the combination of weights of GCMs reflects the characteristics of the variation of observations at each grid point. Cross-validation was conducted to select GCMs and to evaluate the regression model while avoiding multicollinearity. Using a spatially high-resolution observation system, we produced statistically downscaled probabilistic climate projections with 20-km horizontal grid spacing. Root mean squared errors for monthly mean surface air temperature and precipitation estimated by the regression method were smaller than those derived from a simple ensemble mean of GCMs and a cumulative distribution function based bias-correction method. Projected changes in the mean temperature and precipitation were basically similar to those of the simple ensemble mean of GCMs. Mean precipitation was generally projected to increase, in association with increased temperature and consequently higher atmospheric moisture content. Weakening of the winter monsoon may contribute to decreased precipitation in some areas. A temperature increase in excess of 4 K was expected in most areas of Japan by the end of the 21st century under the RCP8.5 scenario. The estimated probability of monthly precipitation exceeding 300 mm would increase around the Pacific side during the summer and the Japan Sea side during the winter season. This probabilistic climate projection based on the statistical method can be expected to provide useful information for impact studies and risk assessments.
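
    A minimal sketch of the core idea using synthetic data: at a single grid point, a regression combines several GCM series with weights chosen so that the combination tracks the observed series, and cross-validation scores the fit. All names, values, and the scoring choice are illustrative (and assume a scikit-learn version providing the neg_root_mean_squared_error scorer); the actual system additionally handles GCM selection, multicollinearity, and the probabilistic output.

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(1)
        n_months, n_gcms = 360, 5
        gcm = rng.normal(size=(n_months, n_gcms))   # candidate GCM series at one grid point
        obs = 0.6 * gcm[:, 0] + 0.3 * gcm[:, 2] + rng.normal(scale=0.5, size=n_months)

        model = LinearRegression()
        rmse = -cross_val_score(model, gcm, obs, cv=5,
                                scoring="neg_root_mean_squared_error").mean()
        model.fit(gcm, obs)
        print("cross-validated RMSE:", round(rmse, 3))
        print("estimated GCM weights:", np.round(model.coef_, 2))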

  10. Nonlinear GARCH model and 1 / f noise

    NASA Astrophysics Data System (ADS)

    Kononovicius, A.; Ruseckas, J.

    2015-06-01

    Auto-regressive conditionally heteroskedastic (ARCH) family models are still used by practitioners in business and economic policy making as conditional volatility forecasting models, and they continue to attract research interest. In this contribution we consider the well-known GARCH(1,1) process and its nonlinear modifications, reminiscent of the NGARCH model. We investigate the possibility to reproduce power-law statistics, the probability density function and the power spectral density, using ARCH family models. For this purpose we derive stochastic differential equations from the GARCH processes in consideration. We find the obtained equations to be similar to a general class of stochastic differential equations known to reproduce power-law statistics. We show that the linear GARCH(1,1) process has a power-law distribution, but its power spectral density is Brownian noise-like. However, the nonlinear modifications exhibit both a power-law distribution and a power spectral density of the 1/f^β form, including 1/f noise.
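
    A minimal simulation sketch of the linear GARCH(1,1) recursion discussed above; parameter values are arbitrary illustrations, and the nonlinear variants studied in the paper modify the variance update.

        import numpy as np

        def simulate_garch11(n, omega=0.1, alpha=0.1, beta=0.85, seed=0):
            """Simulate returns r_t = sigma_t * z_t with
            sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2."""
            rng = np.random.default_rng(seed)
            r = np.empty(n)
            sigma2 = omega / (1.0 - alpha - beta)   # start at the stationary variance
            for t in range(n):
                r[t] = np.sqrt(sigma2) * rng.standard_normal()
                sigma2 = omega + alpha * r[t] ** 2 + beta * sigma2
            return r

        returns = simulate_garch11(100_000)
        # Heavy tails relative to a Gaussian (kurtosis 3) hint at the power-law behaviour.
        kurtosis = np.mean(returns**4) / np.mean(returns**2) ** 2
        print("sample kurtosis:", round(kurtosis, 2))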

  11. Thermodynamic limit of random partitions and dispersionless Toda hierarchy

    NASA Astrophysics Data System (ADS)

    Takasaki, Kanehisa; Nakatsu, Toshio

    2012-01-01

    We study the thermodynamic limit of random partition models for the instanton sum of 4D and 5D supersymmetric U(1) gauge theories deformed by some physical observables. The physical observables correspond to external potentials in the statistical model. The partition function is reformulated in terms of the density function of Maya diagrams. The thermodynamic limit is governed by a limit shape of Young diagrams associated with dominant terms in the partition function. The limit shape is characterized by a variational problem, which is further converted to a scalar-valued Riemann-Hilbert problem. This Riemann-Hilbert problem is solved with the aid of a complex curve, which may be thought of as the Seiberg-Witten curve of the deformed U(1) gauge theory. This solution of the Riemann-Hilbert problem is identified with a special solution of the dispersionless Toda hierarchy that satisfies a pair of generalized string equations. The generalized string equations for the 5D gauge theory are shown to be related to hidden symmetries of the statistical model. The prepotential and the Seiberg-Witten differential are also considered.

  12. Replication of a gene-environment interaction Via Multimodel inference: additive-genetic variance in adolescents' general cognitive ability increases with family-of-origin socioeconomic status.

    PubMed

    Kirkpatrick, Robert M; McGue, Matt; Iacono, William G

    2015-03-01

    The present study of general cognitive ability attempts to replicate and extend previous investigations of a biometric moderator, family-of-origin socioeconomic status (SES), in a sample of 2,494 pairs of adolescent twins, non-twin biological siblings, and adoptive siblings assessed with individually administered IQ tests. We hypothesized that SES would covary positively with additive-genetic variance and negatively with shared-environmental variance. Important potential confounds unaddressed in some past studies, such as twin-specific effects, assortative mating, and differential heritability by trait level, were found to be negligible. In our main analysis, we compared models by their sample-size corrected AIC, and base our statistical inference on model-averaged point estimates and standard errors. Additive-genetic variance increased with SES-an effect that was statistically significant and robust to model specification. We found no evidence that SES moderated shared-environmental influence. We attempt to explain the inconsistent replication record of these effects, and provide suggestions for future research.

  13. Replication of a Gene-Environment Interaction via Multimodel Inference: Additive-Genetic Variance in Adolescents’ General Cognitive Ability Increases with Family-of-Origin Socioeconomic Status

    PubMed Central

    Kirkpatrick, Robert M.; McGue, Matt; Iacono, William G.

    2015-01-01

    The present study of general cognitive ability attempts to replicate and extend previous investigations of a biometric moderator, family-of-origin socioeconomic status (SES), in a sample of 2,494 pairs of adolescent twins, non-twin biological siblings, and adoptive siblings assessed with individually administered IQ tests. We hypothesized that SES would covary positively with additive-genetic variance and negatively with shared-environmental variance. Important potential confounds unaddressed in some past studies, such as twin-specific effects, assortative mating, and differential heritability by trait level, were found to be negligible. In our main analysis, we compared models by their sample-size corrected AIC, and base our statistical inference on model-averaged point estimates and standard errors. Additive-genetic variance increased with SES—an effect that was statistically significant and robust to model specification. We found no evidence that SES moderated shared-environmental influence. We attempt to explain the inconsistent replication record of these effects, and provide suggestions for future research. PMID:25539975

  14. Steepest entropy ascent model for far-nonequilibrium thermodynamics: Unified implementation of the maximum entropy production principle

    NASA Astrophysics Data System (ADS)

    Beretta, Gian Paolo

    2014-10-01

    By suitable reformulations, we cast the mathematical frameworks of several well-known different approaches to the description of nonequilibrium dynamics into a unified formulation valid in all these contexts, which extends to such frameworks the concept of steepest entropy ascent (SEA) dynamics introduced by the present author in previous works on quantum thermodynamics. Actually, the present formulation constitutes a generalization also for the quantum thermodynamics framework. The analysis emphasizes that in the SEA modeling principle a key role is played by the geometrical metric with respect to which to measure the length of a trajectory in state space. In the near-thermodynamic-equilibrium limit, the metric tensor is directly related to the Onsager's generalized resistivity tensor. Therefore, through the identification of a suitable metric field which generalizes the Onsager generalized resistance to the arbitrarily far-nonequilibrium domain, most of the existing theories of nonequilibrium thermodynamics can be cast in such a way that the state exhibits the spontaneous tendency to evolve in state space along the path of SEA compatible with the conservation constraints and the boundary conditions. The resulting unified family of SEA dynamical models is intrinsically and strongly consistent with the second law of thermodynamics. The non-negativity of the entropy production is a general and readily proved feature of SEA dynamics. In several of the different approaches to nonequilibrium description we consider here, the SEA concept has not been investigated before. We believe it defines the precise meaning and the domain of general validity of the so-called maximum entropy production principle. Therefore, it is hoped that the present unifying approach may prove useful in providing a fresh basis for effective, thermodynamically consistent, numerical models and theoretical treatments of irreversible conservative relaxation towards equilibrium from far nonequilibrium states. The mathematical frameworks we consider are the following: (A) statistical or information-theoretic models of relaxation; (B) small-scale and rarefied gas dynamics (i.e., kinetic models for the Boltzmann equation); (C) rational extended thermodynamics, macroscopic nonequilibrium thermodynamics, and chemical kinetics; (D) mesoscopic nonequilibrium thermodynamics, continuum mechanics with fluctuations; and (E) quantum statistical mechanics, quantum thermodynamics, mesoscopic nonequilibrium quantum thermodynamics, and intrinsic quantum thermodynamics.

  15. Response statistics of rotating shaft with non-linear elastic restoring forces by path integration

    NASA Astrophysics Data System (ADS)

    Gaidai, Oleg; Naess, Arvid; Dimentberg, Michael

    2017-07-01

    Extreme statistics of random vibrations is studied for a Jeffcott rotor under uniaxial white noise excitation. The restoring force is modelled as elastic and non-linear; a comparison is made with a linearized restoring force to see the effect of force non-linearity on the response statistics. While analytical solutions and stability conditions are available for the linear model, this is not generally the case for the non-linear system except in some special cases. The statistics of the non-linear case are studied by applying the path integration (PI) method, which is based on the Markov property of the coupled dynamic system. The Jeffcott rotor response statistics can be obtained by solving the Fokker-Planck (FP) equation of the 4D dynamic system. An efficient implementation of the PI algorithm is applied; namely, the fast Fourier transform (FFT) is used to simulate the additive noise of the dynamic system. The latter significantly reduces computational time compared to the classical PI. Excitation is modelled as Gaussian white noise; however, white noise with any distribution can be implemented with the same PI technique. Multidirectional Markov noise can also be modelled with PI in the same way as unidirectional noise. PI is accelerated by using a Monte Carlo (MC) estimated joint probability density function (PDF) as the initial input. Symmetry of the dynamic system was utilized to afford higher mesh resolution. Both internal (rotating) and external damping are included in the mechanical model of the rotor. The main advantage of using PI rather than MC is that PI offers high accuracy in the probability distribution tail. The latter is of critical importance for e.g. extreme value statistics, system reliability, and first passage probability.

  16. Improved Doubly Robust Estimation when Data are Monotonely Coarsened, with Application to Longitudinal Studies with Dropout

    PubMed Central

    Tsiatis, Anastasios A.; Davidian, Marie; Cao, Weihua

    2010-01-01

    Summary A routine challenge is that of making inference on parameters in a statistical model of interest from longitudinal data subject to drop out, which are a special case of the more general setting of monotonely coarsened data. Considerable recent attention has focused on doubly robust estimators, which in this context involve positing models for both the missingness (more generally, coarsening) mechanism and aspects of the distribution of the full data, that have the appealing property of yielding consistent inferences if only one of these models is correctly specified. Doubly robust estimators have been criticized for potentially disastrous performance when both of these models are even only mildly misspecified. We propose a doubly robust estimator applicable in general monotone coarsening problems that achieves comparable or improved performance relative to existing doubly robust methods, which we demonstrate via simulation studies and by application to data from an AIDS clinical trial. PMID:20731640
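
    A minimal sketch of the doubly robust (augmented inverse probability weighted) idea in the simplest special case of a single outcome missing at random, rather than the paper's full monotone-coarsening setting; the simulated data and model choices are purely illustrative.

        import numpy as np
        from sklearn.linear_model import LinearRegression, LogisticRegression

        rng = np.random.default_rng(2)
        n = 5000
        x = rng.normal(size=(n, 1))
        y = 2.0 + 1.5 * x[:, 0] + rng.normal(size=n)      # outcome; true mean is 2.0
        p_obs = 1 / (1 + np.exp(-(0.5 + x[:, 0])))        # observation probability depends on x
        r = rng.random(n) < p_obs                         # r = True when y is observed

        pi_hat = LogisticRegression().fit(x, r).predict_proba(x)[:, 1]  # coarsening (missingness) model
        m_hat = LinearRegression().fit(x[r], y[r]).predict(x)           # full-data (outcome) model

        # AIPW estimator: consistent if either pi_hat or m_hat is correctly specified.
        aipw = np.mean(r * y / pi_hat - (r - pi_hat) / pi_hat * m_hat)
        print(f"complete-case mean: {y[r].mean():.3f}, AIPW estimate: {aipw:.3f} (truth: 2.0)")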

  17. Maximum profile likelihood estimation of differential equation parameters through model based smoothing state estimates.

    PubMed

    Campbell, D A; Chkrebtii, O

    2013-12-01

    Statistical inference for biochemical models often faces a variety of characteristic challenges. In this paper we examine state and parameter estimation for the JAK-STAT intracellular signalling mechanism, which exemplifies the implementation intricacies common in many biochemical inference problems. We introduce an extension to the Generalized Smoothing approach for estimating delay differential equation models, addressing selection of complexity parameters, choice of the basis system, and appropriate optimization strategies. Motivated by the JAK-STAT system, we further extend the generalized smoothing approach to consider a nonlinear observation process with additional unknown parameters, and highlight how the approach handles unobserved states and unevenly spaced observations. The methodology developed is generally applicable to problems of estimation for differential equation models with delays, unobserved states, nonlinear observation processes, and partially observed histories. Crown Copyright © 2013. Published by Elsevier Inc. All rights reserved.

  18. Statistical Downscaling of WRF-Chem Model: An Air Quality Analysis over Bogota, Colombia

    NASA Astrophysics Data System (ADS)

    Kumar, Anikender; Rojas, Nestor

    2015-04-01

    Statistical downscaling is a technique that is used to extract high-resolution information from regional scale variables produced by coarse resolution models such as Chemical Transport Models (CTMs). The fully coupled WRF-Chem (Weather Research and Forecasting with Chemistry) model is used to simulate air quality over Bogota. Bogota is a tropical Andean megacity located over a high-altitude plateau in the middle of very complex terrain. The WRF-Chem model was adopted for simulating the hourly ozone concentrations. The computational domains consisted of 120x120x32, 121x121x32 and 121x121x32 grid points with horizontal resolutions of 27, 9 and 3 km respectively. The model was initialized with real boundary conditions using NCAR-NCEP's Final Analysis (FNL) at a 1°x1° (~111 km x 111 km) resolution. Boundary conditions were updated every 6 hours using reanalysis data. The emission rates were obtained from global inventories, namely the REanalysis of the TROpospheric (RETRO) chemical composition and the Emission Database for Global Atmospheric Research (EDGAR). Multiple linear regression and artificial neural network techniques are used to downscale the model output at each monitoring station. The results confirm that the statistically downscaled outputs reduce simulated errors by up to 25%. This study provides a general overview of statistical downscaling of chemical transport models and can constitute a reference for future air quality modeling exercises over Bogota and other Colombian cities.

  19. A re-evaluation of a case-control model with contaminated controls for resource selection studies

    Treesearch

    Christopher T. Rota; Joshua J. Millspaugh; Dylan C. Kesler; Chad P. Lehman; Mark A. Rumble; Catherine M. B. Jachowski

    2013-01-01

    A common sampling design in resource selection studies involves measuring resource attributes at sample units used by an animal and at sample units considered available for use. Few models can estimate the absolute probability of using a sample unit from such data, but such approaches are generally preferred over statistical methods that estimate a relative probability...

  20. Fully Bayesian Estimation of Data from Single Case Designs

    ERIC Educational Resources Information Center

    Rindskopf, David

    2013-01-01

    Single case designs (SCDs) generally consist of a small number of short time series in two or more phases. The analysis of SCDs statistically fits in the framework of a multilevel model, or hierarchical model. The usual analysis does not take into account the uncertainty in the estimation of the random effects. This not only has an effect on the…

  1. Correspondence between spanning trees and the Ising model on a square lattice

    NASA Astrophysics Data System (ADS)

    Viswanathan, G. M.

    2017-06-01

    An important problem in statistical physics concerns the fascinating connections between partition functions of lattice models studied in equilibrium statistical mechanics on the one hand and graph theoretical enumeration problems on the other hand. We investigate the nature of the relationship between the number of spanning trees and the partition function of the Ising model on the square lattice. The spanning tree generating function T(z) gives the spanning tree constant when evaluated at z = 1, while giving the lattice Green function when differentiated. It is known that for the infinite square lattice the partition function Z(K) of the Ising model evaluated at the critical temperature K = Kc is related to T(1). Here we show that this idea in fact generalizes to all real temperatures. We prove that [Z(K) sech 2K]^2 = k exp[T(k)], where k = 2 tanh(2K) sech(2K). The identical Mahler measure connects the two seemingly disparate quantities T(z) and Z(K). In turn, the Mahler measure is determined by the random walk structure function. Finally, we show that the above correspondence does not generalize in a straightforward manner to nonplanar lattices.

  2. Efficient exploration of pan-cancer networks by generalized covariance selection and interactive web content

    PubMed Central

    Kling, Teresia; Johansson, Patrik; Sanchez, José; Marinescu, Voichita D.; Jörnsten, Rebecka; Nelander, Sven

    2015-01-01

    Statistical network modeling techniques are increasingly important tools to analyze cancer genomics data. However, current tools and resources are not designed to work across multiple diagnoses and technical platforms, thus limiting their applicability to comprehensive pan-cancer datasets such as The Cancer Genome Atlas (TCGA). To address this, we describe a new data driven modeling method, based on generalized Sparse Inverse Covariance Selection (SICS). The method integrates genetic, epigenetic and transcriptional data from multiple cancers, to define links that are present in multiple cancers, a subset of cancers, or a single cancer. It is shown to be statistically robust and effective at detecting direct pathway links in data from TCGA. To facilitate interpretation of the results, we introduce a publicly accessible tool (cancerlandscapes.org), in which the derived networks are explored as interactive web content, linked to several pathway and pharmacological databases. To evaluate the performance of the method, we constructed a model for eight TCGA cancers, using data from 3900 patients. The model rediscovered known mechanisms and contained interesting predictions. Possible applications include prediction of regulatory relationships, comparison of network modules across multiple forms of cancer and identification of drug targets. PMID:25953855

  3. A statistical mechanical model of economics

    NASA Astrophysics Data System (ADS)

    Lubbers, Nicholas Edward Williams

    Statistical mechanics pursues low-dimensional descriptions of systems with a very large number of degrees of freedom. I explore this theme in two contexts. The main body of this dissertation explores and extends the Yard Sale Model (YSM) of economic transactions using a combination of simulations and theory. The YSM is a simple interacting model for wealth distributions which has the potential to explain the empirical observation of Pareto distributions of wealth. I develop the link between wealth condensation and the breakdown of ergodicity due to nonlinear diffusion effects which are analogous to the geometric random walk. Using this, I develop a deterministic effective theory of wealth transfer in the YSM that is useful for explaining many quantitative results. I introduce various forms of growth to the model, paying attention to the effect of growth on wealth condensation, inequality, and ergodicity. Arithmetic growth is found to partially break condensation, and geometric growth is found to completely break condensation. Further generalizations of geometric growth with growth inequality show that the system is divided into two phases by a tipping point in the inequality parameter. The tipping point marks the line between systems which are ergodic and systems which exhibit wealth condensation. I explore generalizations of the YSM transaction scheme to arbitrary betting functions to develop notions of universality in YSM-like models. I find that wealth condensation is universal to a large class of models which can be divided into two phases. The first exhibits slow, power-law condensation dynamics, and the second exhibits fast, finite-time condensation dynamics. I find that the YSM, which exhibits exponential dynamics, is the critical, self-similar model which marks the dividing line between the two phases. The final chapter develops a low-dimensional approach to materials microstructure quantification. Modern materials design harnesses complex microstructure effects to develop high-performance materials, but general microstructure quantification is an unsolved problem. Motivated by statistical physics, I envision microstructure as a low-dimensional manifold, and construct this manifold by leveraging multiple machine learning approaches including transfer learning, dimensionality reduction, and computer vision breakthroughs with convolutional neural networks.
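
    A minimal simulation sketch of the basic Yard Sale Model under one common transaction rule (a fair coin flip moves a fixed fraction of the poorer party's wealth to the winner); parameter values are illustrative, and the dissertation's variants add growth, general betting functions, and other features.

        import numpy as np

        def yard_sale(n_agents=1000, n_steps=500_000, frac=0.1, seed=0):
            """Pairwise exchanges: the winner of a fair coin flip takes
            `frac` of the poorer agent's wealth from the loser."""
            rng = np.random.default_rng(seed)
            wealth = np.ones(n_agents)
            for _ in range(n_steps):
                i, j = rng.integers(0, n_agents, size=2)
                if i == j:
                    continue
                stake = frac * min(wealth[i], wealth[j])
                if rng.random() < 0.5:
                    wealth[i] += stake; wealth[j] -= stake
                else:
                    wealth[i] -= stake; wealth[j] += stake
            return wealth

        w = yard_sale()
        top_share = np.sort(w)[-10:].sum() / w.sum()   # wealth share of the richest 1% of agents
        print(f"richest 1% hold {top_share:.1%} of total wealth")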

  4. Generalized self-adjustment method for statistical mechanics of composite materials

    NASA Astrophysics Data System (ADS)

    Pan'kov, A. A.

    1997-03-01

    A new method is developed for the statistical mechanics of composite materials — the generalized self-adjustment method — which makes it possible to reduce the problem of predicting effective elastic properties of composites with random structures to the solution of two simpler "averaged" problems of an inclusion with transitional layers in a medium with the desired effective elastic properties. The inhomogeneous elastic properties and dimensions of the transitional layers take into account both the "approximate" order of mutual positioning, and also the variation in the dimensions and elastic properties of inclusions through appropriate special averaged indicator functions of the random structure of the composite. A numerical calculation of averaged indicator functions and effective elastic characteristics is performed by the generalized self-adjustment method for a unidirectional fiberglass on the basis of various models of actual random structures in the plane of isotropy.

  5. A Powerful Test for Comparing Multiple Regression Functions.

    PubMed

    Maity, Arnab

    2012-09-01

    In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y_ij = θ_j(Z_ij) + σ_j(Z_ij)∊_ij, based on empirical distributions of the errors in each population j = 1, … , J. In this paper, we propose a test for equality of the θ_j(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test for other nonparametric regression setups, e.g., nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).

  6. The General Assessment of Personality Disorder (GAPD): factor structure, incremental validity of self-pathology, and relations to DSM-IV personality disorders.

    PubMed

    Hentschel, Annett G; Livesley, W John

    2013-01-01

    Recent developments in the classification of personality disorder, especially moves toward more dimensional systems, create the need to assess general personality disorder apart from individual differences in personality pathology. The General Assessment of Personality Disorder (GAPD) is a self-report questionnaire designed to evaluate general personality disorder. The measure evaluates 2 major components of disordered personality: self or identity problems and interpersonal dysfunction. This study explores whether there is a single factor reflecting general personality pathology as proposed by the Diagnostic and Statistical Manual of Mental Disorders (5th ed.), whether self-pathology has incremental validity over interpersonal pathology as measured by GAPD, and whether GAPD scales relate significantly to Diagnostic and Statistical Manual of Mental Disorders (4th ed. [DSM-IV]) personality disorders. Based on responses from a German psychiatric sample of 149 participants, parallel analysis yielded a 1-factor model. Self Pathology scales of the GAPD increased the predictive validity of the Interpersonal Pathology scales of the GAPD. The GAPD scales showed a moderate to high correlation for 9 of 12 DSM-IV personality disorders.

  7. MODELING A MIXTURE: PBPK/PD APPROACHES FOR PREDICTING CHEMICAL INTERACTIONS.

    EPA Science Inventory

    Since environmental chemical exposures generally involve multiple chemicals, there are both regulatory and scientific drivers to develop methods to predict outcomes of these exposures. Even using efficient statistical and experimental designs, it is not possible to test in vivo a...

  8. Examining the Process of Responding to Circumplex Scales of Interpersonal Values Items: Should Ideal Point Scoring Methods Be Considered?

    PubMed

    Ling, Ying; Zhang, Minqiang; Locke, Kenneth D; Li, Guangming; Li, Zonglong

    2016-01-01

    The Circumplex Scales of Interpersonal Values (CSIV) is a 64-item self-report measure of goals from each octant of the interpersonal circumplex. We used item response theory methods to compare whether dominance models or ideal point models best described how people respond to CSIV items. Specifically, we fit a polytomous dominance model called the generalized partial credit model and an ideal point model of similar complexity called the generalized graded unfolding model to the responses of 1,893 college students. The results of both graphical comparisons of item characteristic curves and statistical comparisons of model fit suggested that an ideal point model best describes the process of responding to CSIV items. The different models produced different rank orderings of high-scoring respondents, but overall the models did not differ in their prediction of criterion variables (agentic and communal interpersonal traits and implicit motives).

  9. Egg production forecasting: Determining efficient modeling approaches.

    PubMed

    Ahmad, H A

    2011-12-01

    Several mathematical or statistical and artificial intelligence models were developed to compare egg production forecasts in commercial layers. Initial data for these models were collected from a comparative layer trial on commercial strains conducted at the Poultry Research Farms, Auburn University. Simulated data were produced to represent new scenarios by using means and SD of egg production of the 22 commercial strains. From the simulated data, random examples were generated for neural network training and testing for the weekly egg production prediction from wk 22 to 36. Three neural network architectures (back-propagation-3, Ward-5, and the general regression neural network) were compared for their efficiency to forecast egg production, along with other traditional models. The general regression neural network gave the best-fitting line, which almost overlapped with the commercial egg production data, with an R(2) of 0.71. The general regression neural network-predicted curve was compared with original egg production data, the average curves of white-shelled and brown-shelled strains, linear regression predictions, and the Gompertz nonlinear model. The general regression neural network was superior in all these comparisons and may be the model of choice if the initial overprediction is managed efficiently. In general, neural network models are efficient, are easy to use, require fewer data, and are practical under farm management conditions to forecast egg production.

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Poyer, D.A.

    In this report, tests of statistical significance of five sets of variables with household energy consumption (at the point of end-use) are described. Five models, in sequence, were empirically estimated and tested for statistical significance by using the Residential Energy Consumption Survey of the US Department of Energy, Energy Information Administration. Each model incorporated additional information, embodied in a set of variables not previously specified in the energy demand system. The variable sets were generally labeled as economic variables, weather variables, household-structure variables, end-use variables, and housing-type variables. The tests of statistical significance showed each of the variable sets to be highly significant in explaining the overall variance in energy consumption. The findings imply that the contemporaneous interaction of different types of variables, and not just one exclusive set of variables, determines the level of household energy consumption.

  11. Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data.

    PubMed

    Carmichael, Owen; Sakhanenko, Lyudmila

    2015-05-15

    We develop statistical methodology for a popular brain imaging technique HARDI based on the high order tensor model by Özarslan and Mareci [10]. We investigate how uncertainty in the imaging procedure propagates through all levels of the model: signals, tensor fields, vector fields, and fibers. We construct asymptotically normal estimators of the integral curves or fibers which allow us to trace the fibers together with confidence ellipsoids. The procedure is computationally intense as it blends linear algebra concepts from high order tensors with asymptotic statistical analysis. The theoretical results are illustrated on simulated and real datasets. This work generalizes the statistical methodology proposed for low angular resolution diffusion tensor imaging by Carmichael and Sakhanenko [3] to several fibers per voxel. It is also a pioneering statistical work on tractography from HARDI data. It avoids all the typical limitations of the deterministic tractography methods and it delivers the same information as probabilistic tractography methods. Our method is computationally cheap and it provides a well-founded mathematical and statistical framework in which diverse functionals on fibers, directions and tensors can be studied in a systematic and rigorous way.

  12. Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data

    PubMed Central

    Carmichael, Owen; Sakhanenko, Lyudmila

    2015-01-01

    We develop statistical methodology for a popular brain imaging technique HARDI based on the high order tensor model by Özarslan and Mareci [10]. We investigate how uncertainty in the imaging procedure propagates through all levels of the model: signals, tensor fields, vector fields, and fibers. We construct asymptotically normal estimators of the integral curves or fibers which allow us to trace the fibers together with confidence ellipsoids. The procedure is computationally intense as it blends linear algebra concepts from high order tensors with asymptotic statistical analysis. The theoretical results are illustrated on simulated and real datasets. This work generalizes the statistical methodology proposed for low angular resolution diffusion tensor imaging by Carmichael and Sakhanenko [3] to several fibers per voxel. It is also a pioneering statistical work on tractography from HARDI data. It avoids all the typical limitations of the deterministic tractography methods and it delivers the same information as probabilistic tractography methods. Our method is computationally cheap and it provides a well-founded mathematical and statistical framework in which diverse functionals on fibers, directions and tensors can be studied in a systematic and rigorous way. PMID:25937674

  13. Predictors of the number of under-five malnourished children in Bangladesh: application of the generalized poisson regression model

    PubMed Central

    2013-01-01

    Background Malnutrition is one of the principal causes of child mortality in developing countries including Bangladesh. To our knowledge, most of the available studies that addressed the issue of malnutrition among under-five children considered categorical (dichotomous/polychotomous) outcome variables and applied logistic regression (binary/multinomial) to find their predictors. In this study the malnutrition variable (i.e. the outcome) is defined as the number of under-five malnourished children in a family, which is a non-negative count variable. The purposes of the study are (i) to demonstrate the applicability of the generalized Poisson regression (GPR) model as an alternative to other statistical methods and (ii) to find some predictors of this outcome variable. Methods The data is extracted from the Bangladesh Demographic and Health Survey (BDHS) 2007. Briefly, this survey employs a nationally representative sample which is based on a two-stage stratified sample of households. A total of 4,460 under-five children is analysed using various statistical techniques, namely the Chi-square test and the GPR model. Results The GPR model (as compared to the standard Poisson regression and negative binomial regression) is found to be justified to study the above-mentioned outcome variable because of its under-dispersion (variance < mean) property. Our study also identifies several significant predictors of the outcome variable, namely mother’s education, father’s education, wealth index, sanitation status, source of drinking water, and total number of children ever born to a woman. Conclusions The consistency of our findings in light of many other studies suggests that the GPR model is an ideal alternative to other statistical models for analysing the number of under-five malnourished children in a family. Strategies based on significant predictors may improve the nutritional status of children in Bangladesh. PMID:23297699
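
    A minimal sketch of fitting a generalized Poisson regression to count data, assuming a statsmodels version that provides the GeneralizedPoisson model; the synthetic covariates below are hypothetical stand-ins for the BDHS predictors, not the survey data itself.

        import numpy as np
        import statsmodels.api as sm
        from statsmodels.discrete.discrete_model import GeneralizedPoisson

        rng = np.random.default_rng(3)
        n = 2000
        mother_edu = rng.integers(0, 4, size=n)        # illustrative coding: 0 = none ... 3 = higher
        wealth = rng.normal(size=n)
        mu = np.exp(0.2 - 0.15 * mother_edu - 0.1 * wealth)
        y = rng.poisson(mu)                            # count of malnourished under-five children

        X = sm.add_constant(np.column_stack([mother_edu, wealth]))
        result = GeneralizedPoisson(y, X).fit(disp=False)
        print(result.summary())                        # a negative alpha would point to under-dispersion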

  14. Improving the Statistical Modeling of the TRMM Extreme Precipitation Monitoring System

    NASA Astrophysics Data System (ADS)

    Demirdjian, L.; Zhou, Y.; Huffman, G. J.

    2016-12-01

    This project improves upon an existing extreme precipitation monitoring system based on the Tropical Rainfall Measuring Mission (TRMM) daily product (3B42) using new statistical models. The proposed system utilizes a regional modeling approach, where data from similar grid locations are pooled to increase the quality and stability of the resulting model parameter estimates to compensate for the short data record. The regional frequency analysis is divided into two stages. In the first stage, the region defined by the TRMM measurements is partitioned into approximately 27,000 non-overlapping clusters using a recursive k-means clustering scheme. In the second stage, a statistical model is used to characterize the extreme precipitation events occurring in each cluster. Instead of utilizing the block-maxima approach used in the existing system, where annual maxima are fit to the Generalized Extreme Value (GEV) probability distribution at each cluster separately, the present work adopts the peak-over-threshold (POT) method of classifying points as extreme if they exceed a pre-specified threshold. Theoretical considerations motivate the use of the Generalized-Pareto (GP) distribution for fitting threshold exceedances. The fitted parameters can be used to construct simple and intuitive average recurrence interval (ARI) maps which reveal how rare a particular precipitation event is given its spatial location. The new methodology eliminates much of the random noise that was produced by the existing models due to a short data record, producing more reasonable ARI maps when compared with NOAA's long-term Climate Prediction Center (CPC) ground based observations. The resulting ARI maps can be useful for disaster preparation, warning, and management, as well as increased public awareness of the severity of precipitation events. Furthermore, the proposed methodology can be applied to various other extreme climate records.
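
    A minimal peak-over-threshold sketch for a single cluster using scipy's generalized Pareto distribution; the synthetic record, threshold choice, and ARI conversion are illustrative simplifications of the operational system.

        import numpy as np
        from scipy.stats import genpareto

        rng = np.random.default_rng(4)
        years = 17
        daily_precip = rng.gamma(shape=0.5, scale=8.0, size=years * 365)   # synthetic daily record (mm)

        threshold = np.quantile(daily_precip, 0.98)        # treat the top 2% of days as extreme
        exceedances = daily_precip[daily_precip > threshold] - threshold
        shape, _, scale = genpareto.fit(exceedances, floc=0)

        # Average recurrence interval (years) of a 60 mm/day event under the fitted GP tail.
        rate = len(exceedances) / years                    # mean number of exceedances per year
        p_exceed = genpareto.sf(60.0 - threshold, shape, loc=0, scale=scale)
        print(f"threshold = {threshold:.1f} mm, ARI(60 mm) = {1.0 / (rate * p_exceed):.1f} years")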

  15. Framework for adaptive multiscale analysis of nonhomogeneous point processes.

    PubMed

    Helgason, Hannes; Bartroff, Jay; Abry, Patrice

    2011-01-01

    We develop the methodology for hypothesis testing and model selection in nonhomogeneous Poisson processes, with an eye toward the application of modeling and variability detection in heart beat data. Modeling the process' non-constant rate function using templates of simple basis functions, we develop the generalized likelihood ratio statistic for a given template and a multiple testing scheme to model-select from a family of templates. A dynamic programming algorithm inspired by network flows is used to compute the maximum likelihood template in a multiscale manner. In a numerical example, the proposed procedure is nearly as powerful as the super-optimal procedures that know the true template size and true partition, respectively. Extensions to general history-dependent point processes are discussed.

  16. Troposphere-stratosphere (surface-55 km) monthly winter general circulation statistics for the Northern Hemisphere Interannual variations

    NASA Technical Reports Server (NTRS)

    Geller, M. A.; Wu, M.-F.; Gelman, M. E.

    1984-01-01

    Individual monthly mean general circulation statistics for the Northern Hemisphere winters of 1978-79, 1979-80, 1980-81, and 1981-82 are examined for the altitude region from the earth's surface to 55 km. Substantial interannual variability is found in the mean zonal geostrophic wind; planetary waves with zonal wavenumber one and two; the heat and momentum fluxes; and the divergence of the Eliassen-Palm flux. These results are compared with previous studies by other workers. This variability in the monthly means is examined further by looking at both time-latitude sections at constant pressure levels and time-height sections at constant latitudes. The implications of this interannual variability for verifying models and interpreting observations are discussed.

  17. Exponentiated power Lindley distribution.

    PubMed

    Ashour, Samir K; Eltehiwy, Mahmoud A

    2015-11-01

    A new generalization of the Lindley distribution was recently proposed by Ghitany et al. [1], called the power Lindley distribution. Another generalization of the Lindley distribution was introduced by Nadarajah et al. [2], named the generalized Lindley distribution. This paper proposes a further generalization of the Lindley distribution which subsumes both. We refer to this new generalization as the exponentiated power Lindley distribution. The new distribution is important since it contains as special sub-models some widely known distributions in addition to the above two models, such as the Lindley distribution among many others. It also provides more flexibility to analyze complex real data sets. We study some statistical properties of the new distribution. We discuss maximum likelihood estimation of the distribution parameters. Least squares estimation is also used to estimate the parameters. Three algorithms are proposed for generating random data from the proposed distribution. An application of the model to a real data set is analyzed using the new distribution, which shows that the exponentiated power Lindley distribution can be used quite effectively in analyzing real lifetime data.
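
    A sketch of the construction, assuming the standard power Lindley baseline CDF of Ghitany et al.: exponentiation adds a third shape parameter, and the familiar special cases are recovered by fixing parameters,

        F_{\mathrm{PL}}(x;\alpha,\beta) \;=\; 1-\Bigl(1+\tfrac{\beta x^{\alpha}}{\beta+1}\Bigr)e^{-\beta x^{\alpha}},
        \qquad
        F_{\mathrm{EPL}}(x;\alpha,\beta,\theta) \;=\; \bigl[F_{\mathrm{PL}}(x;\alpha,\beta)\bigr]^{\theta}, \qquad x>0,

    so that θ = 1 recovers the power Lindley distribution, α = 1 the generalized (exponentiated) Lindley distribution, and α = θ = 1 the Lindley distribution itself.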

  18. A power comparison of generalized additive models and the spatial scan statistic in a case-control setting.

    PubMed

    Young, Robin L; Weinberg, Janice; Vieira, Verónica; Ozonoff, Al; Webster, Thomas F

    2010-07-19

    A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing logodds with distance from the point. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases. The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had competing or greater power estimates and sensitivities exceeding that of the spatial scan statistic.

  19. A power comparison of generalized additive models and the spatial scan statistic in a case-control setting

    PubMed Central

    2010-01-01

    Background A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. Results This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing logodds with distance from the point. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases. Conclusions The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had competing or greater power estimates and sensitivities exceeding that of the spatial scan statistic. PMID:20642827

  20. Analysis of Flow and Transport in non-Gaussian Heterogeneous Formations Using a Generalized Sub-Gaussian Model

    NASA Astrophysics Data System (ADS)

    Guadagnini, A.; Riva, M.; Neuman, S. P.

    2016-12-01

    Environmental quantities such as log hydraulic conductivity (or transmissivity), Y(x) = ln K(x), and their spatial (or temporal) increments, ΔY, are known to be generally non-Gaussian. Documented evidence of such behavior includes symmetry of increment distributions at all separation scales (or lags) between incremental values of Y with sharp peaks and heavy tails that decay asymptotically as lag increases. This statistical scaling occurs in porous as well as fractured media characterized by either one or a hierarchy of spatial correlation scales. In hierarchical media one observes a range of additional statistical ΔY scaling phenomena, all of which are captured comprehensibly by a novel generalized sub-Gaussian (GSG) model. In this model Y forms a mixture Y(x) = U(x) G(x) of single- or multi-scale Gaussian processes G having random variances, U being a non-negative subordinator independent of G. Elsewhere we developed ways to generate unconditional and conditional random realizations of isotropic or anisotropic GSG fields which can be embedded in numerical Monte Carlo flow and transport simulations. Here we present and discuss expressions for probability distribution functions of Y and ΔY as well as their lead statistical moments. We then focus on a simple flow setting of mean uniform steady state flow in an unbounded, two-dimensional domain, exploring ways in which non-Gaussian heterogeneity affects stochastic flow and transport descriptions. Our expressions represent (a) lead order autocovariance and cross-covariance functions of hydraulic head, velocity and advective particle displacement as well as (b) analogues of preasymptotic and asymptotic Fickian dispersion coefficients. We compare them with corresponding expressions developed in the literature for Gaussian Y.
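
    A minimal one-dimensional sketch of generating a GSG field directly from the definition Y(x) = U(x) G(x): G is a correlated Gaussian process and U is a non-negative subordinator independent of G. The lognormal choice for U, the exponential covariance, and all parameter values are illustrative assumptions, not the paper's specific model.

        import numpy as np
        from scipy.stats import kurtosis

        rng = np.random.default_rng(5)
        n, dx, corr_len = 512, 1.0, 10.0

        # Correlated Gaussian process G from an exponential covariance (illustrative choice).
        x = np.arange(n) * dx
        cov = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)
        G = np.linalg.cholesky(cov + 1e-10 * np.eye(n)) @ rng.standard_normal(n)

        # Non-negative subordinator U, independent of G (lognormal chosen for illustration only).
        U = rng.lognormal(mean=0.0, sigma=0.5, size=n)

        Y = U * G                                  # generalized sub-Gaussian field
        dY = np.diff(Y)                            # increments
        print("excess kurtosis of increments:", round(kurtosis(dY), 2))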

  1. Can hepatic resection provide a long-term cure for patients with intrahepatic cholangiocarcinoma?

    PubMed

    Spolverato, Gaya; Vitale, Alessandro; Cucchetti, Alessandro; Popescu, Irinel; Marques, Hugo P; Aldrighetti, Luca; Gamblin, T Clark; Maithel, Shishir K; Sandroussi, Charbel; Bauer, Todd W; Shen, Feng; Poultsides, George A; Marsh, J Wallis; Pawlik, Timothy M

    2015-11-15

    A patient can be considered statistically cured from a specific disease when their mortality rate returns to the same level as that of the general population. In the current study, the authors sought to assess the probability of being statistically cured from intrahepatic cholangiocarcinoma (ICC) by hepatic resection. A total of 584 patients who underwent surgery with curative intent for ICC between 1990 and 2013 at 1 of 12 participating institutions were identified. A nonmixture cure model was adopted to compare mortality after hepatic resection with the mortality expected for the general population matched by sex and age. The median, 1-year, 3-year, and 5-year disease-free survival was 10 months, 44%, 18%, and 11%, respectively; the corresponding overall survival was 27 months, 75%, 37%, and 22%, respectively. The probability of being cured of ICC was 9.7% (95% confidence interval, 6.1%-13.4%). The mortality of patients undergoing surgery for ICC was higher than that of the general population until year 10, at which time patients alive without tumor recurrence can be considered cured with 99% certainty. Multivariate analysis demonstrated that cure probabilities ranged from 25.8% (time to cure, 9.8 years) in patients with a single, well-differentiated ICC measuring ≤5 cm that was without vascular/periductal invasion and lymph node metastases to <0.1% (time to cure, 12.6 years) among patients with all 6 of these risk factors. A model with which to calculate cure fraction and time to cure was developed. The cure model indicated that statistical cure was possible in patients undergoing hepatic resection for ICC. The overall probability of cure was approximately 10% and varied based on several tumor-specific factors. Cancer 2015;121:3998-4006. © 2015 American Cancer Society.

  2. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    PubMed

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
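
    As a concrete illustration of the model class compared above, the sketch below simulates hypothetical clustered smoking data and fits a mixed-effects logistic regression with three correlated random effects per cluster using lme4 in R (one of several packages the article evaluates; variable names and effect sizes are invented for the example). In lme4, nAGQ = 1 corresponds to the Laplace approximation.

      library(lme4)

      ## simulate hypothetical data: 60 areas, 40 respondents each
      set.seed(1)
      n_area <- 60; n_per <- 40
      d <- data.frame(area        = rep(seq_len(n_area), each = n_per),
                      ad_exposure = rnorm(n_area * n_per),
                      age         = rnorm(n_area * n_per))
      b   <- MASS::mvrnorm(n_area, mu = c(0, 0, 0), Sigma = diag(c(1, 0.3, 0.3)))
      eta <- -1 + 0.5 * d$ad_exposure + 0.2 * d$age +
             b[d$area, 1] + b[d$area, 2] * d$ad_exposure + b[d$area, 3] * d$age
      d$smoker <- rbinom(nrow(d), 1, plogis(eta))

      ## random intercept plus two random slopes, all allowed to correlate within area
      fit <- glmer(smoker ~ ad_exposure + age + (1 + ad_exposure + age | area),
                   data = d, family = binomial, nAGQ = 1,
                   control = glmerControl(optimizer = "bobyqa"))
      summary(fit)
      VarCorr(fit)   # estimated covariance matrix of the correlated random effects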

  3. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    PubMed Central

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  4. Correcting Too Much or Too Little? The Performance of Three Chi-Square Corrections.

    PubMed

    Foldnes, Njål; Olsson, Ulf Henning

    2015-01-01

    This simulation study investigates the performance of three test statistics, T1, T2, and T3, used to evaluate structural equation model fit under non-normal data conditions. T1 is the well-known mean-adjusted statistic of Satorra and Bentler. T2 is the mean-and-variance adjusted statistic of Satterthwaite type in which the degrees of freedom are manipulated. T3 is a recently proposed version of T2 that does not manipulate degrees of freedom. Discrepancies between these statistics and their nominal chi-square distribution in terms of Type I and Type II errors are investigated. All statistics are shown to be sensitive to increasing kurtosis in the data, with Type I error rates often far off the nominal level. Under excess kurtosis, true models are generally over-rejected by T1 and under-rejected by T2 and T3, which have similar performance in all conditions. Under misspecification, there is a loss of power with increasing kurtosis, especially for T2 and T3. The coefficient of variation of the nonzero eigenvalues of a certain matrix is shown to be a reliable indicator for the adequacy of these statistics.
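
    For orientation only, here is a generic, textbook-style sketch of how a mean-adjusted and a Satterthwaite-type mean-and-variance-adjusted statistic are formed from a raw fit statistic and the nonzero eigenvalues of the relevant weight matrix; the article's exact definitions of T1-T3 may differ in detail, so treat the helper below as an assumption-laden illustration. The last returned quantity is the eigenvalue coefficient of variation mentioned in the abstract.

      ## hypothetical helper; not necessarily the article's exact T1-T3
      scaled_tests <- function(Tstat, lambda) {
        d   <- length(lambda)
        T1  <- Tstat * d / sum(lambda)           # mean-adjusted (Satorra-Bentler-type), df = d
        s   <- sum(lambda^2) / sum(lambda)       # Satterthwaite scale factor
        df2 <- sum(lambda)^2 / sum(lambda^2)     # Satterthwaite degrees of freedom
        T2  <- Tstat / s                         # mean-and-variance adjusted
        c(p_mean_adjusted   = pchisq(T1, df = d,   lower.tail = FALSE),
          p_mean_var_adjust = pchisq(T2, df = df2, lower.tail = FALSE),
          eigenvalue_cv     = sd(lambda) / mean(lambda))
      }

      scaled_tests(Tstat = 48.7, lambda = runif(24, 0.5, 2.5))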

  5. Reconciling statistical and systems science approaches to public health.

    PubMed

    Ip, Edward H; Rahmandad, Hazhir; Shoham, David A; Hammond, Ross; Huang, Terry T-K; Wang, Youfa; Mabry, Patricia L

    2013-10-01

    Although systems science has emerged as a set of innovative approaches to study complex phenomena, many topically focused researchers including clinicians and scientists working in public health are somewhat befuddled by this methodology that at times appears to be radically different from analytic methods, such as statistical modeling, to which the researchers are accustomed. There also appear to be conflicts between complex systems approaches and traditional statistical methodologies, both in terms of their underlying strategies and the languages they use. We argue that the conflicts are resolvable, and the sooner the better for the field. In this article, we show how statistical and systems science approaches can be reconciled, and how together they can advance solutions to complex problems. We do this by comparing the methods within a theoretical framework based on the work of population biologist Richard Levins. We present different types of models as representing different tradeoffs among the four desiderata of generality, realism, fit, and precision.

  6. Reconciling Statistical and Systems Science Approaches to Public Health

    PubMed Central

    Ip, Edward H.; Rahmandad, Hazhir; Shoham, David A.; Hammond, Ross; Huang, Terry T.-K.; Wang, Youfa; Mabry, Patricia L.

    2016-01-01

    Although systems science has emerged as a set of innovative approaches to study complex phenomena, many topically focused researchers including clinicians and scientists working in public health are somewhat befuddled by this methodology that at times appears to be radically different from analytic methods, such as statistical modeling, to which the researchers are accustomed. There also appear to be conflicts between complex systems approaches and traditional statistical methodologies, both in terms of their underlying strategies and the languages they use. We argue that the conflicts are resolvable, and the sooner the better for the field. In this article, we show how statistical and systems science approaches can be reconciled, and how together they can advance solutions to complex problems. We do this by comparing the methods within a theoretical framework based on the work of population biologist Richard Levins. We present different types of models as representing different tradeoffs among the four desiderata of generality, realism, fit, and precision. PMID:24084395

  7. Quantile regression for the statistical analysis of immunological data with many non-detects.

    PubMed

    Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth

    2012-07-07

    Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an implementation to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
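
    A brief sketch of the approach described above, with simulated data standing in for the clinical-trial measurements: non-detects are simply set to the detection limit and a median regression is fitted with quantreg::rq in R. As long as well under half of the observations at each covariate combination are non-detects, the fitted median and trend do not depend on the value substituted for them; the variable names and detection limit below are hypothetical.

      library(quantreg)

      set.seed(3)
      n <- 120
      d <- data.frame(group = gl(2, n / 2, labels = c("placebo", "treated")),
                      time  = rep(0:2, length.out = n))
      d$ige <- exp(1 + 0.4 * d$time + 0.8 * (d$group == "treated") + rnorm(n, 0, 0.7))
      lod   <- 3
      d$ige <- pmax(d$ige, lod)        # non-detects recorded at the detection limit

      fit <- rq(log(ige) ~ group + time, tau = 0.5, data = d)   # median regression
      summary(fit, se = "boot")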

  8. Pearson-type goodness-of-fit test with bootstrap maximum likelihood estimation.

    PubMed

    Yin, Guosheng; Ma, Yanyuan

    2013-01-01

    The Pearson test statistic is constructed by partitioning the data into bins and computing the difference between the observed and expected counts in these bins. If the maximum likelihood estimator (MLE) of the original data is used, the statistic generally does not follow a chi-squared distribution or any explicit distribution. We propose a bootstrap-based modification of the Pearson test statistic to recover the chi-squared distribution. We compute the observed and expected counts in the partitioned bins by using the MLE obtained from a bootstrap sample. This bootstrap-sample MLE adjusts exactly the right amount of randomness to the test statistic, and recovers the chi-squared distribution. The bootstrap chi-squared test is easy to implement, as it only requires fitting exactly the same model to the bootstrap data to obtain the corresponding MLE, and then constructs the bin counts based on the original data. We examine the test size and power of the new model diagnostic procedure using simulation studies and illustrate it with a real data set.
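
    The following is a minimal sketch of the bootstrap-MLE idea for a simple normal model: the MLE is computed from a bootstrap resample, while the observed and expected bin counts are formed from the original data using that bootstrap-sample MLE. The number of bins, the model, and the data are illustrative assumptions, not the authors' simulation design.

      set.seed(4)
      x      <- rnorm(200, mean = 1, sd = 2)                        # original data
      breaks <- quantile(x, probs = seq(0, 1, length.out = 11))     # 10 bins
      breaks[c(1, 11)] <- c(-Inf, Inf)

      xb  <- sample(x, replace = TRUE)                              # bootstrap sample
      mu  <- mean(xb); sig <- sqrt(mean((xb - mu)^2))               # bootstrap-sample MLE

      obs    <- table(cut(x, breaks))                               # observed counts, original data
      p      <- diff(pnorm(breaks, mu, sig))                        # model probabilities at the bootstrap MLE
      expd   <- length(x) * p
      T_boot <- sum((obs - expd)^2 / expd)                          # approximately chi-squared, df = bins - 1
      pchisq(T_boot, df = length(expd) - 1, lower.tail = FALSE)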

  9. Low-complexity stochastic modeling of wall-bounded shear flows

    NASA Astrophysics Data System (ADS)

    Zare, Armin

    Turbulent flows are ubiquitous in nature and they appear in many engineering applications. Transition to turbulence, in general, increases skin-friction drag in air/water vehicles compromising their fuel-efficiency and reduces the efficiency and longevity of wind turbines. While traditional flow control techniques combine physical intuition with costly experiments, their effectiveness can be significantly enhanced by control design based on low-complexity models and optimization. In this dissertation, we develop a theoretical and computational framework for the low-complexity stochastic modeling of wall-bounded shear flows. Part I of the dissertation is devoted to the development of a modeling framework which incorporates data-driven techniques to refine physics-based models. We consider the problem of completing partially known sample statistics in a way that is consistent with underlying stochastically driven linear dynamics. Neither the statistics nor the dynamics are precisely known. Thus, our objective is to reconcile the two in a parsimonious manner. To this end, we formulate optimization problems to identify the dynamics and directionality of input excitation in order to explain and complete available covariance data. For problem sizes that general-purpose solvers cannot handle, we develop customized optimization algorithms based on alternating direction methods. The solution to the optimization problem provides information about critical directions that have maximal effect in bringing model and statistics in agreement. In Part II, we employ our modeling framework to account for statistical signatures of turbulent channel flow using low-complexity stochastic dynamical models. We demonstrate that white-in-time stochastic forcing is not sufficient to explain turbulent flow statistics and develop models for colored-in-time forcing of the linearized Navier-Stokes equations. We also examine the efficacy of stochastically forced linearized NS equations and their parabolized equivalents in the receptivity analysis of velocity fluctuations to external sources of excitation as well as capturing the effect of the slowly-varying base flow on streamwise streaks and Tollmien-Schlichting waves. In Part III, we develop a model-based approach to design surface actuation of turbulent channel flow in the form of streamwise traveling waves. This approach is capable of identifying the drag reducing trends of traveling waves in a simulation-free manner. We also use the stochastically forced linearized NS equations to examine the Reynolds number independent effects of spanwise wall oscillations on drag reduction in turbulent channel flows. This allows us to extend the predictive capability of our simulation-free approach to high Reynolds numbers.

  10. Improving Domain-specific Machine Translation by Constraining the Language Model

    DTIC Science & Technology

    2012-07-01

    performance. To make up for the lack of parallel training data, one assumption is that more monolingual target language data should be used in building the...target language model. Prior work on domain-specific MT has focused on training target language models with monolingual 2 domain-specific data...showed that the using a large dictionary extracted from medical domain documents in a statistical MT system to generalize the training data significantly

  11. Demonstration of fundamental statistics by studying timing of electronics signals in a physics-based laboratory

    NASA Astrophysics Data System (ADS)

    Beach, Shaun E.; Semkow, Thomas M.; Remling, David J.; Bradt, Clayton J.

    2017-07-01

    We have developed accessible methods to demonstrate fundamental statistics in several phenomena, in the context of teaching electronic signal processing in a physics-based college-level curriculum. A relationship between the exponential time-interval distribution and Poisson counting distribution for a Markov process with constant rate is derived in a novel way and demonstrated using nuclear counting. Negative binomial statistics is demonstrated as a model for overdispersion and justified by the effect of electronic noise in nuclear counting. The statistics of digital packets on a computer network are shown to be compatible with the fractal-point stochastic process leading to a power-law as well as generalized inverse Gaussian density distributions of time intervals between packets.
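
    A short simulation sketch of the first relationship mentioned above (not tied to the authors' laboratory setup): for a constant-rate Markov process, exponentially distributed waiting times imply Poisson-distributed counts per fixed counting interval.

      set.seed(5)
      rate  <- 3                                           # mean event rate per unit time (arbitrary)
      t_max <- 10000
      times <- cumsum(rexp(rate * t_max * 2, rate = rate)) # exponential inter-arrival times
      times <- times[times < t_max]

      counts <- tabulate(ceiling(times), nbins = t_max)    # events in each unit interval
      obs    <- table(factor(counts, levels = 0:max(counts))) / t_max
      exp_p  <- dpois(0:max(counts), lambda = rate)
      round(cbind(observed = as.numeric(obs), poisson = exp_p), 3)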

  12. Loop Braiding Statistics and Interacting Fermionic Symmetry-Protected Topological Phases in Three Dimensions

    NASA Astrophysics Data System (ADS)

    Cheng, Meng; Tantivasadakarn, Nathanan; Wang, Chenjie

    2018-01-01

    We study Abelian braiding statistics of loop excitations in three-dimensional gauge theories with fermionic particles and the closely related problem of classifying 3D fermionic symmetry-protected topological (FSPT) phases with unitary symmetries. It is known that the two problems are related by turning FSPT phases into gauge theories through gauging the global symmetry of the former. We show that there exist certain types of Abelian loop braiding statistics that are allowed only in the presence of fermionic particles, which correspond to 3D "intrinsic" FSPT phases, i.e., those that do not stem from bosonic SPT phases. While such intrinsic FSPT phases are ubiquitous in 2D systems and in 3D systems with antiunitary symmetries, their existence in 3D systems with unitary symmetries was not confirmed previously due to the fact that strong interaction is necessary to realize them. We show that the simplest unitary symmetry to support 3D intrinsic FSPT phases is Z2×Z4. To establish the results, we first derive a complete set of physical constraints on Abelian loop braiding statistics. Solving the constraints, we obtain all possible Abelian loop braiding statistics in 3D gauge theories, including those that correspond to intrinsic FSPT phases. Then, we construct exactly soluble state-sum models to realize the loop braiding statistics. These state-sum models generalize the well-known Crane-Yetter and Dijkgraaf-Witten models.

  13. Physical concepts in the development of constitutive equations

    NASA Technical Reports Server (NTRS)

    Cassenti, B. N.

    1985-01-01

    Proposed viscoplastic material models include observed material response in their formulation but do not generally incorporate principles from thermodynamics, statistical mechanics, and quantum mechanics. Numerous hypotheses about material response were made based on first principles, and many of these hypotheses were tested experimentally. The proposed viscoplastic theories must therefore be checked against these hypotheses and their experimental basis. The physics of thermodynamics, statistical mechanics and quantum mechanics, and the effects of defects, are reviewed for their application to the development of constitutive laws.

  14. Rainfall Downscaling Conditional on Upper-air Atmospheric Predictors: Improved Assessment of Rainfall Statistics in a Changing Climate

    NASA Astrophysics Data System (ADS)

    Langousis, Andreas; Mamalakis, Antonis; Deidda, Roberto; Marrocu, Marino

    2015-04-01

    To improve the skill of Global Climate Models (GCMs) and Regional Climate Models (RCMs) in reproducing the statistics of rainfall at a basin level and at hydrologically relevant temporal scales (e.g. daily), two types of statistical approaches have been suggested. One is the statistical correction of climate model rainfall outputs using historical series of precipitation. The other is the use of stochastic models of rainfall to conditionally simulate precipitation series, based on large-scale atmospheric predictors produced by climate models (e.g. geopotential height, relative vorticity, divergence, mean sea level pressure). The latter approach, usually referred to as statistical rainfall downscaling, aims at reproducing the statistical character of rainfall, while accounting for the effects of large-scale atmospheric circulation (and, therefore, climate forcing) on rainfall statistics. While promising, statistical rainfall downscaling has not attracted much attention in recent years, since the suggested approaches involved complex (i.e. subjective or computationally intense) identification procedures of the local weather, in addition to demonstrating limited success in reproducing several statistical features of rainfall, such as seasonal variations, the distributions of dry and wet spell lengths, the distribution of the mean rainfall intensity inside wet periods, and the distribution of rainfall extremes. In an effort to remedy those shortcomings, Langousis and Kaleris (2014) developed a statistical framework for simulation of daily rainfall intensities conditional on upper air variables, which accurately reproduces the statistical character of rainfall at multiple time-scales. Here, we study the relative performance of: a) quantile-quantile (Q-Q) correction of climate model rainfall products, and b) the statistical downscaling scheme of Langousis and Kaleris (2014), in reproducing the statistical structure of rainfall, as well as rainfall extremes, at a regional level. This is done for an intermediate-sized catchment in Italy, i.e. the Flumendosa catchment, using climate model rainfall and atmospheric data from the ENSEMBLES project (http://ensembleseu.metoffice.com). In doing so, we split the historical rainfall record of mean areal precipitation (MAP) into 15-year calibration and 45-year validation periods, and compare the historical rainfall statistics to those obtained from: a) Q-Q corrected climate model rainfall products, and b) synthetic rainfall series generated by the suggested downscaling scheme. To our knowledge, this is the first time that climate model rainfall and statistically downscaled precipitation are compared to catchment-averaged MAP at a daily resolution. The obtained results are promising, since the proposed downscaling scheme is more accurate and robust in reproducing a number of historical rainfall statistics, independent of the climate model used and the length of the calibration period. This is particularly the case for the yearly rainfall maxima, where direct statistical correction of climate model rainfall outputs shows increased sensitivity to the length of the calibration period and the climate model used. The robustness of the suggested downscaling scheme in modeling rainfall extremes at a daily resolution is a notable feature that can effectively be used to assess hydrologic risk at a regional level under changing climatic conditions.
Acknowledgments The research project is implemented within the framework of the Action «Supporting Postdoctoral Researchers» of the Operational Program "Education and Lifelong Learning" (Action's Beneficiary: General Secretariat for Research and Technology), and is co-financed by the European Social Fund (ESF) and the Greek State. CRS4 highly acknowledges the contribution of the Sardinian regional authorities.
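
    To fix ideas on the first of the two approaches compared above, here is a minimal sketch of empirical quantile-quantile (Q-Q) correction: model quantiles over a calibration period are mapped onto observed quantiles, and the mapping is then applied to new model output. The gamma-distributed series below are stand-ins, not the Flumendosa or ENSEMBLES data.

      qq_correct <- function(model_cal, obs_cal, model_new) {
        probs <- seq(0.01, 0.99, by = 0.01)
        qm <- quantile(model_cal, probs, na.rm = TRUE)   # model quantiles, calibration period
        qo <- quantile(obs_cal,   probs, na.rm = TRUE)   # observed quantiles, calibration period
        ## map each new model value to the observed quantile with the same plotting position
        approx(x = qm, y = qo, xout = model_new, rule = 2)$y
      }

      set.seed(9)
      obs_cal   <- rgamma(5000, shape = 0.6, scale = 8)     # "gauge" rainfall, calibration years
      model_cal <- rgamma(5000, shape = 0.9, scale = 4)     # "climate model" rainfall, same years
      model_new <- rgamma(5000, shape = 0.9, scale = 4.5)   # model rainfall to be corrected
      corrected <- qq_correct(model_cal, obs_cal, model_new)
      quantile(corrected, c(0.5, 0.9, 0.99))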

  15. Variational Bayesian Parameter Estimation Techniques for the General Linear Model

    PubMed Central

    Starke, Ludger; Ostwald, Dirk

    2017-01-01

    Variational Bayes (VB), variational maximum likelihood (VML), restricted maximum likelihood (ReML), and maximum likelihood (ML) are cornerstone parametric statistical estimation techniques in the analysis of functional neuroimaging data. However, the theoretical underpinnings of these model parameter estimation techniques are rarely covered in introductory statistical texts. Because of the widespread practical use of VB, VML, ReML, and ML in the neuroimaging community, we reasoned that a theoretical treatment of their relationships and their application in a basic modeling scenario may be helpful for both neuroimaging novices and practitioners alike. In this technical study, we thus revisit the conceptual and formal underpinnings of VB, VML, ReML, and ML and provide a detailed account of their mathematical relationships and implementational details. We further apply VB, VML, ReML, and ML to the general linear model (GLM) with non-spherical error covariance as commonly encountered in the first-level analysis of fMRI data. To this end, we explicitly derive the corresponding free energy objective functions and ensuing iterative algorithms. Finally, in the applied part of our study, we evaluate the parameter and model recovery properties of VB, VML, ReML, and ML, first in an exemplary setting and then in the analysis of experimental fMRI data acquired from a single participant under visual stimulation. PMID:28966572

  16. Enhanced Sensitivity to Rapid Input Fluctuations by Nonlinear Threshold Dynamics in Neocortical Pyramidal Neurons.

    PubMed

    Mensi, Skander; Hagens, Olivier; Gerstner, Wulfram; Pozzorini, Christian

    2016-02-01

    The way in which single neurons transform input into output spike trains has fundamental consequences for network coding. Theories and modeling studies based on standard Integrate-and-Fire models implicitly assume that, in response to increasingly strong inputs, neurons modify their coding strategy by progressively reducing their selective sensitivity to rapid input fluctuations. Combining mathematical modeling with in vitro experiments, we demonstrate that, in L5 pyramidal neurons, the firing threshold dynamics adaptively adjust the effective timescale of somatic integration in order to preserve sensitivity to rapid signals over a broad range of input statistics. For that, a new Generalized Integrate-and-Fire model featuring nonlinear firing threshold dynamics and conductance-based adaptation is introduced that outperforms state-of-the-art neuron models in predicting the spiking activity of neurons responding to a variety of in vivo-like fluctuating currents. Our model allows for efficient parameter extraction and can be analytically mapped to a Generalized Linear Model in which both the input filter--describing somatic integration--and the spike-history filter--accounting for spike-frequency adaptation--dynamically adapt to the input statistics, as experimentally observed. Overall, our results provide new insights on the computational role of different biophysical processes known to underlie adaptive coding in single neurons and support previous theoretical findings indicating that the nonlinear dynamics of the firing threshold due to Na+-channel inactivation regulate the sensitivity to rapid input fluctuations.

  17. Unitary n -designs via random quenches in atomic Hubbard and spin models: Application to the measurement of Rényi entropies

    NASA Astrophysics Data System (ADS)

    Vermersch, B.; Elben, A.; Dalmonte, M.; Cirac, J. I.; Zoller, P.

    2018-02-01

    We present a general framework for the generation of random unitaries based on random quenches in atomic Hubbard and spin models, forming approximate unitary n -designs, and their application to the measurement of Rényi entropies. We generalize our protocol presented in Elben et al. [Phys. Rev. Lett. 120, 050406 (2018), 10.1103/PhysRevLett.120.050406] to a broad class of atomic and spin-lattice models. We further present an in-depth numerical and analytical study of experimental imperfections, including the effect of decoherence and statistical errors, and discuss connections of our approach with many-body quantum chaos.

  18. The Evaluation of Bivariate Mixed Models in Meta-analyses of Diagnostic Accuracy Studies with SAS, Stata and R.

    PubMed

    Vogelgesang, Felicitas; Schlattmann, Peter; Dewey, Marc

    2018-05-01

    Meta-analyses require a thoroughly planned procedure to obtain unbiased overall estimates. From a statistical point of view, not only model selection but also model implementation in the software affects the results. The present simulation study investigates the accuracy of different implementations of general and generalized bivariate mixed models in SAS (using proc mixed, proc glimmix and proc nlmixed), Stata (using gllamm, xtmelogit and midas) and R (using reitsma from package mada and glmer from package lme4). Both models incorporate the relationship between sensitivity and specificity - the two outcomes of interest in meta-analyses of diagnostic accuracy studies - utilizing random effects. Model performance is compared in nine meta-analytic scenarios reflecting the combination of three sizes for meta-analyses (89, 30 and 10 studies) with three pairs of sensitivity/specificity values (97%/87%; 85%/75%; 90%/93%). The evaluation of accuracy in terms of bias, standard error and mean squared error reveals that all implementations of the generalized bivariate model calculate sensitivity and specificity estimates with deviations less than two percentage points. In contrast, proc mixed, which together with reitsma implements the general bivariate mixed model proposed by Reitsma, shows convergence problems. The random effect parameters are in general underestimated. This study shows that flexibility and simplicity of model specification together with convergence robustness should influence implementation recommendations, as the accuracy in terms of bias was acceptable in all implementations using the generalized approach. Schattauer GmbH.
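
    For readers unfamiliar with the two R implementations named above, the sketch below fits both to a small simulated meta-analysis: the general (approximate-normal) bivariate model via mada::reitsma, and a generalized (binomial) bivariate model via lme4::glmer with two correlated random effects per study. The simulated study counts and true sensitivity/specificity are hypothetical.

      library(mada)
      library(lme4)

      ## simulate a hypothetical meta-analysis: one row per primary study
      set.seed(2)
      k   <- 25
      n1  <- rpois(k, 80) + 20            # diseased subjects per study
      n0  <- rpois(k, 120) + 30           # non-diseased subjects per study
      dta <- data.frame(TP = rbinom(k, n1, 0.88), FN = NA,
                        FP = NA,          TN = rbinom(k, n0, 0.80))
      dta$FN <- n1 - dta$TP
      dta$FP <- n0 - dta$TN

      ## general bivariate mixed model (Reitsma), via mada
      fit_reitsma <- reitsma(dta)
      summary(fit_reitsma)

      ## generalized bivariate model, via lme4: correlated logit-sensitivity and logit-specificity effects
      long <- data.frame(study = rep(1:k, 2),
                         sens  = rep(c(1, 0), each = k),   # 1 = sensitivity rows, 0 = specificity rows
                         pos   = c(dta$TP, dta$TN),
                         n     = c(n1, n0))
      fit_glmm <- glmer(cbind(pos, n - pos) ~ 0 + factor(sens) + (0 + factor(sens) | study),
                        data = long, family = binomial)
      summary(fit_glmm)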

  19. SEGMENTING CT PROSTATE IMAGES USING POPULATION AND PATIENT-SPECIFIC STATISTICS FOR RADIOTHERAPY.

    PubMed

    Feng, Qianjin; Foskey, Mark; Tang, Songyuan; Chen, Wufan; Shen, Dinggang

    2009-08-07

    This paper presents a new deformable model using both population and patient-specific statistics to segment the prostate from CT images. There are two novelties in the proposed method. First, a modified scale invariant feature transform (SIFT) local descriptor, which is more distinctive than general intensity and gradient features, is used to characterize the image features. Second, an online training approach is used to build the shape statistics for accurately capturing intra-patient variation, which is more important than inter-patient variation for prostate segmentation in clinical radiotherapy. Experimental results show that the proposed method is robust and accurate, suitable for clinical application.

  20. SEGMENTING CT PROSTATE IMAGES USING POPULATION AND PATIENT-SPECIFIC STATISTICS FOR RADIOTHERAPY

    PubMed Central

    Feng, Qianjin; Foskey, Mark; Tang, Songyuan; Chen, Wufan; Shen, Dinggang

    2010-01-01

    This paper presents a new deformable model using both population and patient-specific statistics to segment the prostate from CT images. There are two novelties in the proposed method. First, a modified scale invariant feature transform (SIFT) local descriptor, which is more distinctive than general intensity and gradient features, is used to characterize the image features. Second, an online training approach is used to build the shape statistics for accurately capturing intra-patient variation, which is more important than inter-patient variation for prostate segmentation in clinical radiotherapy. Experimental results show that the proposed method is robust and accurate, suitable for clinical application. PMID:21197416

  1. Vehicular headways on signalized intersections: theory, models, and reality

    NASA Astrophysics Data System (ADS)

    Krbálek, Milan; Šleis, Jiří

    2015-01-01

    We discuss statistical properties of vehicular headways measured on signalized crossroads. On the basis of mathematical approaches, we formulate theoretical and empirically inspired criteria for the acceptability of theoretical headway distributions. Sequentially, the multifarious families of statistical distributions (commonly used to fit real-road headway statistics) are confronted with these criteria, and with original empirical time clearances gauged among neighboring vehicles leaving signal-controlled crossroads after a green signal appears. Using three different numerical schemes, we demonstrate that an arrangement of vehicles on an intersection is a consequence of the general stochastic nature of queueing systems, rather than a consequence of traffic rules, driver estimation processes, or decision-making procedures.

  2. Quantifying risks with exact analytical solutions of derivative pricing distribution

    NASA Astrophysics Data System (ADS)

    Zhang, Kun; Liu, Jing; Wang, Erkang; Wang, Jin

    2017-04-01

    Derivative (i.e. option) pricing is essential for modern financial instrumentations. Despite of the previous efforts, the exact analytical forms of the derivative pricing distributions are still challenging to obtain. In this study, we established a quantitative framework using path integrals to obtain the exact analytical solutions of the statistical distribution for bond and bond option pricing for the Vasicek model. We discuss the importance of statistical fluctuations away from the expected option pricing characterized by the distribution tail and their associations to value at risk (VaR). The framework established here is general and can be applied to other financial derivatives for quantifying the underlying statistical distributions.
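
    For context, the standard closed-form expected price of a zero-coupon bond under the Vasicek short-rate model dr = a(b - r)dt + sigma dW is sketched below; the article goes further and derives the full pricing distribution (and its tail, relevant to VaR) with path integrals, which this textbook formula does not capture. Parameter values are illustrative.

      vasicek_bond <- function(r0, tau, a, b, sigma) {
        B <- (1 - exp(-a * tau)) / a
        A <- exp((b - sigma^2 / (2 * a^2)) * (B - tau) - sigma^2 * B^2 / (4 * a))
        A * exp(-B * r0)        # price of a unit zero-coupon bond maturing in tau years
      }

      vasicek_bond(r0 = 0.03, tau = 5, a = 0.2, b = 0.04, sigma = 0.01)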

  3. On the implications of the classical ergodic theorems: analysis of developmental processes has to focus on intra-individual variation.

    PubMed

    Molenaar, Peter C M

    2008-01-01

    It is argued that general mathematical-statistical theorems imply that standard statistical analysis techniques of inter-individual variation are invalid to investigate developmental processes. Developmental processes have to be analyzed at the level of individual subjects, using time series data characterizing the patterns of intra-individual variation. It is shown that standard statistical techniques based on the analysis of inter-individual variation appear to be insensitive to the presence of arbitrary large degrees of inter-individual heterogeneity in the population. An important class of nonlinear epigenetic models of neural growth is described which can explain the occurrence of such heterogeneity in brain structures and behavior. Links with models of developmental instability are discussed. A simulation study based on a chaotic growth model illustrates the invalidity of standard analysis of inter-individual variation, whereas time series analysis of intra-individual variation is able to recover the true state of affairs. (c) 2007 Wiley Periodicals, Inc.

  4. Multivariate statistical model for 3D image segmentation with application to medical images.

    PubMed

    John, Nigel M; Kabuka, Mansur R; Ibrahim, Mohamed O

    2003-12-01

    In this article we describe a statistical model that was developed to segment brain magnetic resonance images. The statistical segmentation algorithm was applied after a pre-processing stage involving the use of a 3D anisotropic filter along with histogram equalization techniques. The segmentation algorithm makes use of prior knowledge and a probability-based multivariate model designed to semi-automate the process of segmentation. The algorithm was applied to images obtained from the Center for Morphometric Analysis at Massachusetts General Hospital as part of the Internet Brain Segmentation Repository (IBSR). The developed algorithm showed improved accuracy over the k-means, adaptive Maximum Apriori Probability (MAP), biased MAP, and other algorithms. Experimental results showing the segmentation and the results of comparisons with other algorithms are provided. Results are based on an overlap criterion against expertly segmented images from the IBSR. The algorithm produced average results of approximately 80% overlap with the expertly segmented images (compared with 85% for manual segmentation and 55% for other algorithms).

  5. Systematic Review and Meta-Analysis of Studies Evaluating Diagnostic Test Accuracy: A Practical Review for Clinical Researchers-Part II. Statistical Methods of Meta-Analysis

    PubMed Central

    Lee, Juneyoung; Kim, Kyung Won; Choi, Sang Hyun; Huh, Jimi

    2015-01-01

    Meta-analysis of diagnostic test accuracy studies differs from the usual meta-analysis of therapeutic/interventional studies in that it requires simultaneously analyzing a pair of outcome measures, such as sensitivity and specificity, instead of a single outcome. Since sensitivity and specificity are generally inversely correlated and could be affected by a threshold effect, more sophisticated statistical methods are required for the meta-analysis of diagnostic test accuracy. Hierarchical models, including the bivariate model and the hierarchical summary receiver operating characteristic model, are increasingly being accepted as standard methods for meta-analysis of diagnostic test accuracy studies. We provide a conceptual review of statistical methods currently used and recommended for meta-analysis of diagnostic test accuracy studies. This article could serve as a methodological reference for those who perform systematic review and meta-analysis of diagnostic test accuracy studies. PMID:26576107

  6. Predicting acute aquatic toxicity of structurally diverse chemicals in fish using artificial intelligence approaches.

    PubMed

    Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali

    2013-09-01

    The research aims to develop global modeling tools capable of categorizing structurally diverse chemicals in various toxicity classes according to the EEC and European Community directives, and to predict their acute toxicity in fathead minnow using set of selected molecular descriptors. Accordingly, artificial intelligence approach based classification and regression models, such as probabilistic neural networks (PNN), generalized regression neural networks (GRNN), multilayer perceptron neural network (MLPN), radial basis function neural network (RBFN), support vector machines (SVM), gene expression programming (GEP), and decision tree (DT) were constructed using the experimental toxicity data. Diversity and non-linearity in the chemicals' data were tested using the Tanimoto similarity index and Brock-Dechert-Scheinkman statistics. Predictive and generalization abilities of various models constructed here were compared using several statistical parameters. PNN and GRNN models performed relatively better than MLPN, RBFN, SVM, GEP, and DT. Both in two and four category classifications, PNN yielded a considerably high accuracy of classification in training (95.85 percent and 90.07 percent) and validation data (91.30 percent and 86.96 percent), respectively. GRNN rendered a high correlation between the measured and model predicted -log LC50 values both for the training (0.929) and validation (0.910) data and low prediction errors (RMSE) of 0.52 and 0.49 for two sets. Efficiency of the selected PNN and GRNN models in predicting acute toxicity of new chemicals was adequately validated using external datasets of different fish species (fathead minnow, bluegill, trout, and guppy). The PNN and GRNN models showed good predictive and generalization abilities and can be used as tools for predicting toxicities of structurally diverse chemical compounds. Copyright © 2013 Elsevier Inc. All rights reserved.

  7. Empirical investigation into depth-resolution of Magnetotelluric data

    NASA Astrophysics Data System (ADS)

    Piana Agostinetti, N.; Ogaya, X.

    2017-12-01

    We investigate the depth-resolution of magnetotelluric (MT) data by comparing reconstructed 1D resistivity profiles with measured resistivity and lithostratigraphy from borehole data. Inversion of MT data has been widely used to reconstruct the 1D fine-layered resistivity structure beneath an isolated MT station. Uncorrelated noise is generally assumed to be associated with MT data. However, wrong assumptions about error statistics have been shown to strongly bias the results obtained in geophysical inversions. In particular, the number of resolved layers at depth strongly depends on the error statistics. In this study, we applied a trans-dimensional McMC algorithm to reconstruct the 1D resistivity profile near the location of a 1500 m-deep borehole, using MT data. We solve the MT inverse problem imposing different models for the error statistics associated with the MT data. Following a Hierarchical Bayes approach, we also inverted for the hyper-parameters associated with each error statistics model. Preliminary results indicate that assuming uncorrelated noise leads to a number of resolved layers larger than expected from the retrieved lithostratigraphy. Moreover, comparison with the inversion of synthetic resistivity data obtained from the "true" resistivity stratification measured along the borehole shows that a consistent number of resistivity layers can be obtained using a Gaussian model for the error statistics, with substantial correlation length.

  8. General aviation activity and avionics survey. 1978. Annual summary report cy 1978

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schwenk, J.C.

    1980-03-01

    This report presents the results and a description of the 1978 General Aviation Activity and Avionics Survey. The survey was conducted during early 1979 by the FAA to obtain information on the activity and avionics of the United States registered general aviation aircraft fleet, the dominant component of civil aviation in the U.S. The survey was based on a statistically selected sample of about 13.3 percent of the general aviation fleet and obtained a response rate of 74 percent. Survey results are based upon responses but are expanded upward to represent the total population. Survey results revealed that during 1978 an estimated 39.4 million hours of flying time were logged by the 198,778 active general aviation aircraft in the U.S. fleet, yielding a mean annual flight time per aircraft of 197.7 hours. The active aircraft represented 85 percent of the registered general aviation fleet. The report contains breakdowns of these and other statistics by manufacturer/model group, aircraft type, state and region of based aircraft, and primary use. Also included are fuel consumption, lifetime airframe hours, avionics, and engine hours estimates.

  9. Hierarchical modeling and inference in ecology: The analysis of data from populations, metapopulations and communities

    USGS Publications Warehouse

    Royle, J. Andrew; Dorazio, Robert M.

    2008-01-01

    A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including:
    * occurrence or occupancy models for estimating species distribution
    * abundance models based on many sampling protocols, including distance sampling
    * capture-recapture models with individual effects
    * spatial capture-recapture models based on camera trapping and related methods
    * population and metapopulation dynamic models
    * models of biodiversity, community structure and dynamics.

  10. The Impact of New Technology on Accounting Education.

    ERIC Educational Resources Information Center

    Shaoul, Jean

    The introduction of computers in the Department of Accounting and Finance at Manchester University is described. General background outlining the increasing need for microcomputers in the accounting curriculum (including financial modelling tools and decision support systems such as linear programming, statistical packages, and simulation) is…

  11. Landowner interest in multifunctional agroforestry riparian buffers.

    Treesearch

    Katie Trozzo; John Munsell; James Chamberlain

    2014-01-01

    Adoption of temperate agroforestry practices generally remains limited despite considerable advances in basic science. This study builds on temperate agroforestry adoption research by empirically testing a statistical model of interest in native fruit and nut tree riparian buffers using technology and agroforestry adoption theory. Data...

  12. Bayesian truncation errors in chiral effective field theory: model checking and accounting for correlations

    NASA Astrophysics Data System (ADS)

    Melendez, Jordan; Wesolowski, Sarah; Furnstahl, Dick

    2017-09-01

    Chiral effective field theory (EFT) predictions are necessarily truncated at some order in the EFT expansion, which induces an error that must be quantified for robust statistical comparisons to experiment. A Bayesian model yields posterior probability distribution functions for these errors based on expectations of naturalness encoded in Bayesian priors and the observed order-by-order convergence pattern of the EFT. As a general example of a statistical approach to truncation errors, the model was applied to chiral EFT for neutron-proton scattering using various semi-local potentials of Epelbaum, Krebs, and Meißner (EKM). Here we discuss how our model can learn correlation information from the data and how to perform Bayesian model checking to validate that the EFT is working as advertised. Supported in part by NSF PHY-1614460 and DOE NUCLEI SciDAC DE-SC0008533.

  13. Probabilistic registration of an unbiased statistical shape model to ultrasound images of the spine

    NASA Astrophysics Data System (ADS)

    Rasoulian, Abtin; Rohling, Robert N.; Abolmaesumi, Purang

    2012-02-01

    The placement of an epidural needle is among the most difficult regional anesthetic techniques. Ultrasound has been proposed to improve success of placement. However, it has not become the standard-of-care because of limitations in the depictions and interpretation of the key anatomical features. We propose to augment the ultrasound images with a registered statistical shape model of the spine to aid interpretation. The model is created with a novel deformable group-wise registration method which utilizes a probabilistic approach to register groups of point sets. The method is compared to a volume-based model building technique and it demonstrates better generalization and compactness. We instantiate and register the shape model to a spine surface probability map extracted from the ultrasound images. Validation is performed on human subjects. The achieved registration accuracy (2-4 mm) is sufficient to guide the choice of puncture site and trajectory of an epidural needle.

  14. Advances in Statistical and Deterministic Modeling of Wind-Driven Seas

    DTIC Science & Technology

    2011-09-30

    Zakharov. Scales of nonlinear relaxation and balance of wind- driven seas. Geophysical Research Abstracts Vol. 13, EGU2011-2042, 2011. EGU General ...Dyachenko A. “On canonical equation for water waves” at General Assembly 2011 of the European Geosciences Union in Vienna, Austria, 03 – 08 April...scattering and equilibrium ranges in wind- generated waves with application to spectrometry, J. Geoph. Res., 92, 49715029, 1987. [3] Hsiao S.V. and

  15. Phylogenetic relationships of South American lizards of the genus Stenocercus (Squamata: Iguania): A new approach using a general mixture model for gene sequence data.

    PubMed

    Torres-Carvajal, Omar; Schulte, James A; Cadle, John E

    2006-04-01

    The South American iguanian lizard genus Stenocercus includes 54 species occurring mostly in the Andes and adjacent lowland areas from northern Venezuela and Colombia to central Argentina at elevations of 0-4000m. Small taxon or character sampling has characterized all phylogenetic analyses of Stenocercus, which has long been recognized as sister taxon to the Tropidurus Group. In this study, we use mtDNA sequence data to perform phylogenetic analyses that include 32 species of Stenocercus and 12 outgroup taxa. Monophyly of this genus is strongly supported by maximum parsimony and Bayesian analyses. Evolutionary relationships within Stenocercus are further analyzed with a Bayesian implementation of a general mixture model, which accommodates variability in the pattern of evolution across sites. These analyses indicate a basal split of Stenocercus into two clades, one of which receives very strong statistical support. In addition, we test previous hypotheses using non-parametric and parametric statistical methods, and provide a phylogenetic classification for Stenocercus.

  16. A Statistical Learning Framework for Materials Science: Application to Elastic Moduli of k-nary Inorganic Polycrystalline Compounds.

    PubMed

    de Jong, Maarten; Chen, Wei; Notestine, Randy; Persson, Kristin; Ceder, Gerbrand; Jain, Anubhav; Asta, Mark; Gamst, Anthony

    2016-10-03

    Materials scientists increasingly employ machine or statistical learning (SL) techniques to accelerate materials discovery and design. Such pursuits benefit from pooling training data across, and thus being able to generalize predictions over, k-nary compounds of diverse chemistries and structures. This work presents a SL framework that addresses challenges in materials science applications, where datasets are diverse but of modest size, and extreme values are often of interest. Our advances include the application of power or Hölder means to construct descriptors that generalize over chemistry and crystal structure, and the incorporation of multivariate local regression within a gradient boosting framework. The approach is demonstrated by developing SL models to predict bulk and shear moduli (K and G, respectively) for polycrystalline inorganic compounds, using 1,940 compounds from a growing database of calculated elastic moduli for metals, semiconductors and insulators. The usefulness of the models is illustrated by screening for superhard materials.
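
    The sketch below illustrates, under hypothetical inputs, the two ingredients highlighted above: power (Hölder) means as composition-weighted descriptors that generalize across k-nary chemistries, and a gradient-boosting regressor for a target such as log bulk modulus. Column names, properties, and data are invented for the example and are not those of the authors' database.

      ## weighted power (Hölder) mean; p -> 0 gives the weighted geometric mean
      holder_mean <- function(x, w, p) {
        w <- w / sum(w)
        if (p == 0) exp(sum(w * log(x))) else sum(w * x^p)^(1 / p)
      }

      ## e.g. descriptors from elemental property values x with atomic fractions w
      desc <- sapply(c(-4, -1, 0, 1, 2, 4),
                     function(p) holder_mean(x = c(70, 160), w = c(0.4, 0.6), p = p))
      desc

      ## gradient boosting on a hypothetical descriptor table 'train' with response logK
      library(gbm)
      set.seed(6)
      train <- data.frame(replicate(6, rnorm(500)))
      names(train) <- paste0("desc", 1:6)
      train$logK <- 2 + 0.8 * train$desc1 - 0.5 * train$desc3 + rnorm(500, 0, 0.2)

      fit <- gbm(logK ~ ., data = train, distribution = "gaussian",
                 n.trees = 2000, interaction.depth = 4, shrinkage = 0.01)
      summary(fit)   # relative influence of the descriptors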

  17. A Statistical Learning Framework for Materials Science: Application to Elastic Moduli of k-nary Inorganic Polycrystalline Compounds

    PubMed Central

    de Jong, Maarten; Chen, Wei; Notestine, Randy; Persson, Kristin; Ceder, Gerbrand; Jain, Anubhav; Asta, Mark; Gamst, Anthony

    2016-01-01

    Materials scientists increasingly employ machine or statistical learning (SL) techniques to accelerate materials discovery and design. Such pursuits benefit from pooling training data across, and thus being able to generalize predictions over, k-nary compounds of diverse chemistries and structures. This work presents a SL framework that addresses challenges in materials science applications, where datasets are diverse but of modest size, and extreme values are often of interest. Our advances include the application of power or Hölder means to construct descriptors that generalize over chemistry and crystal structure, and the incorporation of multivariate local regression within a gradient boosting framework. The approach is demonstrated by developing SL models to predict bulk and shear moduli (K and G, respectively) for polycrystalline inorganic compounds, using 1,940 compounds from a growing database of calculated elastic moduli for metals, semiconductors and insulators. The usefulness of the models is illustrated by screening for superhard materials. PMID:27694824

  18. A Statistical Learning Framework for Materials Science: Application to Elastic Moduli of k-nary Inorganic Polycrystalline Compounds

    DOE PAGES

    de Jong, Maarten; Chen, Wei; Notestine, Randy; ...

    2016-10-03

    Materials scientists increasingly employ machine or statistical learning (SL) techniques to accelerate materials discovery and design. Such pursuits benefit from pooling training data across, and thus being able to generalize predictions over, k-nary compounds of diverse chemistries and structures. This work presents a SL framework that addresses challenges in materials science applications, where datasets are diverse but of modest size, and extreme values are often of interest. Our advances include the application of power or Hölder means to construct descriptors that generalize over chemistry and crystal structure, and the incorporation of multivariate local regression within a gradient boosting framework. The approach is demonstrated by developing SL models to predict bulk and shear moduli (K and G, respectively) for polycrystalline inorganic compounds, using 1,940 compounds from a growing database of calculated elastic moduli for metals, semiconductors and insulators. The usefulness of the models is illustrated by screening for superhard materials.

  19. Modeling the Test-Retest Statistics of a Localization Experiment in the Full Horizontal Plane.

    PubMed

    Morsnowski, André; Maune, Steffen

    2016-10-01

    Two approaches to modeling the test-retest statistics of a localization experiment, one based on the Gaussian distribution and one on surrogate data, are introduced. Their efficiency is investigated using different measures describing directional hearing ability. A localization experiment in the full horizontal plane is a challenging task for hearing impaired patients. In clinical routine, we use this experiment to evaluate the progress of our cochlear implant (CI) recipients. Listening and time effort limit the reproducibility. The localization experiment consists of a 12-loudspeaker circle, placed in an anechoic room, a "camera silens". In darkness, HSM sentences are presented at 65 dB pseudo-erratically from all 12 directions with five repetitions. This experiment is modeled by a set of Gaussian distributions with different standard deviations added to a perfect estimator, as well as by surrogate data. Five repetitions per direction are used to produce surrogate data distributions for the sensation directions. To investigate the statistics, we retrospectively use the data of 33 CI patients with 92 pairs of test-retest measurements from the same day. The first model does not take inversions into account (i.e., permutations of the direction from back to front and vice versa are not considered), although they are common for hearing impaired persons, particularly in the rear hemisphere. The second model considers these inversions but does not work with all measures. The introduced models successfully describe test-retest statistics of directional hearing. However, since their applications on the investigated measures perform differently, no general recommendation can be provided. The presented test-retest statistics enable pair test comparisons for localization experiments.

  20. A weighted generalized score statistic for comparison of predictive values of diagnostic tests.

    PubMed

    Kosinski, Andrzej S

    2013-03-15

    Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.

  1. Genetic algorithm dynamics on a rugged landscape

    NASA Astrophysics Data System (ADS)

    Bornholdt, Stefan

    1998-04-01

    The genetic algorithm is an optimization procedure motivated by biological evolution and is successfully applied to optimization problems in different areas. A statistical mechanics model for its dynamics is proposed based on the parent-child fitness correlation of the genetic operators, making it applicable to general fitness landscapes. It is compared to a recent model based on a maximum entropy ansatz. Finally it is applied to modeling the dynamics of a genetic algorithm on the rugged fitness landscape of the NK model.
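
    The statistical-mechanics model itself is analytical; as background, the sketch below implements a bare-bones genetic algorithm (tournament selection, one-point crossover, bitwise mutation) on an NK-style rugged landscape built from random lookup tables. Parameters and the landscape construction are illustrative simplifications, not the paper's model.

        import numpy as np

        rng = np.random.default_rng(2)
        N, K = 16, 4        # NK-style parameters: N bits, each coupled to K neighbours

        # random lookup tables make the landscape rugged: each bit's contribution
        # depends on its own state and the states of K neighbouring bits
        neighbours = np.array([(np.arange(i, i + K + 1) % N) for i in range(N)])
        tables = rng.random((N, 2 ** (K + 1)))

        def fitness(genome):
            idx = np.array([int("".join(map(str, genome[neighbours[i]])), 2) for i in range(N)])
            return tables[np.arange(N), idx].mean()

        def evolve(pop_size=50, generations=100, p_mut=1.0 / N):
            pop = rng.integers(0, 2, (pop_size, N))
            for _ in range(generations):
                fits = np.array([fitness(g) for g in pop])
                # tournament selection of parents
                a, b = rng.integers(0, pop_size, (2, pop_size))
                parents = pop[np.where(fits[a] > fits[b], a, b)]
                # one-point crossover between consecutive parents
                children = parents.copy()
                cut = rng.integers(1, N, pop_size)
                for i in range(0, pop_size - 1, 2):
                    c = cut[i]
                    children[i, c:], children[i + 1, c:] = parents[i + 1, c:], parents[i, c:]
                # bitwise mutation
                flips = rng.random((pop_size, N)) < p_mut
                pop = np.where(flips, 1 - children, children)
            return max(fitness(g) for g in pop)

        print(evolve())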

  2. PharmML in Action: an Interoperable Language for Modeling and Simulation

    PubMed Central

    Bizzotto, R; Smith, G; Yvon, F; Kristensen, NR; Swat, MJ

    2017-01-01

    PharmML [1] is an XML-based exchange format [2, 3, 4] created with a focus on nonlinear mixed-effect (NLME) models used in pharmacometrics [5, 6], but providing a very general framework that also allows describing mathematical and statistical models such as single-subject or nonlinear and multivariate regression models. This tutorial provides an overview of the structure of this language, brief suggestions on how to work with it, and use cases demonstrating its power and flexibility. PMID:28575551

  3. A statistical model for investigating binding probabilities of DNA nucleotide sequences using microarrays.

    PubMed

    Lee, Mei-Ling Ting; Bulyk, Martha L; Whitmore, G A; Church, George M

    2002-12-01

    There is considerable scientific interest in knowing the probability that a site-specific transcription factor will bind to a given DNA sequence. Microarray methods provide an effective means for assessing the binding affinities of a large number of DNA sequences as demonstrated by Bulyk et al. (2001, Proceedings of the National Academy of Sciences, USA 98, 7158-7163) in their study of the DNA-binding specificities of Zif268 zinc fingers using microarray technology. In a follow-up investigation, Bulyk, Johnson, and Church (2002, Nucleic Acid Research 30, 1255-1261) studied the interdependence of nucleotides on the binding affinities of transcription proteins. Our article is motivated by this pair of studies. We present a general statistical methodology for analyzing microarray intensity measurements reflecting DNA-protein interactions. The log probability of a protein binding to a DNA sequence on an array is modeled using a linear ANOVA model. This model is convenient because it employs familiar statistical concepts and procedures and also because it is effective for investigating the probability structure of the binding mechanism.
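
    A minimal sketch of the kind of linear ANOVA model described above, assuming a toy data set in which each probe records the nucleotide at three binding-site positions together with a noisy log binding probability; column names and effect sizes are hypothetical.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(3)
        bases = list("ACGT")
        n = 400

        # hypothetical data: nucleotide identity at three positions plus a noisy log-probability
        df = pd.DataFrame({
            "pos1": rng.choice(bases, n),
            "pos2": rng.choice(bases, n),
            "pos3": rng.choice(bases, n),
        })
        effects = dict(zip(bases, [0.0, -0.5, 0.3, -0.2]))
        df["log_p"] = (df["pos1"].map(effects) + 0.5 * df["pos2"].map(effects)
                       + 0.2 * df["pos3"].map(effects) + rng.normal(0, 0.1, n))

        # main-effects ANOVA model for the log binding probability; an interaction term such
        # as pos1:pos2 could be added to probe interdependence between nucleotide positions
        fit = smf.ols("log_p ~ C(pos1) + C(pos2) + C(pos3)", data=df).fit()
        print(fit.params)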

  4. Bayesian generalized linear mixed modeling of Tuberculosis using informative priors.

    PubMed

    Ojo, Oluwatobi Blessing; Lougue, Siaka; Woldegerima, Woldegebriel Assefa

    2017-01-01

    TB is rated as one of the world's deadliest diseases, and South Africa ranks 9th among the 22 countries hardest hit by TB. Although much research has been carried out on this subject, this paper goes further by incorporating past knowledge into the model, using a Bayesian approach with an informative prior. The Bayesian approach is becoming popular in data analysis, but most applications of Bayesian inference are limited to situations with a non-informative prior, where there is no solid external information about the distribution of the parameter of interest. The main aim of this study is to profile people living with TB in South Africa. Identical regression models are fitted under the classical approach and under Bayesian approaches with non-informative and informative priors, using South Africa General Household Survey (GHS) data for the year 2014. For the Bayesian model with an informative prior, the South Africa General Household Survey datasets for the years 2011 to 2013 are used to set up the priors for the 2014 model.
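
    The paper fits Bayesian generalized linear mixed models; the sketch below illustrates only the informative-prior idea in the simplest conjugate case, a beta-binomial update in which hypothetical counts pooled from earlier survey years set the prior for a later prevalence estimate. All counts are invented for illustration.

        from scipy import stats

        # hypothetical counts: TB cases and respondents pooled from earlier surveys
        prior_cases, prior_n = 180, 12000
        # informative Beta prior centred on the historical prevalence
        a0, b0 = 1 + prior_cases, 1 + (prior_n - prior_cases)

        # hypothetical counts from the new survey year
        new_cases, new_n = 70, 5000
        a1, b1 = a0 + new_cases, b0 + (new_n - new_cases)

        posterior = stats.beta(a1, b1)
        print("posterior mean prevalence:", posterior.mean())
        print("95% credible interval:", posterior.ppf([0.025, 0.975]))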

  5. Beyond heat baths II: framework for generalized thermodynamic resource theories

    NASA Astrophysics Data System (ADS)

    Yunger Halpern, Nicole

    2018-03-01

    Thermodynamics, which describes vast systems, has been reconciled with small scales, relevant to single-molecule experiments, in resource theories. Resource theories have been used to model exchanges of energy and information. Recently, particle exchanges were modeled; and an umbrella family of thermodynamic resource theories was proposed to model diverse baths, interactions, and free energies. This paper motivates and details the family’s structure and prospective applications. How to model electrochemical, gravitational, magnetic, and other thermodynamic systems is explained. Szilárd’s engine and Landauer’s Principle are generalized, as resourcefulness is shown to be convertible not only between information and gravitational energy, but also among diverse degrees of freedom. Extensive variables are associated with quantum operators that might fail to commute, introducing extra nonclassicality into thermodynamic resource theories. An early version of this paper partially motivated the later development of noncommutative thermalization. This generalization expands the theories’ potential for modeling realistic systems with which small-scale statistical mechanics might be tested experimentally.

  6. Learning planar Ising models

    DOE PAGES

    Johnson, Jason K.; Oyen, Diane Adele; Chertkov, Michael; ...

    2016-12-01

    Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. Finally, we demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.
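
    The paper's algorithm optimizes a planar Ising approximation using exact planar inference; the sketch below shows only a simplified version of the greedy graph-selection ingredient, adding edges in decreasing order of absolute pairwise correlation whenever a planarity test (networkx) admits them. The data are simulated binary variables, not the paper's applications.

        import numpy as np
        import networkx as nx

        rng = np.random.default_rng(4)
        n_vars, n_samples = 8, 500
        # binary +/-1 variables correlated through a common latent factor
        X = np.sign(rng.normal(size=(n_samples, n_vars)) + 0.4 * rng.normal(size=(n_samples, 1)))

        corr = np.corrcoef(X, rowvar=False)
        edges = [(abs(corr[i, j]), i, j) for i in range(n_vars) for j in range(i + 1, n_vars)]
        edges.sort(reverse=True)

        G = nx.Graph()
        G.add_nodes_from(range(n_vars))
        for weight, i, j in edges:
            G.add_edge(i, j, weight=weight)
            planar, _ = nx.check_planarity(G)
            if not planar:
                G.remove_edge(i, j)   # skip edges that would break planarity

        print(sorted(G.edges()))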

  8. A survey of statistics in three UK general practice journals

    PubMed Central

    Rigby, Alan S; Armstrong, Gillian K; Campbell, Michael J; Summerton, Nick

    2004-01-01

    Background Many medical specialities have reviewed the statistical content of their journals. To our knowledge this has not been done in general practice. Given the main role of a general practitioner as a diagnostician we thought it would be of interest to see whether the statistical methods reported reflect the diagnostic process. Methods Hand search of three UK journals of general practice namely the British Medical Journal (general practice section), British Journal of General Practice and Family Practice over a one-year period (1 January to 31 December 2000). Results A wide variety of statistical techniques were used. The most common methods included t-tests and Chi-squared tests. There were few articles reporting likelihood ratios and other useful diagnostic methods. There was evidence that the journals with the more thorough statistical review process reported a more complex and wider variety of statistical techniques. Conclusions The BMJ had a wider range and greater diversity of statistical methods than the other two journals. However, in all three journals there was a dearth of papers reflecting the diagnostic process. Across all three journals there were relatively few papers describing randomised controlled trials thus recognising the difficulty of implementing this design in general practice. PMID:15596014

  9. Analysis of Longitudinal Outcome Data with Missing Values in Total Knee Arthroplasty.

    PubMed

    Kang, Yeon Gwi; Lee, Jang Taek; Kang, Jong Yeal; Kim, Ga Hye; Kim, Tae Kyun

    2016-01-01

    We sought to determine the influence of missing data on the statistical results, and to determine which statistical method is most appropriate for the analysis of longitudinal outcome data of TKA with missing values among repeated measures ANOVA, generalized estimating equation (GEE) and mixed effects model repeated measures (MMRM). Data sets with missing values were generated with different proportion of missing data, sample size and missing-data generation mechanism. Each data set was analyzed with three statistical methods. The influence of missing data was greater with higher proportion of missing data and smaller sample size. MMRM tended to show least changes in the statistics. When missing values were generated by 'missing not at random' mechanism, no statistical methods could fully avoid deviations in the results. Copyright © 2016 Elsevier Inc. All rights reserved.
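
    A sketch contrasting two of the approaches compared in the study, GEE and a mixed-effects (MMRM-like) model, on a toy longitudinal data set with outcomes deleted at random; the random-intercept structure, column names, and missingness mechanism are assumptions for illustration, not the study's specification.

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(5)
        n_subj, n_visits = 60, 4

        rows = []
        for s in range(n_subj):
            u = rng.normal(0, 2.0)                        # subject-level random intercept
            for t in range(n_visits):
                y = 50 + 5 * t + u + rng.normal(0, 3.0)   # outcome score improving over follow-up
                rows.append({"subject": s, "visit": t, "score": y})
        df = pd.DataFrame(rows)

        # delete about 20% of outcomes (missing completely at random in this toy example)
        df.loc[rng.random(len(df)) < 0.2, "score"] = np.nan
        obs = df.dropna()

        gee = smf.gee("score ~ visit", "subject", data=obs,
                      cov_struct=sm.cov_struct.Exchangeable()).fit()
        mmrm = smf.mixedlm("score ~ visit", data=obs, groups=obs["subject"]).fit()
        print(gee.params, mmrm.params, sep="\n")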

  10. PDF modeling of near-wall turbulent flows

    NASA Astrophysics Data System (ADS)

    Dreeben, Thomas David

    1997-06-01

    Pdf methods are extended to include modeling of wall- bounded turbulent flows. For flows in which resolution of the viscous sublayer is desired, a Pdf near-wall model is developed in which the Generalized Langevin model is combined with an exact model for viscous transport. Durbin's method of elliptic relaxation is used to incorporate the wall effects into the governing equations without the use of wall functions or damping functions. Close to the wall, the Generalized Langevin model provides an analogy to the effect of the fluctuating continuity equation. This enables accurate modeling of the near-wall turbulent statistics. Demonstrated accuracy for fully-developed channel flow is achieved with a Pdf/Monte Carlo simulation, and with its related Reynolds-stress closure. For flows in which the details of the viscous sublayer are not important, a Pdf wall- function method is developed with the Simplified Langevin model.

  11. Exercise in prevention and treatment of anxiety and depression among children and young people.

    PubMed

    Larun, L; Nordheim, L V; Ekeland, E; Hagen, K B; Heian, F

    2006-07-19

    Depression and anxiety are common psychological disorders for children and adolescents. Psychological (e.g. psychotherapy), psychosocial (e.g. cognitive behavioral therapy) and biological (e.g. SSRIs or tricyclic drugs) treatments are the most common treatments being offered. The large variety of therapeutic interventions gives rise to questions of clinical effectiveness and side effects. Physical exercise is inexpensive with few, if any, side effects. To assess the effects of exercise interventions in reducing or preventing anxiety or depression in children and young people up to 20 years of age. We searched the Cochrane Controlled Trials Register (latest issue available), MEDLINE, EMBASE, CINAHL, PsycINFO, ERIC and Sportdiscus up to August 2005. Randomised trials of vigorous exercise interventions for children and young people up to the age of 20, with outcome measures for depression and anxiety. Two authors independently selected trials for inclusion, assessed methodological quality and extracted data. The trials were combined using meta-analysis methods. A narrative synthesis was performed when the reported data did not allow statistical pooling. Sixteen studies with a total of 1191 participants between 11 and 19 years of age were included. Eleven trials compared vigorous exercise versus no intervention in a general population of children. Six studies reporting anxiety scores showed a non-significant trend in favour of the exercise group (standard mean difference (SMD) (random effects model) -0.48, 95% confidence interval (CI) -0.97 to 0.01). Five studies reporting depression scores showed a statistically significant difference in favour of the exercise group (SMD (random effects model) -0.66, 95% CI -1.25 to -0.08). However, all trials were generally of low methodological quality and they were highly heterogeneous with regard to the population, intervention and measurement instruments used. One small trial investigating children in treatment showed no statistically significant difference in depression scores in favour of the control group (SMD (fixed effects model) 0.78, 95% CI -0.47 to 2.04). No studies reported anxiety scores for children in treatment. Five trials comparing vigorous exercise to low intensity exercise showed no statistically significant difference in depression and anxiety scores in the general population of children. Three trials reported anxiety scores (SMD (fixed effects model) -0.14, 95% CI -0.41 to 0.13). Two trials reported depression scores (SMD (fixed effects model) -0.15, 95% CI -0.44 to 0.14). Two small trials found no difference in depression scores for children in treatment (SMD (fixed effects model) -0.31, 95% CI -0.78 to 0.16). No studies reported anxiety scores for children in treatment. Four trials comparing exercise with psychosocial interventions showed no statistically significant difference in depression and anxiety scores in the general population of children. Two trials reported anxiety scores (SMD (fixed effects model) -0.13, 95% CI -0.43 to 0.17). Two trials reported depression scores (SMD (fixed effects model) 0.10, 95% CI -0.21 to 0.41). One trial found no difference in depression scores for children in treatment (SMD (fixed effects model) -0.31, 95% CI -0.97 to 0.35). No studies reported anxiety scores for children in treatment. 
Whilst there appears to be a small effect in favour of exercise in reducing depression and anxiety scores in the general population of children and adolescents, the small number of studies included and the clinical diversity of participants, interventions and methods of measurement limit the ability to draw conclusions. It makes little difference whether the exercise is of high or low intensity. The effect of exercise for children in treatment for anxiety and depression is unknown as the evidence base is scarce.

  12. Statistical Models for the Analysis of Zero-Inflated Pain Intensity Numeric Rating Scale Data.

    PubMed

    Goulet, Joseph L; Buta, Eugenia; Bathulapalli, Harini; Gueorguieva, Ralitza; Brandt, Cynthia A

    2017-03-01

    Pain intensity is often measured in clinical and research settings using the 0 to 10 numeric rating scale (NRS). NRS scores are recorded as discrete values, and in some samples they may display a high proportion of zeroes and a right-skewed distribution. Despite this, statistical methods for normally distributed data are frequently used in the analysis of NRS data. We present results from an observational cross-sectional study examining the association of NRS scores with patient characteristics using data collected from a large cohort of 18,935 veterans in Department of Veterans Affairs care diagnosed with a potentially painful musculoskeletal disorder. The mean (variance) NRS pain was 3.0 (7.5), and 34% of patients reported no pain (NRS = 0). We compared the following statistical models for analyzing NRS scores: linear regression, generalized linear models (Poisson and negative binomial), zero-inflated and hurdle models for data with an excess of zeroes, and a cumulative logit model for ordinal data. We examined model fit, interpretability of results, and whether conclusions about the predictor effects changed across models. In this study, models that accommodate zero inflation provided a better fit than the other models. These models should be considered for the analysis of NRS data with a large proportion of zeroes. We examined and analyzed pain data from a large cohort of veterans with musculoskeletal disorders. We found that many reported no current pain on the NRS on the diagnosis date. We present several alternative statistical methods for the analysis of pain intensity data with a large proportion of zeroes. Published by Elsevier Inc.
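
    A sketch of the comparison described above on simulated 0-10 scores with excess zeroes: an ordinary Poisson regression versus a zero-inflated Poisson model, compared by AIC using statsmodels. The covariate and data-generating values are hypothetical, not the cohort's.

        import numpy as np
        import statsmodels.api as sm
        from statsmodels.discrete.count_model import ZeroInflatedPoisson

        rng = np.random.default_rng(6)
        n = 2000
        age = rng.normal(60, 10, n)
        x = sm.add_constant((age - 60) / 10)

        # simulate NRS-like scores: a structural "no pain" group plus a Poisson component
        no_pain = rng.random(n) < 0.34
        counts = rng.poisson(np.exp(1.2 + 0.1 * x[:, 1]))
        y = np.where(no_pain, 0, np.clip(counts, 0, 10))

        poisson_fit = sm.Poisson(y, x).fit(disp=False)
        zip_fit = ZeroInflatedPoisson(y, x, exog_infl=x, inflation="logit").fit(disp=False)
        print("Poisson AIC:", poisson_fit.aic)
        print("ZIP AIC:    ", zip_fit.aic)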

  13. A general model-based design of experiments approach to achieve practical identifiability of pharmacokinetic and pharmacodynamic models.

    PubMed

    Galvanin, Federico; Ballan, Carlo C; Barolo, Massimiliano; Bezzo, Fabrizio

    2013-08-01

    The use of pharmacokinetic (PK) and pharmacodynamic (PD) models is a common and widespread practice in the preliminary stages of drug development. However, PK-PD models may be affected by structural identifiability issues intrinsically related to their mathematical formulation. A preliminary structural identifiability analysis is usually carried out to check if the set of model parameters can be uniquely determined from experimental observations under the ideal assumptions of noise-free data and no model uncertainty. However, even for structurally identifiable models, real-life experimental conditions and model uncertainty may strongly affect the practical possibility to estimate the model parameters in a statistically sound way. A systematic procedure coupling the numerical assessment of structural identifiability with advanced model-based design of experiments formulations is presented in this paper. The objective is to propose a general approach to design experiments in an optimal way, detecting a proper set of experimental settings that ensure the practical identifiability of PK-PD models. Two simulated case studies based on in vitro bacterial growth and killing models are presented to demonstrate the applicability and generality of the methodology to tackle model identifiability issues effectively, through the design of feasible and highly informative experiments.

  14. Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis

    PubMed Central

    Ré, Miguel A.; Azad, Rajeev K.

    2014-01-01

    Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms. PMID:24728338
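
    A minimal sketch of the weighted Jensen-Shannon divergence between the nucleotide compositions of two toy sequences, with the Shannon entropy optionally replaced by a q-parameterized (Tsallis-type) entropy as a simple nod to the generalization discussed above; the Markovian (higher-order) extension is not shown.

        import numpy as np
        from collections import Counter

        def composition(seq, alphabet="ACGT"):
            counts = Counter(seq)
            p = np.array([counts[a] for a in alphabet], dtype=float)
            return p / p.sum()

        def entropy(p, q=1.0):
            """Shannon entropy for q = 1, Tsallis entropy otherwise."""
            p = p[p > 0]
            if q == 1.0:
                return -np.sum(p * np.log2(p))
            return (1.0 - np.sum(p ** q)) / (q - 1.0)

        def jsd(p1, p2, w1=0.5, w2=0.5, q=1.0):
            """Weighted JSD: H(w1*p1 + w2*p2) - w1*H(p1) - w2*H(p2)."""
            mix = w1 * p1 + w2 * p2
            return entropy(mix, q) - w1 * entropy(p1, q) - w2 * entropy(p2, q)

        s1 = "ATGCGCGATATATGCGC"
        s2 = "GGGCGCGGCCGCGGGCC"
        p1, p2 = composition(s1), composition(s2)
        print("Shannon JSD:", jsd(p1, p2))
        print("Tsallis-type JSD (q=2):", jsd(p1, p2, q=2.0))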

  16. Improved Statistics for Genome-Wide Interaction Analysis

    PubMed Central

    Ueki, Masao; Cordell, Heather J.

    2012-01-01

    Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result. PMID:22496670

  17. Survival Regression Modeling Strategies in CVD Prediction.

    PubMed

    Barkhordari, Mahnaz; Padyab, Mojgan; Sardarinia, Mahsa; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza

    2016-04-01

    A fundamental part of prevention is prediction. Potential predictors are the sine qua non of prediction models. However, whether incorporating novel predictors into prediction models could be directly translated to added predictive value remains an area of dispute. The difference between the predictive power of a predictive model with (enhanced model) and without (baseline model) a certain predictor is generally regarded as an indicator of the predictive value added by that predictor. Indices such as discrimination and calibration have long been used in this regard. Recently, the use of added predictive value has been suggested while comparing the predictive performances of the predictive models with and without novel biomarkers. User-friendly statistical software capable of implementing novel statistical procedures is conspicuously lacking. This shortcoming has restricted implementation of such novel model assessment methods. We aimed to construct Stata commands to help researchers obtain the aforementioned statistical indices. We have written Stata commands that are intended to help researchers obtain the following: (1) the Nam-D'Agostino χ2 goodness-of-fit test; (2) cut point-free and cut point-based net reclassification improvement index (NRI), relative and absolute integrated discriminatory improvement index (IDI), and survival-based regression analyses. We applied the commands to real data on women participating in the Tehran lipid and glucose study (TLGS) to examine if information relating to a family history of premature cardiovascular disease (CVD), waist circumference, and fasting plasma glucose can improve predictive performance of Framingham's general CVD risk algorithm. The command is adpredsurv for survival models. Herein we have described the Stata package "adpredsurv" for calculation of the Nam-D'Agostino χ2 goodness-of-fit test as well as cut point-free and cut point-based NRI, relative and absolute IDI, and survival-based regression analyses. We hope this work encourages the use of novel methods in examining predictive capacity of the emerging plethora of novel biomarkers.
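
    The Stata package itself is not reproduced here; as an illustration of one of the quantities it computes, the sketch below evaluates a cut point-based net reclassification improvement (NRI) from the predicted risks of a baseline and an enhanced model. The data, risk threshold, and degree of model separation are hypothetical.

        import numpy as np

        def categorical_nri(risk_base, risk_new, event, threshold=0.2):
            """Cut point-based NRI: net upward reclassification among events
            plus net downward reclassification among non-events."""
            up = (risk_new >= threshold) & (risk_base < threshold)
            down = (risk_new < threshold) & (risk_base >= threshold)
            ev = event.astype(bool)
            nri_events = up[ev].mean() - down[ev].mean()
            nri_nonevents = down[~ev].mean() - up[~ev].mean()
            return nri_events + nri_nonevents, nri_events, nri_nonevents

        rng = np.random.default_rng(7)
        n = 1000
        event = rng.binomial(1, 0.2, n)
        # hypothetical predicted risks: the "enhanced" model separates events slightly better
        risk_base = np.clip(rng.normal(0.15 + 0.10 * event, 0.10), 0, 1)
        risk_new = np.clip(rng.normal(0.13 + 0.18 * event, 0.10), 0, 1)
        print(categorical_nri(risk_base, risk_new, event))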

  18. Predicting oropharyngeal tumor volume throughout the course of radiation therapy from pretreatment computed tomography data using general linear models.

    PubMed

    Yock, Adam D; Rao, Arvind; Dong, Lei; Beadle, Beth M; Garden, Adam S; Kudchadker, Rajat J; Court, Laurence E

    2014-05-01

    The purpose of this work was to develop and evaluate the accuracy of several predictive models of variation in tumor volume throughout the course of radiation therapy. Nineteen patients with oropharyngeal cancers were imaged daily with CT-on-rails for image-guided alignment per an institutional protocol. The daily volumes of 35 tumors in these 19 patients were determined and used to generate (1) a linear model in which tumor volume changed at a constant rate, (2) a general linear model that utilized the power fit relationship between the daily and initial tumor volumes, and (3) a functional general linear model that identified and exploited the primary modes of variation between time series describing the changing tumor volumes. Primary and nodal tumor volumes were examined separately. The accuracy of these models in predicting daily tumor volumes was compared with those of static and linear reference models using leave-one-out cross-validation. In predicting the daily volume of primary tumors, the general linear model and the functional general linear model were more accurate than the static reference model by 9.9% (range: -11.6% to 23.8%) and 14.6% (range: -7.3% to 27.5%), respectively, and were more accurate than the linear reference model by 14.2% (range: -6.8% to 40.3%) and 13.1% (range: -1.5% to 52.5%), respectively. In predicting the daily volume of nodal tumors, only the 14.4% (range: -11.1% to 20.5%) improvement in accuracy of the functional general linear model compared to the static reference model was statistically significant. A general linear model and a functional general linear model trained on data from a small population of patients can predict the primary tumor volume throughout the course of radiation therapy with greater accuracy than standard reference models. These more accurate models may increase the prognostic value of information about the tumor garnered from pretreatment computed tomography images and facilitate improved treatment management.

  19. Variations on Bayesian Prediction and Inference

    DTIC Science & Technology

    2016-05-09

    There are a number of statistical inference problems that are not generally formulated via a full probability model... For the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood, which can be an obstacle.

  20. Comparison of climate envelope models developed using expert-selected variables versus statistical selection

    USGS Publications Warehouse

    Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.

    2017-01-01

    Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods. Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method and there was low overlap in the variable sets (<40%) between the two methods. Despite these differences in variable sets (expert versus statistical), models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS)). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable selection is a useful first step, especially when there is a need to model a large number of species or expert knowledge of the species is limited. Expert input can then be used to refine models that seem unrealistic or for species that experts believe are particularly sensitive to change. It also emphasizes the importance of using multiple models to reduce uncertainty and improve map outputs for conservation planning. Where outputs overlap or show the same direction of change there is greater certainty in the predictions. Areas of disagreement can be used for learning by asking why the models do not agree, and may highlight areas where additional on-the-ground data collection could improve the models.

  1. Multi-level emulation of complex climate model responses to boundary forcing data

    NASA Astrophysics Data System (ADS)

    Tran, Giang T.; Oliver, Kevin I. C.; Holden, Philip B.; Edwards, Neil R.; Sóbester, András; Challenor, Peter

    2018-04-01

    Climate model components involve both high-dimensional input and output fields. It is desirable to efficiently generate spatio-temporal outputs of these models for applications in integrated assessment modelling or to assess the statistical relationship between such sets of inputs and outputs, for example, uncertainty analysis. However, the need for efficiency often compromises the fidelity of output through the use of low complexity models. Here, we develop a technique which combines statistical emulation with a dimensionality reduction technique to emulate a wide range of outputs from an atmospheric general circulation model, PLASIM, as functions of the boundary forcing prescribed by the ocean component of a lower complexity climate model, GENIE-1. Although accurate and detailed spatial information on atmospheric variables such as precipitation and wind speed is well beyond the capability of GENIE-1's energy-moisture balance model of the atmosphere, this study demonstrates that the output of this model is useful in predicting PLASIM's spatio-temporal fields through multi-level emulation. Meaningful information from the fast model, GENIE-1 was extracted by utilising the correlation between variables of the same type in the two models and between variables of different types in PLASIM. We present here the construction and validation of several PLASIM variable emulators and discuss their potential use in developing a hybrid model with statistical components.

  2. The value of model averaging and dynamical climate model predictions for improving statistical seasonal streamflow forecasts over Australia

    NASA Astrophysics Data System (ADS)

    Pokhrel, Prafulla; Wang, Q. J.; Robertson, David E.

    2013-10-01

    Seasonal streamflow forecasts are valuable for planning and allocation of water resources. In Australia, the Bureau of Meteorology employs a statistical method to forecast seasonal streamflows. The method uses predictors that are related to catchment wetness at the start of a forecast period and to climate during the forecast period. For the latter, a predictor is selected among a number of lagged climate indices as candidates to give the "best" model in terms of model performance in cross validation. This study investigates two strategies for further improvement in seasonal streamflow forecasts. The first is to combine, through Bayesian model averaging, multiple candidate models with different lagged climate indices as predictors, to take advantage of different predictive strengths of the multiple models. The second strategy is to introduce additional candidate models, using rainfall and sea surface temperature predictions from a global climate model as predictors. This is to take advantage of the direct simulations of various dynamic processes. The results show that combining forecasts from multiple statistical models generally yields more skillful forecasts than using only the best model and appears to moderate the worst forecast errors. The use of rainfall predictions from the dynamical climate model marginally improves the streamflow forecasts when viewed over all the study catchments and seasons, but the use of sea surface temperature predictions provides little additional benefit.

  3. Temperature in and out of equilibrium: A review of concepts, tools and attempts

    NASA Astrophysics Data System (ADS)

    Puglisi, A.; Sarracino, A.; Vulpiani, A.

    2017-11-01

    We review the general aspects of the concept of temperature in equilibrium and non-equilibrium statistical mechanics. Although temperature is an old and well-established notion, it still presents controversial facets. After a short historical survey of the key role of temperature in thermodynamics and statistical mechanics, we tackle a series of issues which have been recently reconsidered. In particular, we discuss different definitions and their relevance for energy fluctuations. The interest in such a topic has been triggered by the recent observation of negative temperatures in condensed matter experiments. Moreover, the ability to manipulate systems at the micro and nano-scale urges us to understand and clarify some aspects related to the statistical properties of small systems (as the issue of temperature "fluctuations"). We also discuss the notion of temperature in a dynamical context, within the theory of linear response for Hamiltonian systems at equilibrium and stochastic models with detailed balance, and the generalized fluctuation-response relations, which provide a hint for an extension of the definition of temperature in far-from-equilibrium systems. To conclude we consider non-Hamiltonian systems, such as granular materials, turbulence and active matter, where a general theoretical framework is still lacking.

  4. Machine learning for the New York City power grid.

    PubMed

    Rudin, Cynthia; Waltz, David; Anderson, Roger N; Boulanger, Albert; Salleb-Aouissi, Ansaf; Chow, Maggie; Dutta, Haimonti; Gross, Philip N; Huang, Bert; Ierome, Steve; Isaac, Delfina F; Kressner, Arthur; Passonneau, Rebecca J; Radeva, Axinia; Wu, Leon

    2012-02-01

    Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk of failures for components and systems. These models can be used directly by power companies to assist with prioritization of maintenance and repair work. Specialized versions of this process are used to produce 1) feeder failure rankings, 2) cable, joint, terminator, and transformer rankings, 3) feeder Mean Time Between Failure (MTBF) estimates, and 4) manhole events vulnerability rankings. The process in its most general form can handle diverse, noisy sources that are historical (static), semi-real-time, or real-time, incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF), and includes an evaluation of results via cross-validation and blind test. Above and beyond the ranked lists and MTBF estimates are business management interfaces that allow the prediction capability to be integrated directly into corporate planning and decision support; such interfaces rely on several important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the challenges in working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to assist in maintaining New York City’s electrical grid.

  5. Statistical Selection of Biological Models for Genome-Wide Association Analyses.

    PubMed

    Bi, Wenjian; Kang, Guolian; Pounds, Stanley B

    2018-05-24

    Genome-wide association studies have discovered many biologically important associations of genes with phenotypes. Typically, genome-wide association analyses formally test the association of each genetic feature (SNP, CNV, etc) with the phenotype of interest and summarize the results with multiplicity-adjusted p-values. However, very small p-values only provide evidence against the null hypothesis of no association without indicating which biological model best explains the observed data. Correctly identifying a specific biological model may improve the scientific interpretation and can be used to more effectively select and design a follow-up validation study. Thus, statistical methodology to identify the correct biological model for a particular genotype-phenotype association can be very useful to investigators. Here, we propose a general statistical method to summarize how accurately each of five biological models (null, additive, dominant, recessive, co-dominant) represents the data observed for each variant in a GWAS study. We show that the new method stringently controls the false discovery rate and asymptotically selects the correct biological model. Simulations of two-stage discovery-validation studies show that the new method has these properties and that its validation power is similar to or exceeds that of simple methods that use the same statistical model for all SNPs. Example analyses of three data sets also highlight these advantages of the new method. An R package is freely available at www.stjuderesearch.org/site/depts/biostats/maew. Copyright © 2018. Published by Elsevier Inc.
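
    Not the paper's FDR-controlling procedure; simply a toy sketch of the underlying idea of comparing biological models for one SNP, fitting logistic regressions under additive, dominant, recessive, and co-dominant codings (plus the null model) and ranking them by BIC. Genotype frequencies and effect sizes are invented.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(8)
        n = 3000
        g = rng.binomial(2, 0.3, n)                        # genotype: minor allele count (0, 1, 2)
        p = 1 / (1 + np.exp(-(-1.0 + 0.6 * (g >= 1))))     # the true effect here is dominant
        y = rng.binomial(1, p)

        codings = {
            "additive": g[:, None].astype(float),
            "dominant": (g >= 1)[:, None].astype(float),
            "recessive": (g == 2)[:, None].astype(float),
            "co-dominant": np.column_stack([(g == 1), (g == 2)]).astype(float),
        }

        results = {"null": sm.Logit(y, np.ones((n, 1))).fit(disp=False).bic}
        for name, x in codings.items():
            results[name] = sm.Logit(y, sm.add_constant(x)).fit(disp=False).bic

        for name, bic in sorted(results.items(), key=lambda kv: kv[1]):
            print(f"{name:12s} BIC = {bic:.1f}")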

  6. Impact of cleaning and other interventions on the reduction of hospital-acquired Clostridium difficile infections in two hospitals in England assessed using a breakpoint model.

    PubMed

    Hughes, G J; Nickerson, E; Enoch, D A; Ahluwalia, J; Wilkinson, C; Ayers, R; Brown, N M

    2013-07-01

    Clostridium difficile infection remains a major challenge for hospitals. Although targeted infection control initiatives have been shown to be effective in reducing the incidence of hospital-acquired C. difficile infection, there is little evidence available to assess the effectiveness of specific interventions. To use statistical modelling to detect substantial reductions in the incidence of C. difficile from time series data from two hospitals in England, and relate these time points to infection control interventions. A statistical breakpoints model was fitted to likely hospital-acquired C. difficile infection incidence data from a teaching hospital (2002-2009) and a district general hospital (2005-2009) in England. Models with increasing complexity (i.e. increasing the number of breakpoints) were tested for an improved fit to the data. Partitions estimated from breakpoint models were tested for individual stability using statistical process control charts. Major infection control interventions from both hospitals during this time were grouped according to their primary target (antibiotics, cleaning, isolation, other) and mapped to the model-suggested breakpoints. For both hospitals, breakpoints coincided with enhancements to cleaning protocols. Statistical models enabled formal assessment of the impact of different interventions, and showed that enhancements to deep cleaning programmes are the interventions that have most likely led to substantial reductions in hospital-acquired C. difficile infections at the two hospitals studied. Copyright © 2013 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.
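
    A sketch of a single-breakpoint model in the spirit described above: scan candidate breakpoints in a simulated monthly count series, fit a constant Poisson rate on each side, keep the split with the highest likelihood, and compare it with the no-breakpoint model by AIC. The series is simulated, not the hospitals' data, and the real analysis also allowed for multiple breakpoints.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(9)
        # simulated monthly infection counts with a drop in the underlying rate after month 48
        rate = np.where(np.arange(96) < 48, 12.0, 6.0)
        counts = rng.poisson(rate)

        def poisson_loglik(x):
            lam = x.mean()
            return stats.poisson.logpmf(x, lam).sum()

        ll0 = poisson_loglik(counts)                     # no-breakpoint model (1 parameter)
        best = max(
            ((poisson_loglik(counts[:k]) + poisson_loglik(counts[k:]), k)
             for k in range(6, len(counts) - 6)),
            key=lambda t: t[0],
        )
        ll1, k_hat = best                                # one-breakpoint model (2 rates + location)

        aic0 = 2 * 1 - 2 * ll0
        aic1 = 2 * 3 - 2 * ll1
        print(f"breakpoint at month {k_hat}, AIC without/with breakpoint: {aic0:.1f} / {aic1:.1f}")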

  7. Assessing the effect of land use change on catchment runoff by combined use of statistical tests and hydrological modelling: Case studies from Zimbabwe

    NASA Astrophysics Data System (ADS)

    Lørup, Jens Kristian; Refsgaard, Jens Christian; Mazvimavi, Dominic

    1998-03-01

    The purpose of this study was to identify and assess long-term impacts of land use change on catchment runoff in semi-arid Zimbabwe, based on analyses of long hydrological time series (25-50 years) from six medium-sized (200-1000 km 2) non-experimental rural catchments. A methodology combining common statistical methods with hydrological modelling was adopted in order to distinguish between the effects of climate variability and the effects of land use change. The hydrological model (NAM) was in general able to simulate the observed hydrographs very well during the reference period, thus providing a means to account for the effects of climate variability and hence strengthening the power of the subsequent statistical tests. In the test period the validated model was used to provide the runoff record which would have occurred in the absence of land use change. The analyses indicated a decrease in the annual runoff for most of the six catchments, with the largest changes occurring for catchments located within communal land, where large increases in population and agricultural intensity have taken place. However, the decrease was only statistically significant at the 5% level for one of the catchments.

  8. Occupation times and ergodicity breaking in biased continuous time random walks

    NASA Astrophysics Data System (ADS)

    Bel, Golan; Barkai, Eli

    2005-12-01

    Continuous time random walk (CTRW) models are widely used to model diffusion in condensed matter. There are two classes of such models, distinguished by the convergence or divergence of the mean waiting time. Systems with finite average sojourn time are ergodic and thus Boltzmann-Gibbs statistics can be applied. We investigate the statistical properties of CTRW models with infinite average sojourn time; in particular, the occupation time probability density function is obtained. It is shown that in the non-ergodic phase the distribution of the occupation time of the particle on a given lattice point exhibits bimodal U or trimodal W shape, related to the arcsine law. The key points are as follows. (a) In a CTRW with finite or infinite mean waiting time, the distribution of the number of visits on a lattice point is determined by the probability that a member of an ensemble of particles in equilibrium occupies the lattice point. (b) The asymmetry parameter of the probability distribution function of occupation times is related to the Boltzmann probability and to the partition function. (c) The ensemble average is given by Boltzmann-Gibbs statistics for either finite or infinite mean sojourn time, when detailed balance conditions hold. (d) A non-ergodic generalization of the Boltzmann-Gibbs statistical mechanics for systems with infinite mean sojourn time is found.

  9. Meta-analysis for the comparison of two diagnostic tests to a common gold standard: A generalized linear mixed model approach.

    PubMed

    Hoyer, Annika; Kuss, Oliver

    2018-05-01

    Meta-analysis of diagnostic studies is still a rapidly developing area of biostatistical research. Especially, there is an increasing interest in methods to compare different diagnostic tests to a common gold standard. Restricting to the case of two diagnostic tests, in these meta-analyses the parameters of interest are the differences of sensitivities and specificities (with their corresponding confidence intervals) between the two diagnostic tests while accounting for the various associations across single studies and between the two tests. We propose statistical models with a quadrivariate response (where sensitivity of test 1, specificity of test 1, sensitivity of test 2, and specificity of test 2 are the four responses) as a sensible approach to this task. Using a quadrivariate generalized linear mixed model naturally generalizes the common standard bivariate model of meta-analysis for a single diagnostic test. If information on several thresholds of the tests is available, the quadrivariate model can be further generalized to yield a comparison of full receiver operating characteristic (ROC) curves. We illustrate our model by an example where two screening methods for the diagnosis of type 2 diabetes are compared.

  10. Use of generalized ordered logistic regression for the analysis of multidrug resistance data.

    PubMed

    Agga, Getahun E; Scott, H Morgan

    2015-10-01

    Statistical analysis of antimicrobial resistance data largely focuses on the binary outcome (susceptible or resistant) for each individual antimicrobial. However, bacteria are becoming increasingly multidrug resistant (MDR). Statistical analysis of MDR data is mostly descriptive, often with tabular or graphical presentations. Here we report the applicability of the generalized ordinal logistic regression model for the analysis of MDR data. A total of 1,152 Escherichia coli, isolated from the feces of weaned pigs experimentally supplemented with chlortetracycline (CTC) and copper, were tested for susceptibilities against 15 antimicrobials and were classified as resistant or susceptible. The 15 antimicrobial agents tested were grouped into eight different antimicrobial classes. We defined MDR as the number of antimicrobial classes to which E. coli isolates were resistant, ranging from 0 to 8. The proportional odds assumption of the ordinal logistic regression model was violated only for the effect of treatment period (pre-treatment, during-treatment and post-treatment), but not for the effect of CTC or copper supplementation. Subsequently, a partially constrained generalized ordinal logistic model was built that allows the effect of treatment period to vary while constraining the effects of treatment (CTC and copper supplementation) to be constant across the levels of MDR classes. Copper (Proportional Odds Ratio [Prop OR]=1.03; 95% CI=0.73-1.47) and CTC (Prop OR=1.1; 95% CI=0.78-1.56) supplementation were not significantly associated with the level of MDR adjusted for the effect of treatment period. MDR generally declined over the trial period. In conclusion, generalized ordered logistic regression can be used for the analysis of ordinal data such as MDR data when the proportionality assumptions for ordered logistic regression are violated. Published by Elsevier B.V.
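
    The partially constrained model in the study was fitted with specialized software; a common way to see what a generalized (non-proportional odds) ordered logit relaxes is to fit a separate binary logistic regression to each cumulative split of the ordinal outcome and inspect how the covariate coefficient drifts across splits. The sketch below does this on toy data; variable names and effect sizes are hypothetical.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(10)
        n = 1500
        treated = rng.binomial(1, 0.5, n)                  # e.g., a supplementation indicator
        # toy ordinal outcome: number of resistant antimicrobial classes, 0-8
        latent = 2.5 + 0.3 * treated + rng.logistic(size=n)
        mdr = np.clip(np.round(latent), 0, 8).astype(int)

        X = sm.add_constant(treated.astype(float))
        for cut in range(1, 9):
            y_cut = (mdr >= cut).astype(int)               # cumulative split: resistant to >= cut classes
            if 0 < y_cut.mean() < 1:
                beta = sm.Logit(y_cut, X).fit(disp=False).params[1]
                print(f"P(MDR >= {cut}): treatment log-odds = {beta:+.2f}")
        # roughly constant coefficients support proportional odds; systematic drift across
        # cuts is what a generalized (partially constrained) ordered logit allows for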

  11. a Statistical Dynamic Approach to Structural Evolution of Complex Capital Market Systems

    NASA Astrophysics Data System (ADS)

    Shao, Xiao; Chai, Li H.

    As an important part of modern financial systems, the capital market has played a crucial role in diverse social resource allocation and economic exchange. Going beyond traditional models and/or theories based on neoclassical economics, and considering capital markets as typical complex open systems, this paper attempts to develop a new approach to overcome some shortcomings of the available research. By defining the generalized entropy of capital market systems, a theoretical model and a nonlinear dynamic equation for the operation of capital markets are proposed from statistical dynamic perspectives. The US security market from 1995 to 2001 is then simulated and analyzed as a typical case. Some instructive results are discussed and summarized.

  12. Controlling reactivity of nanoporous catalyst materials by tuning reaction product-pore interior interactions: Statistical mechanical modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Jing; Ackerman, David M.; Lin, Victor S.-Y.

    2013-04-02

    Statistical mechanical modeling is performed of a catalytic conversion reaction within a functionalized nanoporous material to assess the effect of varying the reaction product-pore interior interaction from attractive to repulsive. A strong enhancement in reactivity is observed not just due to the shift in reaction equilibrium towards completion but also due to enhanced transport within the pore resulting from reduced loading. The latter effect is strongest for highly restricted transport (single-file diffusion), and applies even for irreversible reactions. The analysis is performed utilizing a generalized hydrodynamic formulation of the reaction-diffusion equations which can reliably capture the complex interplay between reaction and restricted transport.

  13. GPU-computing in econophysics and statistical physics

    NASA Astrophysics Data System (ADS)

    Preis, T.

    2011-03-01

    A recent trend in computer science and related fields is general purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction into the field of GPU computing and includes examples. In particular computationally expensive analyses employed in financial market context are coded on a graphics card architecture which leads to a significant reduction of computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics - the Ising model - is ported to a graphics card architecture as well, resulting in large speedup values.

  14. Statistical thermodynamics of long straight rigid rods on triangular lattices: nematic order and adsorption thermodynamic functions.

    PubMed

    Matoz-Fernandez, D A; Linares, D H; Ramirez-Pastor, A J

    2012-09-04

    The statistical thermodynamics of straight rigid rods of length k on triangular lattices was developed based on a generalization in the spirit of the lattice-gas model and the classical Guggenheim-DiMarzio approximation. In this scheme, the Helmholtz free energy and its derivatives were written in terms of the order parameter, δ, which characterizes the nematic phase occurring in the system at intermediate densities. Then, using the principle of minimum free energy with δ as a parameter, the main adsorption properties were calculated. Comparisons with Monte Carlo simulations and experimental data were performed in order to evaluate the outcome and limitations of the theoretical model.

  15. On statistical inference in time series analysis of the evolution of road safety.

    PubMed

    Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora

    2013-11-01

    Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations to the applicability of standard methods of statistical inference, which leads to an under or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident-occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether they are linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. Copyright © 2012 Elsevier Ltd. All rights reserved.

  16. Statistical inference for noisy nonlinear ecological dynamic systems.

    PubMed

    Wood, Simon N

    2010-08-26

    Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
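
    Illustration (not from the article): the core of the synthetic-likelihood idea in a few lines - simulate replicate series under a trial parameter, reduce each to summary statistics, fit a Gaussian to those statistics, and evaluate the log-density of the observed statistics. The simulator and the summary statistics below are simple placeholders, not those of the paper.

      import numpy as np
      from scipy.stats import multivariate_normal

      def simulate(theta, n, rng):
          """Placeholder simulator (noisy Ricker-like map); replace with the real dynamic model."""
          r, sigma = theta
          x = np.empty(n)
          x[0] = 1.0
          for t in range(1, n):
              x[t] = x[t - 1] * np.exp(r * (1.0 - x[t - 1]) + sigma * rng.normal())
          return x

      def summaries(x):
          """Phase-insensitive summary statistics (placeholders): mean, variance, lag-1 autocovariance."""
          return np.array([x.mean(), x.var(), np.cov(x[:-1], x[1:])[0, 1]])

      def synthetic_loglik(theta, y_obs, n_rep=200, rng=None):
          """Gaussian 'synthetic likelihood' of the observed summaries under parameter theta."""
          if rng is None:
              rng = np.random.default_rng(0)
          S = np.array([summaries(simulate(theta, y_obs.size, rng)) for _ in range(n_rep)])
          mu, cov = S.mean(axis=0), np.cov(S, rowvar=False)
          return multivariate_normal.logpdf(summaries(y_obs), mean=mu, cov=cov)

      rng = np.random.default_rng(42)
      y_obs = simulate((2.0, 0.3), 100, rng)  # pretend these are the observed data
      print(synthetic_loglik((2.0, 0.3), y_obs), synthetic_loglik((1.0, 0.3), y_obs))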

  17. Combination of statistical and physically based methods to assess shallow slide susceptibility at the basin scale

    NASA Astrophysics Data System (ADS)

    Oliveira, Sérgio C.; Zêzere, José L.; Lajas, Sara; Melo, Raquel

    2017-07-01

    Approaches used to assess shallow slide susceptibility at the basin scale are conceptually different depending on the use of statistical or physically based methods. The former are based on the assumption that the same causes are more likely to produce the same effects, whereas the latter are based on the comparison between forces which tend to promote movement along the slope and the counteracting forces that are resistant to motion. Within this general framework, this work tests two hypotheses: (i) although conceptually and methodologically distinct, the statistical and deterministic methods generate similar shallow slide susceptibility results regarding the model's predictive capacity and spatial agreement; and (ii) the combination of shallow slide susceptibility maps obtained with statistical and physically based methods, for the same study area, generates a more reliable susceptibility model for shallow slide occurrence. These hypotheses were tested at a small test site (13.9 km²) located north of Lisbon (Portugal), using a statistical method (the information value method, IV) and a physically based method (the infinite slope method, IS). The landslide susceptibility maps produced with the statistical and deterministic methods were combined into a new landslide susceptibility map. The latter was based on a set of integration rules defined by the cross tabulation of the susceptibility classes of both maps and analysis of the corresponding contingency tables. The results demonstrate a higher predictive capacity of the new shallow slide susceptibility map, which combines the independent results obtained with statistical and physically based models. Moreover, the combination of the two models allowed the identification of areas where the results of the information value and the infinite slope methods are contradictory. Thus, these areas were classified as uncertain and deserve additional investigation at a more detailed scale.
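
    Illustration (not from the article): the two ingredients combined in the paper can be sketched with their standard textbook forms - the information value weight of a predisposing-factor class and the infinite-slope factor of safety; all numbers below are hypothetical.

      import numpy as np

      def information_value(slide_cells_in_class, cells_in_class, total_slide_cells, total_cells):
          """Standard information value weight for one factor class: log of the ratio between the
          landslide density inside the class and the density over the whole study area."""
          return np.log((slide_cells_in_class / cells_in_class) / (total_slide_cells / total_cells))

      def infinite_slope_fs(c, phi, gamma, gamma_w, z, m, beta):
          """Textbook infinite-slope factor of safety: c (kPa), phi and beta in radians,
          gamma and gamma_w unit weights (kN/m3), z soil thickness (m), m water table ratio (0-1)."""
          num = c + (gamma - m * gamma_w) * z * np.cos(beta) ** 2 * np.tan(phi)
          den = gamma * z * np.sin(beta) * np.cos(beta)
          return num / den

      # Hypothetical values for illustration only.
      print(information_value(120, 5000, 600, 140000))
      print(infinite_slope_fs(c=5.0, phi=np.radians(30), gamma=18.0, gamma_w=9.81,
                              z=1.5, m=0.5, beta=np.radians(25)))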

  18. North American Extreme Temperature Events and Related Large Scale Meteorological Patterns: A Review of Statistical Methods, Dynamics, Modeling, and Trends

    NASA Technical Reports Server (NTRS)

    Grotjahn, Richard; Black, Robert; Leung, Ruby; Wehner, Michael F.; Barlow, Mathew; Bosilovich, Michael G.; Gershunov, Alexander; Gutowski, William J., Jr.; Gyakum, John R.; Katz, Richard W.

    2015-01-01

    The objective of this paper is to review statistical methods, dynamics, modeling efforts, and trends related to temperature extremes, with a focus upon extreme events of short duration that affect parts of North America. These events are associated with large scale meteorological patterns (LSMPs). The statistics, dynamics, and modeling sections of this paper are written to be autonomous and so can be read separately. Methods to define extreme event statistics and to identify and connect LSMPs to extreme temperature events are presented. Recent advances in statistical techniques connect LSMPs to extreme temperatures through appropriately defined covariates that supplement more straightforward analyses. Various LSMPs, ranging from synoptic to planetary scale structures, are associated with extreme temperature events. Current knowledge about the synoptics and the dynamical mechanisms leading to the associated LSMPs is incomplete. Systematic studies of the physics of LSMP life cycles, comprehensive model assessment of LSMP-extreme temperature event linkages, and LSMP properties are needed. Generally, climate models capture observed properties of heat waves and cold air outbreaks with some fidelity. However, they overestimate warm wave frequency, underestimate cold air outbreak frequency, and underestimate the collective influence of low-frequency modes on temperature extremes. Modeling studies have identified the impact of large-scale circulation anomalies and land-atmosphere interactions on changes in extreme temperatures. However, few studies have examined changes in LSMPs to more specifically understand the role of LSMPs in past and future extreme temperature changes. Even though LSMPs are resolvable by global and regional climate models, they are not necessarily well simulated. The paper concludes with unresolved issues and research questions.
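
    Illustration (not from the article): one of the standard statistical building blocks in this area is an extreme value fit to block maxima; the sketch below fits a stationary GEV distribution to hypothetical annual temperature maxima with scipy and reports a 20-year return level (covariate-dependent, non-stationary fits of the kind discussed in the review require dedicated packages and are omitted here).

      import numpy as np
      from scipy.stats import genextreme

      # Hypothetical annual maximum temperatures (deg C); real station data would be used in practice.
      rng = np.random.default_rng(7)
      annual_max = 35.0 + rng.gumbel(loc=0.0, scale=1.5, size=60)

      # Fit a stationary GEV distribution to the block maxima.
      shape, loc, scale = genextreme.fit(annual_max)
      print("shape, loc, scale:", shape, loc, scale)

      # 20-year return level: the value exceeded with probability 1/20 in any given year.
      print("20-year return level:", genextreme.ppf(1.0 - 1.0 / 20.0, shape, loc=loc, scale=scale))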

  19. Weather extremes in very large, high-resolution ensembles: the weatherathome experiment

    NASA Astrophysics Data System (ADS)

    Allen, M. R.; Rosier, S.; Massey, N.; Rye, C.; Bowery, A.; Miller, J.; Otto, F.; Jones, R.; Wilson, S.; Mote, P.; Stone, D. A.; Yamazaki, Y. H.; Carrington, D.

    2011-12-01

    Resolution and ensemble size are often seen as alternatives in climate modelling. Models with sufficient resolution to simulate many classes of extreme weather cannot normally be run often enough to assess the statistics of rare events, still less how these statistics may be changing. As a result, assessments of the impact of external forcing on regional climate extremes must be based either on statistical downscaling from relatively coarse-resolution models, or on statistical extrapolation from 10-year to 100-year events. Under the weatherathome experiment, part of the climateprediction.net initiative, we have compiled the Met Office Regional Climate Model HadRM3P to run at 25 and 50 km resolution on personal computers volunteered by the general public, embedded within the HadAM3P global atmosphere model. With a global network of about 50,000 volunteers, this allows us to run time-slice ensembles of essentially unlimited size, exploring the statistics of extreme weather under a range of scenarios for surface forcing and atmospheric composition, allowing for uncertainty in both boundary conditions and model parameters. Current experiments, developed with the support of Microsoft Research, focus on three regions: the Western USA, Europe and Southern Africa. We initially simulate the period 1959-2010 to establish which variables are realistically simulated by the model and on what scales. Our next experiments focus on the event attribution problem, exploring how the probability of various types of extreme weather would have been different over the recent past in a world unaffected by human influence, following the design of Pall et al. (2011), but extended to a longer period and higher spatial resolution. We will present the first results of this unique, global, participatory experiment and discuss the implications for the attribution of recent weather events to anthropogenic influence on climate.
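
    Illustration (not from the abstract): Pall et al.-style event attribution compares the probability of exceeding an impact-relevant threshold in an "actual" ensemble and in a counterfactual "natural" ensemble; the sketch below computes the probability ratio and the fraction of attributable risk (FAR) from two hypothetical ensembles.

      import numpy as np

      def attribution_stats(actual, natural, threshold):
          """Probability ratio and fraction of attributable risk (FAR) for exceeding a threshold,
          estimated from two large ensembles of simulated values."""
          p1 = np.mean(actual >= threshold)   # probability with human influence
          p0 = np.mean(natural >= threshold)  # probability in the counterfactual world
          return p1 / p0, 1.0 - p0 / p1

      # Hypothetical ensembles of, e.g., regional seasonal-mean temperature anomalies.
      rng = np.random.default_rng(3)
      actual = rng.normal(0.8, 1.0, size=10000)   # forced climate
      natural = rng.normal(0.0, 1.0, size=10000)  # world unaffected by human influence
      pr, far = attribution_stats(actual, natural, threshold=2.0)
      print(f"probability ratio = {pr:.2f}, FAR = {far:.2f}")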

  20. Implication of Tsallis entropy in the Thomas–Fermi model for self-gravitating fermions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ourabah, Kamel; Tribeche, Mouloud, E-mail: mouloudtribeche@yahoo.fr

    The Thomas–Fermi approach for self-gravitating fermions is revisited within the theoretical framework of the q-statistics. Starting from the q-deformation of the Fermi–Dirac distribution function, a generalized Thomas–Fermi equation is derived. It is shown that the Tsallis entropy preserves a scaling property of this equation. The q-statistical approach to Jeans’ instability in a system of self-gravitating fermions is also addressed. The dependence of the Jeans’ wavenumber (or the Jeans length) on the parameter q is traced. It is found that the q-statistics makes the Fermionic system unstable at scales shorter than the standard Jeans length. Highlights: • Thomas–Fermi approach for self-gravitating fermions. • A generalized Thomas–Fermi equation is derived. • Nonextensivity preserves a scaling property of this equation. • Nonextensive approach to Jeans’ instability of self-gravitating fermions. • It is found that nonextensivity makes the Fermionic system unstable at shorter scales.
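
    Illustration (not from the abstract): the record does not reproduce the equations; for orientation only, in LaTeX notation, a commonly used q-deformation of the Fermi–Dirac occupation number in the nonextensive (Tsallis) literature, together with the standard Jeans wavenumber against which the q-dependent result is compared, are

      % One common q-deformed Fermi-Dirac occupation (reduces to the standard form as q -> 1);
      % the paper's exact definition may differ.
      n_q(\varepsilon) = \frac{1}{\bigl[1 + (q-1)\,\beta(\varepsilon - \mu)\bigr]^{1/(q-1)} + 1}

      % Standard Jeans wavenumber (mass density \rho, sound speed c_s):
      k_J^2 = \frac{4\pi G \rho}{c_s^2}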
