Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models
2015-09-12
AFRL-AFOSR-VA-TR-2015-0278. Final report by Katya Scheinberg for AFOSR grant FA9550-11-1-0239, "Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models." Subject terms: optimization, Derivative-Free Optimization, Statistical Machine Learning.
NASA Astrophysics Data System (ADS)
Stan Development Team
2018-01-01
Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.
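As an illustration of the interfaces mentioned above, here is a minimal sketch of calling Stan from Python via CmdStanPy. It assumes CmdStanPy and the CmdStan toolchain are installed and a reasonably recent Stan version; the beta-Bernoulli model and data are invented for the example, not taken from this record.

```python
# Minimal sketch of driving Stan from Python via CmdStanPy (assumes
# `pip install cmdstanpy` plus an installed CmdStan; the model below is
# an invented toy example, not from the record above).
import pathlib
import tempfile

from cmdstanpy import CmdStanModel

stan_program = """
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);      // uniform prior
  y ~ bernoulli(theta);    // likelihood
}
"""

stan_file = pathlib.Path(tempfile.mkdtemp()) / "bernoulli.stan"
stan_file.write_text(stan_program)

model = CmdStanModel(stan_file=str(stan_file))   # compiles the model
fit = model.sample(data={"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]},
                   chains=4, iter_sampling=1000)
print(fit.summary())                             # posterior summary for theta
```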
Physics-based statistical model and simulation method of RF propagation in urban environments
Pao, Hsueh-Yuan; Dvorak, Steven L.
2010-09-14
A physics-based statistical model and simulation/modeling method and system of electromagnetic wave propagation (wireless communication) in urban environments. In particular, the model is a computationally efficient closed-form parametric model of RF propagation in an urban environment which is extracted from a physics-based statistical wireless channel simulation method and system. The simulation divides the complex urban environment into a network of interconnected urban canyon waveguides which can be analyzed individually; calculates spectral coefficients of modal fields in the waveguides excited by the propagation using a database of statistical impedance boundary conditions which incorporates the complexity of building walls in the propagation model; determines statistical parameters of the calculated modal fields; and determines a parametric propagation model based on the statistical parameters of the calculated modal fields from which predictions of communications capability may be made.
Rainfall runoff modelling of the Upper Ganga and Brahmaputra basins using PERSiST.
Futter, M N; Whitehead, P G; Sarkar, S; Rodda, H; Crossman, J
2015-06-01
There are ongoing discussions about the appropriate level of complexity and sources of uncertainty in rainfall runoff models. Simulations for operational hydrology, flood forecasting or nutrient transport all warrant different levels of complexity in the modelling approach. More complex model structures are appropriate for simulations of land-cover dependent nutrient transport while more parsimonious model structures may be adequate for runoff simulation. The appropriate level of complexity is also dependent on data availability. Here, we use PERSiST, a simple, semi-distributed dynamic rainfall-runoff modelling toolkit, to simulate flows in the Upper Ganges and Brahmaputra rivers. We present two sets of simulations driven by single time series of daily precipitation and temperature using simple (A) and complex (B) model structures based on uniform and hydrochemically relevant land covers, respectively. Models were compared based on ensembles of Bayesian Information Criterion (BIC) statistics. Equifinality was observed for parameters but not for model structures. Model performance was better for the more complex (B) structural representations than for parsimonious model structures. The results show that structural uncertainty is more important than parameter uncertainty. The ensembles of BIC statistics suggested that neither structural representation was preferable in a statistical sense. Simulations presented here confirm that relatively simple models with limited data requirements can be used to credibly simulate flows and water balance components needed for nutrient flux modelling in large, data-poor basins.
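For readers unfamiliar with the model-comparison statistic used here, the following is a small Python sketch of a BIC comparison between a parsimonious and a complex structure, assuming i.i.d. Gaussian errors. The flow data and parameter counts are synthetic stand-ins, not PERSiST output.

```python
# Minimal sketch of comparing two rainfall-runoff model structures by BIC,
# assuming Gaussian errors; data and "models" are synthetic stand-ins.
import numpy as np

def bic(observed, simulated, n_params):
    """BIC for a least-squares fit under i.i.d. Gaussian errors."""
    n = len(observed)
    rss = np.sum((observed - simulated) ** 2)
    return n * np.log(rss / n) + n_params * np.log(n)

rng = np.random.default_rng(0)
q_obs = rng.gamma(2.0, 50.0, size=365)          # synthetic daily flows
q_simple = q_obs + rng.normal(0, 30, 365)       # "structure A": 5 parameters
q_complex = q_obs + rng.normal(0, 25, 365)      # "structure B": 20 parameters

print("BIC A:", bic(q_obs, q_simple, n_params=5))
print("BIC B:", bic(q_obs, q_complex, n_params=20))  # lower BIC is preferred
```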
Rivas, Elena; Lang, Raymond; Eddy, Sean R
2012-02-01
The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.
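For orientation, here is a minimal Python sketch of the classic Nussinov base-pair-maximization dynamic program, the simplest single-sequence folding recursion. This is not TORNADO's super-grammar; it only illustrates the kind of recursion that SCFG-based predictors generalize with probabilities or energies.

```python
# Classic Nussinov dynamic program: maximize nested base pairs.
# A toy illustration of single-sequence folding recursions, not TORNADO.
def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs (Watson-Crick + GU wobble),
    requiring at least `min_loop` unpaired bases inside each pair."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
             ("G", "U"), ("U", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                   # case: j unpaired
            for k in range(i, j - min_loop):      # case: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCC"))  # -> 3 nested pairs for this toy hairpin
```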
Probabilistic Graphical Model Representation in Phylogenetics
Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.
2014-01-01
Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559
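The core factorization idea can be written generically as follows (notation ours, not the paper's):

```latex
p(x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} p\!\left(x_i \mid \mathrm{pa}(x_i)\right)
```

Here pa(x_i) denotes the parent nodes of x_i in the directed graph, so each factor is one of the conditionally independent distributions the authors assemble into full phylogenetic models.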
On the Way to Appropriate Model Complexity
NASA Astrophysics Data System (ADS)
Höge, M.
2016-12-01
When statistical models are used to represent natural phenomena they are often too simple or too complex - this is known. But what exactly is model complexity? Among many other definitions, the complexity of a model can be conceptualized as a measure of statistical dependence between observations and parameters (Van der Linde, 2014). However, several issues remain when working with model complexity: A unique definition for model complexity is missing. Assuming a definition is accepted, how can model complexity be quantified? How can we use a quantified complexity to improve modeling? Generally defined, "complexity is a measure of the information needed to specify the relationships between the elements of organized systems" (Bawden & Robinson, 2015). The complexity of a system changes as the knowledge about the system changes. For models this means that complexity is not a static concept: With more data or higher spatio-temporal resolution of parameters, the complexity of a model changes. There are essentially three categories into which all commonly used complexity measures can be classified: (1) An explicit representation of model complexity as "Degrees of freedom" of a model, e.g. effective number of parameters. (2) Model complexity as code length, a.k.a. "Kolmogorov complexity": The longer the shortest model code, the higher its complexity (e.g. in bits). (3) Complexity defined via information entropy of parametric or predictive uncertainty. Preliminary results show that Bayes theorem allows for incorporating all parts of the non-static concept of model complexity like data quality and quantity or parametric uncertainty. Therefore, we test how different approaches for measuring model complexity perform in comparison to a fully Bayesian model selection procedure. Ultimately, we want to find a measure that helps to assess the most appropriate model.
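As a concrete instance of category (3), entropy-based measures take the generic Shannon form, written here for parametric uncertainty in our notation rather than the abstract's:

```latex
H\!\left[p(\theta \mid y)\right] \;=\; -\int p(\theta \mid y)\,\ln p(\theta \mid y)\, d\theta
```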
Statistical Techniques Complement UML When Developing Domain Models of Complex Dynamical Biosystems.
Williams, Richard A; Timmis, Jon; Qwarnstrom, Eva E
2016-01-01
Computational modelling and simulation is increasingly being used to complement traditional wet-lab techniques when investigating the mechanistic behaviours of complex biological systems. In order to ensure computational models are fit for purpose, it is essential that the abstracted view of biology captured in the computational model is clearly and unambiguously defined within a conceptual model of the biological domain (a domain model) that acts to accurately represent the biological system and to document the functional requirements for the resultant computational model. We present a domain model of the IL-1 stimulated NF-κB signalling pathway, which unambiguously defines the spatial, temporal and stochastic requirements for our future computational model. Through the development of this model, we observe that, in isolation, UML is not sufficient for the purpose of creating a domain model, and that a number of descriptive and multivariate statistical techniques provide complementary perspectives, in particular when modelling the heterogeneity of dynamics at the single-cell level. We believe this approach of using UML to define the structure and interactions within a complex system, along with statistics to define the stochastic and dynamic nature of complex systems, is crucial for ensuring that conceptual models of complex dynamical biosystems, which are developed using UML, are fit for purpose, and unambiguously define the functional requirements for the resultant computational model.
Statistical Analysis of Complexity Generators for Cost Estimation
NASA Technical Reports Server (NTRS)
Rowell, Ginger Holmes
1999-01-01
Predicting the cost of cutting edge new technologies involved with spacecraft hardware can be quite complicated. A new feature of the NASA Air Force Cost Model (NAFCOM), called the Complexity Generator, is being developed to model the complexity factors that drive the cost of space hardware. This parametric approach is also designed to account for the differences in cost, based on factors that are unique to each system and subsystem. The cost driver categories included in this model are weight, inheritance from previous missions, technical complexity, and management factors. This paper explains the Complexity Generator framework, the statistical methods used to select the best model within this framework, and the procedures used to find the region of predictability and the prediction intervals for the cost of a mission.
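The following Python sketch shows the generic statistical machinery the paper applies: fit a parametric cost model, then report prediction intervals for a new mission. The variables, coefficients, and data are invented for illustration; NAFCOM's actual cost drivers and data are not public.

```python
# Sketch of fitting a parametric cost model and reporting prediction
# intervals with statsmodels; all variables and data here are invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
weight = rng.uniform(100, 2000, 40)            # subsystem weight (kg)
complexity = rng.uniform(0, 1, 40)             # technical complexity score
cost = 0.05 * weight + 30 * complexity + rng.normal(0, 5, 40)

X = sm.add_constant(np.column_stack([weight, complexity]))
fit = sm.OLS(cost, X).fit()

X_new = sm.add_constant(np.array([[800.0, 0.7], [1500.0, 0.9]]),
                        has_constant="add")
pred = fit.get_prediction(X_new)
# summary_frame gives the mean, confidence interval, and the wider
# prediction interval (obs_ci_*) for the cost of a new mission.
print(pred.summary_frame(alpha=0.05))
```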
Teaching Statistics--Despite Its Applications
ERIC Educational Resources Information Center
Ridgway, Jim; Nicholson, James; McCusker, Sean
2007-01-01
Evidence-based policy requires sophisticated modelling and reasoning about complex social data. The current UK statistics curricula do not equip tomorrow's citizens to understand such reasoning. We advocate radical curriculum reform, designed to require students to reason from complex data.
Central Limit Theorem for Exponentially Quasi-local Statistics of Spin Models on Cayley Graphs
NASA Astrophysics Data System (ADS)
Reddy, Tulasi Ram; Vadlamani, Sreekar; Yogeshwaran, D.
2018-04-01
Central limit theorems for linear statistics of lattice random fields (including spin models) are usually proven under suitable mixing conditions or quasi-associativity. Many interesting examples of spin models do not satisfy mixing conditions, and on the other hand, it does not seem easy to show central limit theorem for local statistics via quasi-associativity. In this work, we prove general central limit theorems for local statistics and exponentially quasi-local statistics of spin models on discrete Cayley graphs with polynomial growth. Further, we supplement these results by proving similar central limit theorems for random fields on discrete Cayley graphs taking values in a countable space, but under the stronger assumptions of α -mixing (for local statistics) and exponential α -mixing (for exponentially quasi-local statistics). All our central limit theorems assume a suitable variance lower bound like many others in the literature. We illustrate our general central limit theorem with specific examples of lattice spin models and statistics arising in computational topology, statistical physics and random networks. Examples of clustering spin models include quasi-associated spin models with fast decaying covariances like the off-critical Ising model, level sets of Gaussian random fields with fast decaying covariances like the massive Gaussian free field and determinantal point processes with fast decaying kernels. Examples of local statistics include intrinsic volumes, face counts, component counts of random cubical complexes while exponentially quasi-local statistics include nearest neighbour distances in spin models and Betti numbers of sub-critical random cubical complexes.
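Schematically, and in our notation rather than the authors', such results assert that a statistic S_n accumulated over a growing ball B_n of the Cayley graph satisfies

```latex
\frac{S_n - \mathbb{E}[S_n]}{\sqrt{\operatorname{Var}(S_n)}}
\;\xrightarrow{\;d\;}\; \mathcal{N}(0, 1),
\qquad \text{provided } \liminf_{n} \frac{\operatorname{Var}(S_n)}{|B_n|} > 0,
```

the proviso being the variance lower bound assumed throughout the paper.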
Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C
2015-01-01
Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype–phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene–gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects. PMID:25395175
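A toy Python sketch in the spirit of the study (not HIBACHI itself): simulate a purely epistatic two-SNP disease model under a liability threshold, so that neither SNP shows a marginal main effect. All frequencies and effect sizes are invented.

```python
# Toy liability-threshold simulation of pure two-SNP epistasis (not HIBACHI).
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
maf = 1 - 0.5 ** 0.5          # carrier frequency exactly 1/2, so margins cancel
g1 = rng.binomial(2, maf, n)  # genotypes coded 0/1/2 minor alleles
g2 = rng.binomial(2, maf, n)

# XOR-like effect: risk only when exactly one locus carries a minor allele
epistasis = ((g1 > 0) ^ (g2 > 0)).astype(float)
liability = 2.0 * epistasis + rng.normal(0, 1, n)
case = liability > np.quantile(liability, 0.9)  # top 10% of liability = cases

for g, name in [(g1, "SNP1"), (g2, "SNP2")]:
    print(name, "case rate by genotype:",
          [round(case[g == k].mean(), 3) for k in (0, 1, 2)])
# Marginal case rates are nearly flat per SNP; risk emerges only jointly.
```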
NASA Astrophysics Data System (ADS)
Qi, Di
Turbulent dynamical systems are ubiquitous in science and engineering. Uncertainty quantification (UQ) in turbulent dynamical systems is a grand challenge where the goal is to obtain statistical estimates for key physical quantities. In the development of a proper UQ scheme for systems characterized by both a high-dimensional phase space and a large number of instabilities, significant model errors compared with the true natural signal are always unavoidable due to both the imperfect understanding of the underlying physical processes and the limited computational resources available. One central issue in contemporary research is the development of a systematic methodology for reduced order models that can recover the crucial features both with model fidelity in statistical equilibrium and with model sensitivity in response to perturbations. In the first part, we discuss a general mathematical framework to construct statistically accurate reduced-order models that have skill in capturing the statistical variability in the principal directions of a general class of complex systems with quadratic nonlinearity. A systematic hierarchy of simple statistical closure schemes, which are built through new global statistical energy conservation principles combined with statistical equilibrium fidelity, are designed and tested for UQ of these problems. Second, the capacity of imperfect low-order stochastic approximations to model extreme events in a passive scalar field advected by turbulent flows is investigated. The effects in complicated flow systems are considered including strong nonlinear and non-Gaussian interactions, and much simpler and cheaper imperfect models with model error are constructed to capture the crucial statistical features in the stationary tracer field. Several mathematical ideas are introduced to improve the prediction skill of the imperfect reduced-order models. Most importantly, empirical information theory and statistical linear response theory are applied in the training phase for calibrating model errors to achieve optimal imperfect model parameters; and total statistical energy dynamics are introduced to improve the model sensitivity in the prediction phase especially when strong external perturbations are exerted. The validity of reduced-order models for predicting statistical responses and intermittency is demonstrated on a series of instructive models with increasing complexity, including the stochastic triad model, the Lorenz '96 model, and models for barotropic and baroclinic turbulence. The skillful low-order modeling methods developed here should also be useful for other applications such as efficient algorithms for data assimilation.
Quantifying networks complexity from information geometry viewpoint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Felice, Domenico; Mancini, Stefano
We consider a Gaussian statistical model whose parameter space is given by the variances of random variables. Underlying this model we identify networks by interpreting random variables as sitting on vertices and their correlations as weighted edges among vertices. We then associate to the parameter space a statistical manifold endowed with a Riemannian metric structure (that of Fisher-Rao). Going on, in analogy with the microcanonical definition of entropy in Statistical Mechanics, we introduce an entropic measure of networks complexity. We prove that it is invariant under networks isomorphism. Above all, considering networks as simplicial complexes, we evaluate this entropy on simplexes and find that it monotonically increases with their dimension.
Ionospheric scintillation studies
NASA Technical Reports Server (NTRS)
Rino, C. L.; Freemouw, E. J.
1973-01-01
The diffracted field of a monochromatic plane wave was characterized by two complex correlation functions. For a Gaussian complex field, these quantities suffice to completely define the statistics of the field. Thus, one can in principle calculate the statistics of any measurable quantity in terms of the model parameters. The best data fits were achieved for intensity statistics derived under the Gaussian statistics hypothesis. The signal structure that achieved the best fit was nearly invariant with scintillation level and irregularity source (ionosphere or solar wind). It was characterized by the fact that more than 80% of the scattered signal power is in phase quadrature with the undeviated or coherent signal component. Thus, the Gaussian-statistics hypothesis is both convenient and accurate for channel modeling work.
Analysis of Immune Complex Structure by Statistical Mechanics and Light Scattering Techniques.
NASA Astrophysics Data System (ADS)
Busch, Nathan Adams
1995-01-01
The size and structure of immune complexes determine their behavior in the immune system. The chemical physics of the complex formation is not well understood; this is due in part to inadequate characterization of the proteins involved, and in part to the lack of sufficiently well developed theoretical techniques. Understanding the complex formation will permit rational design of strategies for inhibiting tissue deposition of the complexes. A statistical mechanical model of the proteins based upon the theory of associating fluids was developed. The multipole electrostatic potential for each protein used in this study was characterized for net protein charge, dipole moment magnitude, and dipole moment direction. The binding sites, between the model antigen and antibodies, were characterized for their net surface area, energy, and position relative to the dipole moment of the protein. The equilibrium binding graphs generated with the protein statistical mechanical model compare favorably with experimental data obtained from radioimmunoassay results. The isothermal compressibility predicted by the model agrees with results obtained from dynamic light scattering. The statistical mechanics model was used to investigate association between the model antigen and selected pairs of antibodies. It was found that, in accordance with expectations from thermodynamic arguments, the highest total binding energy yielded complex distributions which were skewed to higher complex size. From examination of the simulated formation of ring structures from linear chain complexes, and from the joint shape probability surfaces, it was found that ring configurations were formed by the "folding" of linear chains until the ends are within binding distance. By comparing single antigen/two antibody systems which differ only in their respective binding site locations, it was found that binding site location influences complex size and shape distributions only when ring formation occurs. The internal potential energy of a ring complex is considerably less than that of the non-associating system; therefore the ring complexes are quite stable and show no evidence of breaking and collapsing into smaller complexes. The ring formation will occur only in systems where the total free energy of each complex may be minimized. Thus, ring formation will occur even though entropically unfavorable conformations result if the total free energy can be minimized by doing so.
NASA Astrophysics Data System (ADS)
Golmohammadi, A.; Jafarpour, B.; M Khaninezhad, M. R.
2017-12-01
Calibration of heterogeneous subsurface flow models leads to ill-posed nonlinear inverse problems, where too many unknown parameters are estimated from limited response measurements. When the underlying parameters form complex (non-Gaussian) structured spatial connectivity patterns, classical variogram-based geostatistical techniques cannot describe the underlying connectivity patterns. Modern pattern-based geostatistical methods that incorporate higher-order spatial statistics are more suitable for describing such complex spatial patterns. Moreover, when the underlying unknown parameters are discrete (geologic facies distribution), conventional model calibration techniques that are designed for continuous parameters cannot be applied directly. In this paper, we introduce a novel pattern-based model calibration method to reconstruct discrete and spatially complex facies distributions from dynamic flow response data. To reproduce complex connectivity patterns during model calibration, we impose a feasibility constraint to ensure that the solution follows the expected higher-order spatial statistics. For model calibration, we adopt a regularized least-squares formulation, involving data mismatch, pattern connectivity, and feasibility constraint terms. Using an alternating directions optimization algorithm, the regularized objective function is divided into a continuous model calibration problem, followed by mapping the solution onto the feasible set. The feasibility constraint to honor the expected spatial statistics is implemented using a supervised machine learning algorithm. The two steps of the model calibration formulation are repeated until the convergence criterion is met. Several numerical examples are used to evaluate the performance of the developed method.
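In generic notation (ours, paraphrasing the formulation described rather than quoting the authors' exact objective), the regularized least-squares problem has the shape

```latex
\min_{m} \;
\lVert d_{\mathrm{obs}} - g(m) \rVert_2^2
\;+\; \lambda_1 \, \Phi_{\mathrm{conn}}(m)
\;+\; \lambda_2 \, \lVert m - \mathcal{P}_{\mathcal{F}}(m) \rVert_2^2,
```

where g is the flow simulator, Φ_conn penalizes departures from the expected connectivity patterns, and P_F maps onto the feasible set honoring the higher-order statistics (implemented via the supervised learner). Alternating directions then splits each iteration into a continuous calibration update followed by the feasibility mapping.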
NASA Astrophysics Data System (ADS)
Kassem, M.; Soize, C.; Gagliardini, L.
2009-06-01
In this paper, an energy-density field approach applied to the vibroacoustic analysis of complex industrial structures in the low- and medium-frequency ranges is presented. This approach uses a statistical computational model. The analyzed system consists of an automotive vehicle structure coupled with its internal acoustic cavity. The objective of this paper is to make use of the statistical properties of the frequency response functions of the vibroacoustic system observed from previous experimental and numerical work. The frequency response functions are expressed in terms of a dimensionless matrix which is estimated using the proposed energy approach. Using this dimensionless matrix, a simplified vibroacoustic model is proposed.
Complex Sequencing Rules of Birdsong Can be Explained by Simple Hidden Markov Processes
Katahira, Kentaro; Suzuki, Kenta; Okanoya, Kazuo; Okada, Masato
2011-01-01
Complex sequencing rules observed in birdsongs provide an opportunity to investigate the neural mechanism for generating complex sequential behaviors. To relate the findings from studying birdsongs to other sequential behaviors such as human speech and musical performance, it is crucial to characterize the statistical properties of the sequencing rules in birdsongs. However, the properties of the sequencing rules in birdsongs have not yet been fully addressed. In this study, we investigate the statistical properties of the complex birdsong of the Bengalese finch (Lonchura striata var. domestica). Based on manually annotated syllable labels, we first show that there are significant higher-order context dependencies in Bengalese finch songs, that is, which syllable appears next depends on more than one previous syllable. We then analyze acoustic features of the song and show that higher-order context dependencies can be explained using first-order hidden state transition dynamics with redundant hidden states. This model corresponds to hidden Markov models (HMMs), well-known statistical models with a wide range of applications in time series modeling. Song annotation with these first-order hidden-state models agreed well with manual annotation; the score was comparable to that of a second-order HMM and surpassed that of the zeroth-order model (the Gaussian mixture model, GMM), which does not use context information. Our results imply that the hierarchical representation with hidden state dynamics may underlie the neural implementation for generating complex behavioral sequences with higher-order dependencies. PMID:21915345
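An illustrative Python sketch of the modeling move described, assuming the hmmlearn package is installed: fit a first-order Gaussian HMM with deliberately redundant hidden states to one-dimensional "acoustic feature" data. The synthetic data stand in for the syllable-feature sequences analyzed in the paper.

```python
# Fit a first-order Gaussian HMM with redundant hidden states (hmmlearn).
# Synthetic stand-in data, not Bengalese finch recordings.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(7)
# Synthetic "song": three acoustic clusters visited with history dependence
means = np.array([0.0, 5.0, 10.0])
states = [0]
for _ in range(999):
    states.append(rng.choice(3, p=np.roll([0.7, 0.2, 0.1], states[-1])))
X = (means[states] + rng.normal(0, 0.5, 1000)).reshape(-1, 1)

# More hidden states than observable clusters lets a first-order chain
# emulate higher-order context dependence.
model = hmm.GaussianHMM(n_components=6, covariance_type="diag",
                        n_iter=200, random_state=0)
model.fit(X)
decoded = model.predict(X)        # Viterbi state sequence ~ song annotation
print("log-likelihood:", model.score(X))
```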
Detecting changes in dynamic and complex acoustic environments
Boubenec, Yves; Lawlor, Jennifer; Górska, Urszula; Shamma, Shihab; Englitz, Bernhard
2017-01-01
Natural sounds, such as wind or rain, are characterized by the statistical occurrence of their constituents. Despite their complexity, listeners readily detect changes in these contexts. We here address the neural basis of statistical decision-making using a combination of psychophysics, EEG and modelling. In a texture-based, change-detection paradigm, human performance and reaction times improved with longer pre-change exposure, consistent with improved estimation of baseline statistics. Change-locked and decision-related EEG responses were found in a centro-parietal scalp location, whose slope depended on change size, consistent with sensory evidence accumulation. The potential's amplitude scaled with the duration of pre-change exposure, suggesting a time-dependent decision threshold. Auditory cortex-related potentials showed no response to the change. A dual timescale, statistical estimation model accounted for subjects' performance. Furthermore, a decision-augmented auditory cortex model accounted for performance and reaction times, suggesting that the primary cortical representation requires little post-processing to enable change-detection in complex acoustic environments. DOI: http://dx.doi.org/10.7554/eLife.24910.001 PMID:28262095
Right-Sizing Statistical Models for Longitudinal Data
Wood, Phillip K.; Steinley, Douglas; Jackson, Kristina M.
2015-01-01
Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to “right-size” the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting overly parsimonious models to more complex better fitting alternatives, and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically under-identified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A three-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation/covariation patterns. The orthogonal, free-curve slope-intercept (FCSI) growth model is considered as a general model which includes, as special cases, many models including the Factor Mean model (FM, McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, Hierarchical Linear Models (HLM), Repeated Measures MANOVA, and the Linear Slope Intercept (LinearSI) Growth Model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparison of several candidate parametric growth and chronometric models in a Monte Carlo study. PMID:26237507
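In generic latent growth-curve notation (ours, not the paper's), the slope-intercept family being "right-sized" is

```latex
y_{it} \;=\; \eta_{0i} + \eta_{1i}\,\lambda_t + \varepsilon_{it},
\qquad
(\eta_{0i}, \eta_{1i})^{\top} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Psi}),
```

with the basis loadings λ_t fixed at 0, 1, 2, ... in the linearSI model and, apart from anchoring values needed for identification, freely estimated in the FCSI model, which is what makes FCSI the more general case.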
Constructing and Modifying Sequence Statistics for relevent Using informR in 𝖱
Marcum, Christopher Steven; Butts, Carter T.
2015-01-01
The informR package greatly simplifies the analysis of complex event histories in 𝖱 by providing user friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a→b) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by the rem() model fitting for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey. PMID:26185488
Statistical Analysis of Big Data on Pharmacogenomics
Fan, Jianqing; Liu, Han
2013-01-01
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
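One method family the review covers, sparse inverse-covariance (network) estimation, can be sketched in a few lines of Python with scikit-learn's graphical lasso; the synthetic data below stand in for gene-expression measurements.

```python
# Sparse inverse-covariance (network) estimation via the graphical lasso,
# with synthetic data standing in for gene-expression measurements.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(3)
n_samples, n_genes = 200, 20
# Build a sparse precision matrix: a chain of conditional dependencies
prec = np.eye(n_genes)
for i in range(n_genes - 1):
    prec[i, i + 1] = prec[i + 1, i] = 0.4
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(n_genes), cov, size=n_samples)

model = GraphicalLassoCV().fit(X)          # cross-validated sparsity level
edges = np.abs(model.precision_) > 1e-4    # nonzero partial correlations
np.fill_diagonal(edges, False)
print("estimated network edges:", edges.sum() // 2)
```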
Modelling Complexity: Making Sense of Leadership Issues in 14-19 Education
ERIC Educational Resources Information Center
Briggs, Ann R. J.
2008-01-01
Modelling of statistical data is a well established analytical strategy. Statistical data can be modelled to represent, and thereby predict, the forces acting upon a structure or system. For the rapidly changing systems in the world of education, modelling enables the researcher to understand, to predict and to enable decisions to be based upon…
Heterogeneous variances in multi-environment yield trials for corn hybrids
USDA-ARS?s Scientific Manuscript database
Recent developments in statistics and computing have enabled much greater levels of complexity in statistical models of multi-environment yield trial data. One particular feature of interest to breeders is simultaneously modeling heterogeneity of variances among environments and cultivars. Our obj...
Evolving Scale-Free Networks by Poisson Process: Modeling and Degree Distribution.
Feng, Minyu; Qu, Hong; Yi, Zhang; Xie, Xiurui; Kurths, Jurgen
2016-05-01
Since the great mathematician Leonhard Euler initiated the study of graph theory, the network has been one of the most significant research subjects across many disciplines. In recent years, the proposition of the small-world and scale-free properties of complex networks in statistical physics made network science intriguing again for many researchers. One of the challenges of network science is to propose rational models for complex networks. In this paper, in order to reveal the influence of the vertex generating mechanism of complex networks, we propose three novel models based on the homogeneous Poisson, nonhomogeneous Poisson and birth-death processes, respectively, which can be regarded as typical scale-free networks and utilized to simulate practical networks. The degree distribution and exponent are analyzed and explained mathematically by different approaches. In the simulations, we display the modeling process, analyze the degree distribution of empirical data with statistical methods, and assess the reliability of the proposed networks; the results show that our models follow the features of typical complex networks. Finally, some future challenges for complex systems are discussed.
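An illustrative Python sketch, not the paper's exact construction: grow a network whose new vertices arrive as a homogeneous Poisson process and attach preferentially, then inspect the empirical degree distribution. The rate, horizon, and attachment rule are invented for the example.

```python
# Grow a network with Poisson vertex arrivals and preferential attachment,
# then inspect the degree distribution (illustrative, not the paper's model).
import numpy as np
from collections import Counter

rng = np.random.default_rng(5)
degrees = [1, 1]                 # seed: two linked vertices
edges = [(0, 1)]
t, rate, horizon = 0.0, 1.0, 5000.0
while True:
    t += rng.exponential(1.0 / rate)         # Poisson arrival times
    if t > horizon:
        break
    probs = np.array(degrees) / sum(degrees)  # preferential attachment
    target = rng.choice(len(degrees), p=probs)
    degrees.append(1)
    degrees[target] += 1
    edges.append((len(degrees) - 1, target))

hist = Counter(degrees)
for k in sorted(hist)[:8]:
    print(f"P(deg={k}) ~ {hist[k] / len(degrees):.4f}")  # heavy-tailed decay
```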
Routine Discovery of Complex Genetic Models using Genetic Algorithms
Moore, Jason H.; Hahn, Lance W.; Ritchie, Marylyn D.; Thornton, Tricia A.; White, Bill C.
2010-01-01
Simulation studies are useful in various disciplines for a number of reasons including the development and evaluation of new computational and statistical methods. This is particularly true in human genetics and genetic epidemiology where new analytical methods are needed for the detection and characterization of disease susceptibility genes whose effects are complex, nonlinear, and partially or solely dependent on the effects of other genes (i.e. epistasis or gene-gene interaction). Despite this need, the development of complex genetic models that can be used to simulate data is not always intuitive. In fact, only a few such models have been published. We have previously developed a genetic algorithm approach to discovering complex genetic models in which two single nucleotide polymorphisms (SNPs) influence disease risk solely through nonlinear interactions. In this paper, we extend this approach for the discovery of high-order epistasis models involving three to five SNPs. We demonstrate that the genetic algorithm is capable of routinely discovering interesting high-order epistasis models in which each SNP influences risk of disease only through interactions with the other SNPs in the model. This study opens the door for routine simulation of complex gene-gene interactions among SNPs for the development and evaluation of new statistical and computational approaches for identifying common, complex multifactorial disease susceptibility genes. PMID:20948983
Statistical Hypothesis Testing in Intraspecific Phylogeography: NCPA versus ABC
Templeton, Alan R.
2009-01-01
Nested clade phylogeographic analysis (NCPA) and approximate Bayesian computation (ABC) have been used to test phylogeographic hypotheses. Multilocus NCPA tests null hypotheses, whereas ABC discriminates among a finite set of alternatives. The interpretive criteria of NCPA are explicit and allow complex models to be built from simple components. The interpretive criteria of ABC are ad hoc and require the specification of a complete phylogeographic model. The conclusions from ABC are often influenced by implicit assumptions arising from the many parameters needed to specify a complex model. These complex models confound many assumptions so that biological interpretations are difficult. Sampling error is accounted for in NCPA, but ABC ignores important sources of sampling error that create pseudo-statistical power. NCPA generates the full sampling distribution of its statistics, but ABC only yields local probabilities, which in turn make it impossible to distinguish between a good fitting model, a non-informative model, and an over-determined model. Both NCPA and ABC use approximations, but convergences of the approximations used in NCPA are well defined whereas those in ABC are not. NCPA can analyze a large number of locations, but ABC cannot. Finally, the dimensionality of the tested hypothesis is known in NCPA, but not for ABC. As a consequence, the “probabilities” generated by ABC are not true probabilities and are statistically non-interpretable. Accordingly, ABC should not be used for hypothesis testing, but simulation approaches are valuable when used in conjunction with NCPA or other methods that do not rely on highly parameterized models. PMID:19192182
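To make concrete what ABC's "local probabilities" involve, here is a minimal rejection-sampling ABC sketch in Python; the coin-flip toy model is ours, not a phylogeographic model.

```python
# Minimal ABC rejection sampler on an invented toy model (not phylogeography).
import numpy as np

rng = np.random.default_rng(11)
observed = rng.binomial(100, 0.7)             # pretend field data: 100 trials

def simulate(theta):
    return rng.binomial(100, theta)

accepted = []
for _ in range(50_000):
    theta = rng.uniform(0, 1)                 # prior draw
    if abs(simulate(theta) - observed) <= 2:  # distance within tolerance
        accepted.append(theta)

post = np.array(accepted)
print(f"ABC posterior mean {post.mean():.3f} from {post.size} accepted draws")
# The abstract's caveat applies: acceptance is a *local* fit measure and
# says nothing about whether the model class itself is adequate.
```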
NASA Astrophysics Data System (ADS)
Nearing, G. S.
2014-12-01
Statistical models consistently out-perform conceptual models in the short term, however to account for a nonstationary future (or an unobserved past) scientists prefer to base predictions on unchanging and commutable properties of the universe - i.e., physics. The problem with physically-based hydrology models is, of course, that they aren't really based on physics - they are based on statistical approximations of physical interactions, and we almost uniformly lack an understanding of the entropy associated with these approximations. Thermodynamics is successful precisely because entropy statistics are computable for homogeneous (well-mixed) systems, and ergodic arguments explain the success of Newton's laws to describe systems that are fundamentally quantum in nature. Unfortunately, similar arguments do not hold for systems like watersheds that are heterogeneous at a wide range of scales. Ray Solomonoff formalized the situation in 1968 by showing that given infinite evidence, simultaneously minimizing model complexity and entropy in predictions always leads to the best possible model. The open question in hydrology is about what happens when we don't have infinite evidence - for example, when the future will not look like the past, or when one watershed does not behave like another. How do we isolate stationary and commutable components of watershed behavior? I propose that one possible answer to this dilemma lies in a formal combination of physics and statistics. In this talk I outline my recent analogue (Solomonoff's theorem was digital) of Solomonoff's idea that allows us to quantify the complexity/entropy tradeoff in a way that is intuitive to physical scientists. I show how to formally combine "physical" and statistical methods for model development in a way that allows us to derive the theoretically best possible model given any given physics approximation(s) and available observations. Finally, I apply an analogue of Solomonoff's theorem to evaluate the tradeoff between model complexity and prediction power.
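Solomonoff's tradeoff is conventionally written as a two-part code (a standard restatement, not a quote from the talk):

```latex
M^{\star} \;=\; \arg\min_{M} \big[\, L(M) \;+\; L(D \mid M) \,\big],
```

where L(M) is the description length (complexity) of the model and L(D|M) is the entropy of the data given the model; with infinite evidence the minimizer is the best possible model, and the open question raised here is what happens under finite, nonstationary evidence.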
Reconciling statistical and systems science approaches to public health.
Ip, Edward H; Rahmandad, Hazhir; Shoham, David A; Hammond, Ross; Huang, Terry T-K; Wang, Youfa; Mabry, Patricia L
2013-10-01
Although systems science has emerged as a set of innovative approaches to study complex phenomena, many topically focused researchers including clinicians and scientists working in public health are somewhat befuddled by this methodology that at times appears to be radically different from analytic methods, such as statistical modeling, to which the researchers are accustomed. There also appears to be conflicts between complex systems approaches and traditional statistical methodologies, both in terms of their underlying strategies and the languages they use. We argue that the conflicts are resolvable, and the sooner the better for the field. In this article, we show how statistical and systems science approaches can be reconciled, and how together they can advance solutions to complex problems. We do this by comparing the methods within a theoretical framework based on the work of population biologist Richard Levins. We present different types of models as representing different tradeoffs among the four desiderata of generality, realism, fit, and precision.
Reversibility in Quantum Models of Stochastic Processes
NASA Astrophysics Data System (ADS)
Gier, David; Crutchfield, James; Mahoney, John; James, Ryan
Natural phenomena such as time series of neural firing, orientation of layers in crystal stacking and successive measurements in spin-systems are inherently probabilistic. The provably minimal classical models of such stochastic processes are ɛ-machines, which consist of internal states, transition probabilities between states and output values. The topological properties of the ɛ-machine for a given process characterize the structure, memory and patterns of that process. However ɛ-machines are often not ideal because their statistical complexity (Cμ) is demonstrably greater than the excess entropy (E) of the processes they represent. Quantum models (q-machines) of the same processes can do better in that their statistical complexity (Cq) obeys the relation Cμ >= Cq >= E. q-machines can be constructed to consider longer lengths of strings, resulting in greater compression. With code-words of sufficiently long length, the statistical complexity becomes time-symmetric - a feature apparently novel to this quantum representation. This result has ramifications for compression of classical information in quantum computing and quantum communication technology.
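The complexity ordering quoted above can be written with the standard definitions (our notation):

```latex
C_\mu \;=\; H\!\left[\Pr(\mathcal{S})\right] \;\;\ge\;\; C_q \;=\; S(\rho) \;\;\ge\;\; \mathbf{E},
```

where H is the Shannon entropy over the ε-machine's causal states S, and S(ρ) is the von Neumann entropy of the q-machine's stationary quantum state ρ; the gap between the two is the compression the quantum representation buys.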
Multi-level emulation of complex climate model responses to boundary forcing data
NASA Astrophysics Data System (ADS)
Tran, Giang T.; Oliver, Kevin I. C.; Holden, Philip B.; Edwards, Neil R.; Sóbester, András; Challenor, Peter
2018-04-01
Climate model components involve both high-dimensional input and output fields. It is desirable to efficiently generate spatio-temporal outputs of these models for applications in integrated assessment modelling or to assess the statistical relationship between such sets of inputs and outputs, for example, uncertainty analysis. However, the need for efficiency often compromises the fidelity of output through the use of low complexity models. Here, we develop a technique which combines statistical emulation with a dimensionality reduction technique to emulate a wide range of outputs from an atmospheric general circulation model, PLASIM, as functions of the boundary forcing prescribed by the ocean component of a lower complexity climate model, GENIE-1. Although accurate and detailed spatial information on atmospheric variables such as precipitation and wind speed is well beyond the capability of GENIE-1's energy-moisture balance model of the atmosphere, this study demonstrates that the output of this model is useful in predicting PLASIM's spatio-temporal fields through multi-level emulation. Meaningful information from the fast model, GENIE-1 was extracted by utilising the correlation between variables of the same type in the two models and between variables of different types in PLASIM. We present here the construction and validation of several PLASIM variable emulators and discuss their potential use in developing a hybrid model with statistical components.
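A Python sketch of the generic two-stage emulator design described: reduce the high-dimensional output field with PCA, then fit one Gaussian-process emulator per retained component. Synthetic inputs and fields stand in for the PLASIM/GENIE-1 quantities.

```python
# Two-stage emulator: PCA dimension reduction + per-component GP regression.
# Synthetic stand-ins replace PLASIM outputs and GENIE-1 boundary forcing.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)
n_runs, n_inputs, n_grid = 60, 4, 500
X = rng.uniform(-1, 1, (n_runs, n_inputs))               # forcing parameters
field = np.sin(X @ rng.normal(size=(n_inputs, n_grid)))  # fake output fields

pca = PCA(n_components=5).fit(field)
scores = pca.transform(field)                 # low-dimensional representation

gps = [GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
       .fit(X, scores[:, j]) for j in range(5)]

x_new = rng.uniform(-1, 1, (1, n_inputs))
pred_scores = np.array([gp.predict(x_new)[0] for gp in gps])
pred_field = pca.inverse_transform(pred_scores[None, :])  # emulated field
print(pred_field.shape)                                   # (1, 500)
```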
Online Statistical Modeling (Regression Analysis) for Independent Responses
NASA Astrophysics Data System (ADS)
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.
Effective control of complex turbulent dynamical systems through statistical functionals.
Majda, Andrew J; Qi, Di
2017-05-30
Turbulent dynamical systems characterized by both a high-dimensional phase space and a large number of instabilities are ubiquitous among complex systems in science and engineering, including climate, material, and neural science. Control of these complex systems is a grand challenge, for example, in mitigating the effects of climate change or safe design of technology with fully developed shear turbulence. Control of flows in the transition to turbulence, where there is a small dimension of instabilities about a basic mean state, is an important and successful discipline. In complex turbulent dynamical systems, it is impossible to track and control the large dimension of instabilities, which strongly interact and exchange energy, and new control strategies are needed. The goal of this paper is to propose an effective statistical control strategy for complex turbulent dynamical systems based on a recent statistical energy principle and statistical linear response theory. We illustrate the potential practical efficiency and verify this effective statistical control strategy on the 40D Lorenz 1996 model in forcing regimes with various types of fully turbulent dynamics with nearly one-half of the phase space unstable.
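The 40-dimensional Lorenz '96 testbed named above can be reproduced in a few lines of Python; the crude feedback law below, which nudges the scalar forcing F toward a target mean statistical energy, is our illustration of a control knob, not the authors' response-theory controller.

```python
# Lorenz '96 with a simple feedback on the forcing F (illustrative control,
# not the paper's statistical response-theory strategy).
import numpy as np

def l96_rhs(x, forcing):
    # dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

rng = np.random.default_rng(9)
x = rng.normal(0, 1, 40)
forcing, dt, target_energy = 8.0, 0.005, 10.0
for step in range(40_000):
    # 4th-order Runge-Kutta step
    k1 = l96_rhs(x, forcing)
    k2 = l96_rhs(x + 0.5 * dt * k1, forcing)
    k3 = l96_rhs(x + 0.5 * dt * k2, forcing)
    k4 = l96_rhs(x + dt * k3, forcing)
    x += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    energy = 0.5 * np.mean(x ** 2)                   # statistical energy proxy
    forcing -= 0.01 * dt * (energy - target_energy)  # simple feedback on F

print(f"final energy {0.5 * np.mean(x**2):.2f}, forcing {forcing:.2f}")
```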
Weck, P J; Schaffner, D A; Brown, M R; Wicks, R T
2015-02-01
The Bandt-Pompe permutation entropy and the Jensen-Shannon statistical complexity are used to analyze fluctuating time series of three different turbulent plasmas: the magnetohydrodynamic (MHD) turbulence in the plasma wind tunnel of the Swarthmore Spheromak Experiment (SSX), drift-wave turbulence of ion saturation current fluctuations in the edge of the Large Plasma Device (LAPD), and fully developed turbulent magnetic fluctuations of the solar wind taken from the Wind spacecraft. The entropy and complexity values are presented as coordinates on the CH plane for comparison among the different plasma environments and other fluctuation models. The solar wind is found to have the highest permutation entropy and lowest statistical complexity of the three data sets analyzed. Both laboratory data sets have larger values of statistical complexity, suggesting that these systems have fewer degrees of freedom in their fluctuations, with SSX magnetic fluctuations having slightly less complexity than the LAPD edge I(sat). The CH plane coordinates are compared to the shape and distribution of a spectral decomposition of the wave forms. These results suggest that fully developed turbulence (solar wind) occupies the lower-right region of the CH plane, and that other plasma systems considered to be turbulent have less permutation entropy and more statistical complexity. This paper presents use of this statistical analysis tool on solar wind plasma, as well as on an MHD turbulent experimental plasma.
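A Python sketch of the Bandt-Pompe construction behind the CH-plane coordinates: ordinal-pattern frequencies give the normalized permutation entropy H. The Jensen-Shannon complexity C multiplies H by a disequilibrium term, omitted here for brevity.

```python
# Bandt-Pompe permutation entropy from ordinal-pattern frequencies.
import numpy as np
from collections import Counter
from math import factorial, log

def permutation_entropy(x, order=4, delay=1):
    n = len(x) - (order - 1) * delay
    patterns = Counter(
        tuple(np.argsort(x[i : i + order * delay : delay])) for i in range(n)
    )
    probs = np.array(list(patterns.values())) / n
    H = -np.sum(probs * np.log(probs))
    return H / log(factorial(order))          # normalize to [0, 1]

rng = np.random.default_rng(4)
print("white noise :", permutation_entropy(rng.normal(size=20_000)))  # ~1.0
print("sine wave   :", permutation_entropy(np.sin(np.linspace(0, 200, 20_000))))
```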
Most analyses of daily time series epidemiology data relate mortality or morbidity counts to PM and other air pollutants by means of single-outcome regression models using multiple predictors, without taking into account the complex statistical structure of the predictor variable...
Systems Engineering Metrics: Organizational Complexity and Product Quality Modeling
NASA Technical Reports Server (NTRS)
Mog, Robert A.
1997-01-01
Innovative organizational complexity and product quality models applicable to performance metrics for NASA-MSFC's Systems Analysis and Integration Laboratory (SAIL) missions and objectives are presented. An intensive research effort focuses on the synergistic combination of stochastic process modeling, nodal and spatial decomposition techniques, organizational and computational complexity, systems science and metrics, chaos, and proprietary statistical tools for accelerated risk assessment. This is followed by the development of a preliminary model, which is uniquely applicable and robust for quantitative purposes. Exercise of the preliminary model using a generic system hierarchy and the AXAF-I architectural hierarchy is provided. The Kendall test for positive dependence provides an initial verification and validation of the model. Finally, the research and development of the innovation is revisited, prior to peer review. This research and development effort results in near-term, measurable SAIL organizational and product quality methodologies, enhanced organizational risk assessment and evolutionary modeling results, and improved statistical quantification of SAIL productivity interests.
Acceleration techniques for dependability simulation. M.S. Thesis
NASA Technical Reports Server (NTRS)
Barnette, James David
1995-01-01
As computer systems increase in complexity, the need to project system performance from the earliest design and development stages increases. We have to employ simulation for detailed dependability studies of large systems. However, as the complexity of the simulation model increases, the time required to obtain statistically significant results also increases. This paper discusses an approach that is application independent and can be readily applied to any process-based simulation model. Topics include background on classical discrete event simulation and techniques for random variate generation and statistics gathering to support simulation.
Statistical significance of the rich-club phenomenon in complex networks
NASA Astrophysics Data System (ADS)
Jiang, Zhi-Qiang; Zhou, Wei-Xing
2008-04-01
We propose that the rich-club phenomenon in complex networks should be defined in the spirit of bootstrapping, in which a null model is adopted to assess the statistical significance of the rich-club detected. Our method can serve as a definition of the rich-club phenomenon and is applied to analyze three real networks and three model networks. The results show significant improvement compared with previously reported results. We report a dilemma with an exceptional example, showing that there does not exist an omnipotent definition for the rich-club phenomenon.
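A minimal sketch of the bootstrapping idea described above: compare the observed rich-club coefficient against an ensemble of degree-preserving randomizations and read off an empirical p-value per degree. This uses networkx utilities and an illustrative Barabási-Albert test graph; it is not the authors' implementation.

```python
import networkx as nx
import numpy as np

def rich_club_pvalues(G, n_null=50, seed=0):
    """One-sided p-values of the rich-club coefficient vs a degree-preserving null."""
    rng = np.random.default_rng(seed)
    phi = nx.rich_club_coefficient(G, normalized=False)   # observed phi(k)
    null = {deg: [] for deg in phi}
    for _ in range(n_null):
        R = G.copy()
        # randomize wiring while preserving the degree sequence
        nx.double_edge_swap(R, nswap=10 * R.number_of_edges(),
                            max_tries=int(1e6), seed=int(rng.integers(10**9)))
        phi0 = nx.rich_club_coefficient(R, normalized=False)
        for deg in null:
            null[deg].append(phi0.get(deg, 0.0))
    # fraction of null replicates with a rich-club effect at least as strong
    return {deg: np.mean(np.array(null[deg]) >= phi[deg]) for deg in phi}

G = nx.barabasi_albert_graph(200, 3, seed=1)
p = rich_club_pvalues(G)
print({k: round(v, 2) for k, v in sorted(p.items())[:5]})
```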
STUDY OF TURBULENT ENERGY OVER COMPLEX TERRAIN: STATE, 1978
The complex structure of the earth's surface influenced atmospheric parameters pertinent to modeling the diffusion process during the 1978 'STATE' field study. The Information Theory approach of statistics proved useful for analyzing the complex structures observed in the radiome...
Statistical dielectronic recombination rates for multielectron ions in plasma
NASA Astrophysics Data System (ADS)
Demura, A. V.; Leont'iev, D. S.; Lisitsa, V. S.; Shurygin, V. A.
2017-10-01
We describe the general analytic derivation of the dielectronic recombination (DR) rate coefficient for multielectron ions in a plasma based on the statistical theory of an atom in terms of the spatial distribution of the atomic electron density. The dielectronic recombination rates for complex multielectron tungsten ions are calculated numerically in a wide range of variation of the plasma temperature, which is important for modern nuclear fusion studies. The results of statistical theory are compared with the data obtained using the level-by-level codes ADPAK, FAC, and HULLAC, and with experimental results. We consider different statistical DR models based on the Thomas-Fermi distribution, viz., integral and differential with respect to the orbital angular momenta of the ion core and the trapped electron, as well as the Rost model, which is an analog of the Frank-Condon model as applied to atomic structures. In view of its universality and relative simplicity, the statistical approach can be used to obtain rapid estimates of the dielectronic recombination rate coefficients in complex calculations of the parameters of thermonuclear plasmas. The application of statistical methods also provides dielectronic recombination rates at much lower computational cost than available level-by-level codes.
Action detection by double hierarchical multi-structure space-time statistical matching model
NASA Astrophysics Data System (ADS)
Han, Jing; Zhu, Junwei; Cui, Yiyin; Bai, Lianfa; Yue, Jiang
2018-03-01
To address the complex information in videos and low detection efficiency, an action detection model based on neighboring Gaussian structures and 3D LARK features is put forward. We exploit a double hierarchical multi-structure space-time statistical matching model (DMSM) for temporal action localization. First, a neighboring Gaussian structure is presented to describe the multi-scale structural relationship. Then, a space-time statistical matching method is proposed to achieve two similarity matrices on both large and small scales, which combines double hierarchical structural constraints in the model through both the neighboring Gaussian structure and the 3D LARK local structure. Finally, the double hierarchical similarity is fused and analyzed to detect actions. Besides, the multi-scale composite template extends the model application to multi-view settings. Experimental results of DMSM on the complex visual tracker benchmark data sets and THUMOS 2014 data sets show promising performance. Compared with other state-of-the-art algorithms, DMSM achieves superior performance.
Computational and Statistical Models: A Comparison for Policy Modeling of Childhood Obesity
NASA Astrophysics Data System (ADS)
Mabry, Patricia L.; Hammond, Ross; Ip, Edward Hak-Sing; Huang, Terry T.-K.
As systems science methodologies have begun to emerge as a set of innovative approaches to address complex problems in behavioral, social science, and public health research, some apparent conflicts with traditional statistical methodologies for public health have arisen. Computational modeling is an approach set in context that integrates diverse sources of data to test the plausibility of working hypotheses and to elicit novel ones. Statistical models are reductionist approaches geared towards testing the null hypothesis. While these two approaches may seem contrary to each other, we propose that they are in fact complementary and can be used jointly to advance solutions to complex problems. Outputs from statistical models can be fed into computational models, and outputs from computational models can lead to further empirical data collection and statistical models. Together, this presents an iterative process that refines the models and contributes to a greater understanding of the problem and its potential solutions. The purpose of this panel is to foster communication and understanding between statistical and computational modelers. Our goal is to shed light on the differences between the approaches and convey what kinds of research inquiries each one is best for addressing and how they can serve complementary (and synergistic) roles in the research process, to mutual benefit. For each approach the panel will cover the relevant "assumptions" and how the differences in what is assumed can foster misunderstandings. The interpretations of the results from each approach will be compared and contrasted and the limitations for each approach will be delineated. We will use illustrative examples from CompMod, the Comparative Modeling Network for Childhood Obesity Policy. The panel will also incorporate interactive discussions with the audience on the issues raised here.
Chang, Pao-Erh Paul; Yang, Jen-Chih Rena; Den, Walter; Wu, Chang-Fu
2014-09-01
Emissions of volatile organic compounds (VOCs) are among the most frequent environmental nuisance complaints in urban areas, especially where industrial districts are nearby. Unfortunately, identifying the emission sources responsible for VOCs is a difficult task. In this study, we proposed a dynamic approach to gradually confine the location of potential VOC emission sources in an industrial complex, combining multi-path open-path Fourier transform infrared spectrometry (OP-FTIR) measurements and the statistical method of principal component analysis (PCA). Closed-cell FTIR was further used to verify the VOC emission sources by measuring emitted VOCs from selected exhaust stacks at factories in the confined areas. Multiple open-path monitoring lines were deployed during a 3-month monitoring campaign in a complex industrial district. The emission patterns were identified and the locations of emissions were confined using the wind data collected simultaneously. N,N-Dimethyl formamide (DMF), 2-butanone, toluene, and ethyl acetate with mean concentrations of 80.0 ± 1.8, 34.5 ± 0.8, 103.7 ± 2.8, and 26.6 ± 0.7 ppbv, respectively, were identified as the major VOC mixture at all times of the day around the receptor site. As a toxic air pollutant, DMF was found at concentrations exceeding the ambient standard despite the path-averaging effect of OP-FTIR on concentration levels. The PCA data identified three major emission sources, including PU coating, chemical packaging, and lithographic printing industries. Applying instrumental measurement and statistical modeling, this study has established a systematic approach for locating emission sources. Statistical modeling (PCA) plays an important role in reducing the dimensionality of a large measured dataset and identifying underlying emission sources. Instrumental measurement, however, helps verify the outcomes of the statistical modeling. The field study has demonstrated the feasibility of using multi-path OP-FTIR measurement. Wind data incorporated into the statistical modeling (PCA) successfully identified the major emission sources in a complex industrial district.
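The role PCA plays here can be sketched as follows: with a matrix of path-averaged concentrations (rows = sampling periods, columns = VOC species), the leading components group species that rise and fall together, which is interpreted as a common emission source. The data below are synthetic stand-ins, and the two-source structure is assumed purely for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical matrix: rows = hourly path-averaged samples, columns = VOC species
species = ["DMF", "2-butanone", "toluene", "ethyl acetate"]
rng = np.random.default_rng(0)
source_a = rng.lognormal(3, 1, size=(200, 1)) * [1.0, 0.4, 0.0, 0.0]  # e.g. PU coating
source_b = rng.lognormal(3, 1, size=(200, 1)) * [0.0, 0.0, 1.0, 0.3]  # e.g. printing
X = source_a + source_b + rng.normal(0, 1, (200, 4))

# Standardize, then PCA: each retained component groups species that co-vary,
# which is read as a common emission source
Xs = (X - X.mean(0)) / X.std(0)
pca = PCA(n_components=2).fit(Xs)
for i, comp in enumerate(pca.components_):
    top = [s for s, w in zip(species, comp) if abs(w) > 0.4]
    print(f"PC{i+1} ({pca.explained_variance_ratio_[i]:.0%} var): {top}")
```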
Data mining and statistical inference in selective laser melting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamath, Chandrika
2016-01-11
Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.
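A hedged sketch of the surrogate-plus-feature-selection step: a data-driven model is trained on a (here synthetic) table of process parameters versus part density, and impurity-based importances rank the parameters. The parameter names, ranges, and the energy-density response are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical SLM simulation table: process parameters -> part density (%)
rng = np.random.default_rng(1)
n = 500
X = rng.uniform([50, 100, 20, 30], [400, 3000, 150, 120], size=(n, 4))
names = ["laser_power", "scan_speed", "hatch_spacing", "layer_thickness"]
energy = X[:, 0] / (X[:, 1] * X[:, 2] * X[:, 3])   # volumetric energy density proxy
y = 99 - 50e6 * (energy - 2e-4) ** 2 + rng.normal(0, 0.3, n)

# Data-driven surrogate: a cheap stand-in for the expensive physics model
surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Feature selection: impurity-based importances rank the process parameters
for name, imp in sorted(zip(names, surrogate.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:16s} importance = {imp:.2f}")
print("predicted density at a new setting:", surrogate.predict([[200, 800, 80, 40]])[0])
```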
Zhang, Qin; Yao, Quanying
2018-05-01
The dynamic uncertain causality graph (DUCG) is a newly presented framework for uncertain causality representation and probabilistic reasoning. It has been successfully applied to online fault diagnosis of large, complex industrial systems, and to disease diagnosis. This paper extends the DUCG to model more complex cases than could previously be modeled, e.g., the case in which statistical data are in different groups with or without overlap, and some domain knowledge and actions (new variables with uncertain causalities) are introduced. In other words, this paper proposes to use -mode, -mode, and -mode of the DUCG to model such complex cases and then transform them into either the standard -mode or the standard -mode. In the former situation, if no directed cyclic graph is involved, the transformed result is simply a Bayesian network (BN), and existing inference methods for BNs can be applied. In the latter situation, an inference method based on the DUCG is proposed. Examples are provided to illustrate the methodology.
Markov Logic Networks in the Analysis of Genetic Data
Sakhanenko, Nikita A.
2010-01-01
Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249
Multi-region statistical shape model for cochlear implantation
NASA Astrophysics Data System (ADS)
Romera, Jordi; Kjer, H. Martin; Piella, Gemma; Ceresa, Mario; González Ballester, Miguel A.
2016-03-01
Statistical shape models are commonly used to analyze the variability between similar anatomical structures and their use is established as a tool for analysis and segmentation of medical images. However, using a global model to capture the variability of complex structures is not enough to achieve the best results. The complexity of a proper global model increases even more when the amount of data available is limited to a small number of datasets. Typically, the anatomical variability between structures is associated with the variability of their physiological regions. In this paper, a complete pipeline is proposed for building a multi-region statistical shape model to study the entire variability from locally identified physiological regions of the inner ear. The proposed model, which is based on an extension of the Point Distribution Model (PDM), is built for a training set of 17 high-resolution images (24.5 μm voxels) of the inner ear. The model is evaluated according to its generalization ability and specificity. The results are compared with those of a global model built directly using the standard PDM approach. The evaluation results suggest that better accuracy can be achieved using regional modeling of the inner ear.
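The standard PDM machinery underlying both the global and the multi-region models can be sketched in a few lines: corresponding landmarks are stacked into vectors, and an SVD of the centered training matrix yields the mean shape and the principal modes of variation. The array sizes below (17 samples, 100 3-D landmarks) mirror only the training-set size; correspondence and alignment are assumed already done, and this is a generic sketch rather than the paper's pipeline.

```python
import numpy as np

def point_distribution_model(shapes, var_kept=0.95):
    """Build a PDM: mean shape plus principal modes from aligned landmark sets.

    shapes: (n_samples, n_landmarks*3) array of corresponding, pre-aligned landmarks.
    """
    mean = shapes.mean(axis=0)
    U, s, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    var = s**2 / (len(shapes) - 1)          # eigenvalues of the sample covariance
    m = np.searchsorted(np.cumsum(var) / var.sum(), var_kept) + 1
    return mean, Vt[:m], var[:m]

def synthesize(mean, modes, var, b):
    """New shape instance x = mean + P b, with b in units of standard deviations."""
    return mean + (np.sqrt(var) * b) @ modes

rng = np.random.default_rng(0)
training = rng.normal(size=(17, 300))       # e.g. 17 samples, 100 3-D landmarks
mean, modes, var = point_distribution_model(training)
new_shape = synthesize(mean, modes, var, b=rng.uniform(-2, 2, size=len(var)))
```

A multi-region variant applies the same construction per physiological region and then couples the regional models.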
Trends in modeling Biomedical Complex Systems
Milanesi, Luciano; Romano, Paolo; Castellani, Gastone; Remondini, Daniel; Liò, Petro
2009-01-01
In this paper we provide an introduction to techniques for modeling multi-scale complex biological systems, from the single bio-molecule to the cell, combining theoretical modeling, experiments, and informatics tools and technologies suitable for biological and biomedical research, which is becoming increasingly multidisciplinary, multidimensional and information-driven. The most important concepts on mathematical modeling methodologies and statistical inference, bioinformatics and standards tools to investigate complex biomedical systems are discussed and the prominent literature useful to both the practitioner and the theoretician is presented. PMID:19828068
ERIC Educational Resources Information Center
Frees, Edward W.; Kim, Jee-Seon
2006-01-01
Multilevel models are proven tools in social research for modeling complex, hierarchical systems. In multilevel modeling, statistical inference is based largely on quantification of random variables. This paper distinguishes among three types of random variables in multilevel modeling--model disturbances, random coefficients, and future response…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Xiaoying; Liu, Chongxuan; Hu, Bill X.
This study statistically analyzed a grain-size based additivity model that has been proposed to scale reaction rates and parameters from laboratory to field. The additivity model assumed that reaction properties in a sediment including surface area, reactive site concentration, reaction rate, and extent can be predicted from field-scale grain size distribution by linearly adding reaction properties for individual grain size fractions. This study focused on the statistical analysis of the additivity model with respect to reaction rate constants using multi-rate uranyl (U(VI)) surface complexation reactions in a contaminated sediment as an example. Experimental data of rate-limited U(VI) desorption in a stirred flow-cell reactor were used to estimate the statistical properties of multi-rate parameters for individual grain size fractions. The statistical properties of the rate constants for the individual grain size fractions were then used to analyze the statistical properties of the additivity model to predict rate-limited U(VI) desorption in the composite sediment, and to evaluate the relative importance of individual grain size fractions to the overall U(VI) desorption. The results indicated that the additivity model provided a good prediction of the U(VI) desorption in the composite sediment. However, the rate constants were not directly scalable using the additivity model, and U(VI) desorption in individual grain size fractions has to be simulated in order to apply the additivity model. An approximate additivity model for directly scaling rate constants was subsequently proposed and evaluated. The results showed that the approximate model provided a good prediction of the experimental results within statistical uncertainty. This study also found that a gravel size fraction (2–8 mm), which is often ignored in modeling U(VI) sorption and desorption, is statistically significant to the U(VI) desorption in the sediment.
A Statistical Approach For Modeling Tropical Cyclones. Synthetic Hurricanes Generator Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pasqualini, Donatella
This manuscript briefly describes a statistical approach to generate synthetic tropical cyclone tracks to be used in risk evaluations. The Synthetic Hurricane Generator (SynHurG) model allows modeling hurricane risk in the United States, supporting decision makers and implementations of adaptation strategies to extreme weather. In the literature there are mainly two approaches to model hurricane hazard for risk prediction: deterministic-statistical approaches, where the storm's key physical parameters are calculated using complex physical climate models and the tracks are usually determined statistically from historical data; and statistical approaches, where both variables and tracks are estimated stochastically using historical records. SynHurG falls in the second category, adopting a pure stochastic approach.
Wave Propagation in Non-Stationary Statistical Mantle Models at the Global Scale
NASA Astrophysics Data System (ADS)
Meschede, M.; Romanowicz, B. A.
2014-12-01
We study the effect of statistically distributed heterogeneities that are smaller than the resolution of current tomographic models on seismic waves that propagate through the Earth's mantle at teleseismic distances. Current global tomographic models are missing small-scale structure as evidenced by the failure of even accurate numerical synthetics to explain enhanced coda in observed body and surface waveforms. One way to characterize small scale heterogeneity is to construct random models and confront observed coda waveforms with predictions from these models. Statistical studies of the coda typically rely on models with simplified isotropic and stationary correlation functions in Cartesian geometries. We show how to construct more complex random models for the mantle that can account for arbitrary non-stationary and anisotropic correlation functions as well as for complex geometries. Although this method is computationally heavy, model characteristics such as translational, cylindrical or spherical symmetries can be used to greatly reduce the complexity such that this method becomes practical. With this approach, we can create 3D models of the full spherical Earth that can be radially anisotropic, i.e. with different horizontal and radial correlation functions, and radially non-stationary, i.e. with radially varying model power and correlation functions. Both of these features are crucial for a statistical description of the mantle in which structure depends to first order on the spherical geometry of the Earth. We combine different random model realizations of S velocity with current global tomographic models that are robust at long wavelengths (e.g. Meschede and Romanowicz, 2014, GJI submitted), and compute the effects of these hybrid models on the wavefield with a spectral element code (SPECFEM3D_GLOBE). We finally analyze the resulting coda waves for our model selection and compare our computations with observations. Based on these observations, we make predictions about the strength of unresolved small-scale structure and extrinsic attenuation.
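One common way to realize such random models is spectral synthesis: white noise is filtered in the wavenumber domain by the amplitude spectrum of the target correlation function, with different correlation lengths per axis giving anisotropy. The sketch below is a 2-D Cartesian toy with a Gaussian correlation function, an illustrative assumption; the paper's spherical, non-stationary construction is more involved, e.g. letting the correlation lengths and model power vary with radius.

```python
import numpy as np

def anisotropic_random_field(n, dx, corr_x, corr_z, seed=0):
    """2-D Gaussian random field with different horizontal/vertical correlation
    lengths, synthesized by filtering white noise in the spectral domain."""
    rng = np.random.default_rng(seed)
    k = np.fft.fftfreq(n, dx) * 2 * np.pi
    KX, KZ = np.meshgrid(k, k, indexing="ij")
    # amplitude spectrum corresponding to a Gaussian correlation function
    # (constants chosen for illustration only)
    amp = np.exp(-((KX * corr_x) ** 2 + (KZ * corr_z) ** 2) / 8.0)
    noise = np.fft.fft2(rng.normal(size=(n, n)))
    field = np.real(np.fft.ifft2(amp * noise))
    return field / field.std()

# horizontally stretched heterogeneity, as in a layered mantle model
field = anisotropic_random_field(n=256, dx=10.0, corr_x=200.0, corr_z=50.0)
```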
Subband Image Coding with Jointly Optimized Quantizers
NASA Technical Reports Server (NTRS)
Kossentini, Faouzi; Chung, Wilson C.; Smith, Mark J. T.
1995-01-01
An iterative design algorithm for the joint design of complexity- and entropy-constrained subband quantizers and associated entropy coders is proposed. Unlike conventional subband design algorithms, the proposed algorithm does not require the use of various bit allocation algorithms. Multistage residual quantizers are employed here because they provide greater control of the complexity-performance tradeoffs, and also because they allow efficient and effective high-order statistical modeling. The resulting subband coder exploits statistical dependencies within subbands, across subbands, and across stages, mainly through complexity-constrained high-order entropy coding. Experimental results demonstrate that the complexity-rate-distortion performance of the new subband coder is exceptional.
Le Meur, Nolwenn; Gentleman, Robert
2008-01-01
Background: Synthetic lethality defines a genetic interaction where the combination of mutations in two or more genes leads to cell death. The implications of synthetic lethal screens have been discussed in the context of drug development as synthetic lethal pairs could be used to selectively kill cancer cells, but leave normal cells relatively unharmed. A challenge is to assess genome-wide experimental data and integrate the results to better understand the underlying biological processes. We propose statistical and computational tools that can be used to find relationships between synthetic lethality and cellular organizational units. Results: In Saccharomyces cerevisiae, we identified multi-protein complexes and pairs of multi-protein complexes that share an unusually high number of synthetic genetic interactions. As previously predicted, we found that synthetic lethality can arise from subunits of an essential multi-protein complex or between pairs of multi-protein complexes. Finally, using multi-protein complexes allowed us to take into account the pleiotropic nature of the gene products. Conclusions: Modeling synthetic lethality using current estimates of the yeast interactome is an efficient approach to disentangle some of the complex molecular interactions that drive a cell. Our model in conjunction with applied statistical methods and computational methods provides new tools to better characterize synthetic genetic interactions. PMID:18789146
Physics-based statistical learning approach to mesoscopic model selection.
Taverniers, Søren; Haut, Terry S; Barros, Kipton; Alexander, Francis J; Lookman, Turab
2015-11-01
In materials science and many other research areas, models are frequently inferred without considering their generalization to unseen data. We apply statistical learning using cross-validation to obtain an optimally predictive coarse-grained description of a two-dimensional kinetic nearest-neighbor Ising model with Glauber dynamics (GD) based on the stochastic Ginzburg-Landau equation (sGLE). The latter is learned from GD "training" data using a log-likelihood analysis, and its predictive ability for various complexities of the model is tested on GD "test" data independent of the data used to train the model. Using two different error metrics, we perform a detailed analysis of the error between magnetization time trajectories simulated using the learned sGLE coarse-grained description and those obtained using the GD model. We show that both for equilibrium and out-of-equilibrium GD training trajectories, the standard phenomenological description using a quartic free energy does not always yield the most predictive coarse-grained model. Moreover, increasing the amount of training data can shift the optimal model complexity to higher values. Our results are promising in that they pave the way for the use of statistical learning as a general tool for materials modeling and discovery.
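The model-selection logic can be illustrated with a toy version: candidate coarse-grained drift models of increasing polynomial order are scored by K-fold cross-validation on held-out data, and the lowest CV error, not the highest order, wins. The double-well drift and noise level below are illustrative assumptions, not the paper's sGLE fits.

```python
import numpy as np

def cv_error(x, y, degree, k=5, seed=0):
    """K-fold cross-validated squared error of a polynomial model of given degree."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    err = 0.0
    for f in folds:
        train = np.setdiff1d(idx, f)
        coef = np.polyfit(x[train], y[train], degree)
        err += np.mean((np.polyval(coef, x[f]) - y[f]) ** 2)
    return err / k

# toy "training" data from a noisy double-well drift, standing in for GD data
rng = np.random.default_rng(1)
m = rng.uniform(-1.5, 1.5, 400)
dmdt = -(4 * m**3 - 4 * m) + rng.normal(0, 0.5, 400)  # quartic free energy => cubic drift

for d in (1, 3, 5, 7):
    print(f"degree {d}: CV error = {cv_error(m, dmdt, d):.3f}")
```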
NASA Astrophysics Data System (ADS)
Kumar, Jagadish; Ananthakrishna, G.
2018-01-01
Scale-invariant power-law distributions for acoustic emission signals are ubiquitous in several plastically deforming materials. However, power-law distributions for acoustic emission energies are reported in distinctly different plastically deforming situations such as hcp and fcc single and polycrystalline samples exhibiting smooth stress-strain curves and in dilute metallic alloys exhibiting discontinuous flow. This is surprising since the underlying dislocation mechanisms in these two types of deformations are very different. So far, there have been no models that predict the power-law statistics for discontinuous flow. Furthermore, the statistics of the acoustic emission signals in jerky flow is even more complex, requiring multifractal measures for a proper characterization. There has been no model that explains the complex statistics either. Here we address the problem of statistical characterization of the acoustic emission signals associated with the three types of the Portevin-Le Chatelier bands. Following our recently proposed general framework for calculating acoustic emission, we set up a wave equation for the elastic degrees of freedom with a plastic strain rate as a source term. The energy dissipated during acoustic emission is represented by the Rayleigh-dissipation function. Using the plastic strain rate obtained from the Ananthakrishna model for the Portevin-Le Chatelier effect, we compute the acoustic emission signals associated with the three Portevin-Le Chatelier bands and the Lüders-like band. The acoustic emission signals calculated in this way are used for further statistical characterization. Our results show that the model predicts power-law statistics for all the acoustic emission signals associated with the three types of Portevin-Le Chatelier bands with the exponent values increasing with increasing strain rate. The calculated multifractal spectra corresponding to the acoustic emission signals associated with the three band types have a maximum spread for the type C bands, decreasing for types B and A. We further show that the acoustic emission signals associated with the Lüders-like band also exhibit a power-law distribution and multifractality.
Bayesian models based on test statistics for multiple hypothesis testing problems.
Ji, Yuan; Lu, Yiling; Mills, Gordon B
2008-04-01
We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. Finally, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
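The essence of modeling the test statistics directly can be sketched with a two-group mixture: an EM fit separates a null N(0,1) component from an alternative component, and hypotheses are rejected in order of posterior null probability until the running average, a Bayesian FDR, reaches the target level. The component forms and the simulated z-values are illustrative assumptions, not the authors' exact model.

```python
import numpy as np
from scipy import stats

# z-statistics: mixture of null N(0,1) and an alternative component
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0, 1, 9000), rng.normal(3, 1, 1000)])

# EM for a two-group model: f(z) = pi0*N(0,1) + (1-pi0)*N(mu, sigma^2)
pi0, mu, sigma = 0.9, 2.0, 1.0
for _ in range(200):
    f0 = pi0 * stats.norm.pdf(z, 0, 1)
    f1 = (1 - pi0) * stats.norm.pdf(z, mu, sigma)
    w = f1 / (f0 + f1)                        # posterior prob. of the alternative
    pi0 = 1 - w.mean()
    mu = np.sum(w * z) / w.sum()
    sigma = np.sqrt(np.sum(w * (z - mu) ** 2) / w.sum())

# Bayesian FDR: reject the hypotheses with smallest local fdr (1 - w) such that
# the average local fdr among rejections stays below the target level
order = np.argsort(1 - w)
fdr = np.cumsum((1 - w)[order]) / np.arange(1, len(z) + 1)
n_reject = np.searchsorted(fdr, 0.05, side="right")
print(f"pi0 ~ {pi0:.2f}, rejections at Bayesian FDR 0.05: {n_reject}")
```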
Different Manhattan project: automatic statistical model generation
NASA Astrophysics Data System (ADS)
Yap, Chee Keng; Biermann, Henning; Hertzmann, Aaron; Li, Chen; Meyer, Jon; Pao, Hsing-Kuo; Paxia, Salvatore
2002-03-01
We address the automatic generation of large geometric models. This is important in visualization for several reasons. First, many applications need access to large but interesting data models. Second, we often need such data sets with particular characteristics (e.g., urban models, park and recreation landscape). Thus we need the ability to generate models with different parameters. We propose a new approach for generating such models. It is based on a top-down propagation of statistical parameters. We illustrate the method in the generation of a statistical model of Manhattan. But the method is generally applicable in the generation of models of large geographical regions. Our work is related to the literature on generating complex natural scenes (smoke, forests, etc) based on procedural descriptions. The difference in our approach stems from three characteristics: modeling with statistical parameters, integration of ground truth (actual map data), and a library-based approach for texture mapping.
Complex networks as a unified framework for descriptive analysis and predictive modeling in climate
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steinhaeuser, Karsten J K; Chawla, Nitesh; Ganguly, Auroop R
The analysis of climate data has relied heavily on hypothesis-driven statistical methods, while projections of future climate are based primarily on physics-based computational models. However, in recent years a wealth of new datasets has become available. Therefore, we take a more data-centric approach and propose a unified framework for studying climate, with an aim towards characterizing observed phenomena as well as discovering new knowledge in the climate domain. Specifically, we posit that complex networks are well-suited for both descriptive analysis and predictive modeling tasks. We show that the structural properties of climate networks have useful interpretation within the domain. Further, we extract clusters from these networks and demonstrate their predictive power as climate indices. Our experimental results establish that the network clusters are statistically significantly better predictors than clusters derived using a more traditional clustering approach. Using complex networks as data representation thus enables the unique opportunity for descriptive and predictive modeling to inform each other.
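A minimal sketch of the cluster-to-index pipeline: correlate location-wise anomaly series, threshold the correlations into a network, detect communities, and average each community's series into a candidate climate index. The synthetic data, the 0.5 threshold, and the greedy modularity method are illustrative choices, not the paper's configuration.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical gridded climate anomalies: n_points locations x n_months
rng = np.random.default_rng(0)
n_points, n_months = 100, 240
base = rng.normal(size=(4, n_months))           # four latent regional signals
membership = rng.integers(0, 4, n_points)
data = base[membership] + 0.8 * rng.normal(size=(n_points, n_months))

# correlation network: link grid points whose series are strongly correlated
C = np.corrcoef(data)
A = ((np.abs(C) > 0.5) & ~np.eye(n_points, dtype=bool)).astype(int)
G = nx.from_numpy_array(A)

# community detection recovers coherent regions, usable as climate indices
communities = greedy_modularity_communities(G)
indices = [data[list(c)].mean(axis=0) for c in communities]  # cluster-mean series
print(f"{len(communities)} network clusters; first index length {len(indices[0])}")
```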
Sanjak, Jaleal S.; Long, Anthony D.; Thornton, Kevin R.
2017-01-01
The genetic component of complex disease risk in humans remains largely unexplained. A corollary is that the allelic spectrum of genetic variants contributing to complex disease risk is unknown. Theoretical models that relate population genetic processes to the maintenance of genetic variation for quantitative traits may suggest profitable avenues for future experimental design. Here we use forward simulation to model a genomic region evolving under a balance between recurrent deleterious mutation and Gaussian stabilizing selection. We consider multiple genetic and demographic models, and several different methods for identifying genomic regions harboring variants associated with complex disease risk. We demonstrate that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait. In particular, the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests. Models with partial recessivity closely match the minor allele frequency distribution of significant hits from empirical genome-wide association studies without requiring homozygous effect sizes to be small. We highlight a particular gene-based model of incomplete recessivity that is appealing from first principles. Under that model, deleterious mutations in a genomic region partially fail to complement one another. This model of gene-based recessivity predicts the empirically observed inconsistency between twin-based and SNP-based estimates of dominance heritability. Furthermore, this model predicts considerable levels of unexplained variance associated with intralocus epistasis. Our results suggest a need for improved statistical tools for region based genetic association and heritability estimation. PMID:28103232
Interference in the classical probabilistic model and its representation in complex Hilbert space
NASA Astrophysics Data System (ADS)
Khrennikov, Andrei Yu.
2005-10-01
The notion of a context (complex of physical conditions, that is to say: specification of the measurement setup) is basic in this paper. We show that the main structures of quantum theory (interference of probabilities, Born's rule, complex probabilistic amplitudes, Hilbert state space, representation of observables by operators) are present already in a latent form in the classical Kolmogorov probability model. However, this model should be considered as a calculus of contextual probabilities. In our approach it is forbidden to consider abstract context-independent probabilities: “first context and only then probability”. We construct the representation of the general contextual probabilistic dynamics in the complex Hilbert space. Thus dynamics of the wave function (in particular, Schrödinger's dynamics) can be considered as Hilbert space projections of a realistic dynamics in a “prespace”. The basic condition for representing the prespace dynamics is the law of statistical conservation of energy, i.e., conservation of probabilities. In general the Hilbert space projection of the prespace dynamics can be nonlinear and even irreversible (but it is always unitary). Methods developed in this paper can be applied not only to quantum mechanics, but also to classical statistical mechanics. The main quantum-like structures (e.g., interference of probabilities) might be found in some models of classical statistical mechanics. Quantum-like probabilistic behavior can be demonstrated by biological systems. In particular, it was recently found in some psychological experiments.
Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas
2015-01-01
Background: Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods: Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results: Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions: Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600
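The model-comparison strategy can be sketched with statsmodels: fit specifications of increasing structural complexity to (here simulated) municipality data and compare AIC. Variable names, the piecewise knot at log-density 3, and the simulated effect sizes are illustrative assumptions; a full GAM comparison would additionally need a GAM library, so only linear and piecewise forms are shown.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical municipality data: density, transport time -> mortality rate
rng = np.random.default_rng(0)
n = 434
density = rng.lognormal(3, 1.5, n)
transport = rng.lognormal(3.5, 0.6, n)
rate = 60 - 8 * np.log(density) + rng.normal(0, 6, n)   # density effect only

def fit(cols):
    X = sm.add_constant(np.column_stack(cols))
    return sm.OLS(rate, X).fit()

models = {
    "linear":            fit([density, transport]),
    "log-linear":        fit([np.log(density), np.log(transport)]),
    "piecewise density": fit([np.log(density),
                              np.maximum(np.log(density) - 3, 0),   # knot at 3
                              np.log(transport)]),
}
for name, res in models.items():
    print(f"{name:18s} AIC = {res.aic:7.1f}")
```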
Use of statistical and neural net approaches in predicting toxicity of chemicals.
Basak, S C; Grunwald, G D; Gute, B D; Balasubramanian, K; Opitz, D
2000-01-01
Hierarchical quantitative structure-activity relationships (H-QSAR) have been developed as a new approach in constructing models for estimating physicochemical, biomedicinal, and toxicological properties of interest. This approach uses increasingly more complex molecular descriptors in a graduated approach to model building. In this study, statistical and neural network methods have been applied to the development of H-QSAR models for estimating the acute aquatic toxicity (LC50) of 69 benzene derivatives to Pimephales promelas (fathead minnow). Topostructural, topochemical, geometrical, and quantum chemical indices were used as the four levels of the hierarchical method. It is clear from both the statistical and neural network models that topostructural indices alone cannot adequately model this set of congeneric chemicals. Not surprisingly, topochemical indices greatly increase the predictive power of both statistical and neural network models. Quantum chemical indices also add significantly to the modeling of this set of acute aquatic toxicity data.
Inverse modeling with RZWQM2 to predict water quality
USDA-ARS?s Scientific Manuscript database
Agricultural systems models such as RZWQM2 are complex and have numerous parameters that are unknown and difficult to estimate. Inverse modeling provides an objective statistical basis for calibration that involves simultaneous adjustment of model parameters and yields parameter confidence intervals...
Low-complexity stochastic modeling of wall-bounded shear flows
NASA Astrophysics Data System (ADS)
Zare, Armin
Turbulent flows are ubiquitous in nature and they appear in many engineering applications. Transition to turbulence, in general, increases skin-friction drag in air/water vehicles compromising their fuel-efficiency and reduces the efficiency and longevity of wind turbines. While traditional flow control techniques combine physical intuition with costly experiments, their effectiveness can be significantly enhanced by control design based on low-complexity models and optimization. In this dissertation, we develop a theoretical and computational framework for the low-complexity stochastic modeling of wall-bounded shear flows. Part I of the dissertation is devoted to the development of a modeling framework which incorporates data-driven techniques to refine physics-based models. We consider the problem of completing partially known sample statistics in a way that is consistent with underlying stochastically driven linear dynamics. Neither the statistics nor the dynamics are precisely known. Thus, our objective is to reconcile the two in a parsimonious manner. To this end, we formulate optimization problems to identify the dynamics and directionality of input excitation in order to explain and complete available covariance data. For problem sizes that general-purpose solvers cannot handle, we develop customized optimization algorithms based on alternating direction methods. The solution to the optimization problem provides information about critical directions that have maximal effect in bringing model and statistics in agreement. In Part II, we employ our modeling framework to account for statistical signatures of turbulent channel flow using low-complexity stochastic dynamical models. We demonstrate that white-in-time stochastic forcing is not sufficient to explain turbulent flow statistics and develop models for colored-in-time forcing of the linearized Navier-Stokes equations. We also examine the efficacy of stochastically forced linearized NS equations and their parabolized equivalents in the receptivity analysis of velocity fluctuations to external sources of excitation as well as capturing the effect of the slowly-varying base flow on streamwise streaks and Tollmien-Schlichting waves. In Part III, we develop a model-based approach to design surface actuation of turbulent channel flow in the form of streamwise traveling waves. This approach is capable of identifying the drag reducing trends of traveling waves in a simulation-free manner. We also use the stochastically forced linearized NS equations to examine the Reynolds number independent effects of spanwise wall oscillations on drag reduction in turbulent channel flows. This allows us to extend the predictive capability of our simulation-free approach to high Reynolds numbers.
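The starting point of the statistical-signature analysis can be made concrete: for stochastically driven linear dynamics dx = Ax dt + B dW, the steady-state covariance solves an algebraic Lyapunov equation, and comparing its solution with measured second-order statistics reveals whether white-in-time forcing suffices. The 2x2 system below is an illustrative stand-in for the linearized Navier-Stokes operator, not the dissertation's model.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Steady covariance X of dx = A x dt + B dW solves  A X + X A' + B B' = 0
A = np.array([[-0.2, 1.0],
              [ 0.0, -0.5]])          # stable, non-normal (shear-like coupling)
B = np.eye(2)                         # white-in-time forcing of unit intensity
X = solve_continuous_lyapunov(A, -B @ B.T)
print("model covariance:\n", X)

# If X cannot be brought into agreement with measured second-order statistics
# for any admissible B, the input must be colored in time or structured,
# which is what the covariance-completion optimization identifies.
```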
Statistical mechanics of complex neural systems and high dimensional data
NASA Astrophysics Data System (ADS)
Advani, Madhu; Lahiri, Subhaneil; Ganguli, Surya
2013-03-01
Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.
Bayesian Statistics and Uncertainty Quantification for Safety Boundary Analysis in Complex Systems
NASA Technical Reports Server (NTRS)
He, Yuning; Davies, Misty Dawn
2014-01-01
The analysis of a safety-critical system often requires detailed knowledge of safe regions and their high-dimensional non-linear boundaries. We present a statistical approach to iteratively detect and characterize the boundaries, which are provided as parameterized shape candidates. Using methods from uncertainty quantification and active learning, we incrementally construct a statistical model from only a few simulation runs and obtain statistically sound estimates of the shape parameters for safety boundaries.
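A hedged sketch of the iterative boundary characterization: starting from a few simulator runs, a probabilistic classifier is refit and the next run is placed where the predicted class probability is most ambiguous, i.e. near the current boundary estimate. The elliptical safe region and the Gaussian-process classifier are illustrative stand-ins for the paper's parameterized shape candidates.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

def simulate(x):           # hypothetical simulator: safe inside a curved region
    return (x[:, 0] ** 2 + 0.5 * x[:, 1] ** 2 < 1.0).astype(int)

rng = np.random.default_rng(0)
# few initial runs; two fixed points guarantee both classes are observed
X = np.vstack([[0.0, 0.0], [1.8, 1.8], rng.uniform(-2, 2, (13, 2))])
y = simulate(X)

for _ in range(10):                              # active learning loop
    gp = GaussianProcessClassifier().fit(X, y)
    cand = rng.uniform(-2, 2, (500, 2))
    p = gp.predict_proba(cand)[:, 1]
    pick = cand[np.argmax(-np.abs(p - 0.5))]     # most ambiguous: near boundary
    X = np.vstack([X, pick])
    y = np.append(y, simulate(pick[None, :]))

print("simulation runs used:", len(X))
```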
The statistical geometry of transcriptome divergence in cell-type evolution and cancer.
Liang, Cong; Forrest, Alistair R R; Wagner, Günter P
2015-01-14
In evolution, body plan complexity increases due to an increase in the number of individualized cell types. Yet, there is very little understanding of the mechanisms that produce this form of organismal complexity. One model for the origin of novel cell types is the sister cell-type model. According to this model, each cell type arises together with a sister cell type through specialization from an ancestral cell type. A key prediction of the sister cell-type model is that gene expression profiles of cell types exhibit tree structure. Here we present a statistical model for detecting tree structure in transcriptomic data and apply it to transcriptomes from ENCODE and FANTOM5. We show that transcriptomes of normal cells harbour substantial amounts of hierarchical structure. In contrast, cancer cell lines have less tree structure, suggesting that the emergence of cancer cells follows different principles from that of evolutionary cell-type origination.
Baars, Erik W; van der Hart, Onno; Nijenhuis, Ellert R S; Chu, James A; Glas, Gerrit; Draijer, Nel
2011-01-01
The purpose of this study was to develop an expertise-based prognostic model for the treatment of complex posttraumatic stress disorder (PTSD) and dissociative identity disorder (DID). We developed a survey in 2 rounds: In the first round we surveyed 42 experienced therapists (22 DID and 20 complex PTSD therapists), and in the second round we surveyed a subset of 22 of the 42 therapists (13 DID and 9 complex PTSD therapists). First, we drew on therapists' knowledge of prognostic factors for stabilization-oriented treatment of complex PTSD and DID. Second, therapists prioritized a list of prognostic factors by estimating the size of each variable's prognostic effect; we clustered these factors according to content and named the clusters. Next, concept mapping methodology and statistical analyses (including principal components analyses) were used to transform individual judgments into weighted group judgments for clusters of items. A prognostic model, based on consensually determined estimates of effect sizes, of 8 clusters containing 51 factors for both complex PTSD and DID was formed. It includes the clusters lack of motivation, lack of healthy relationships, lack of healthy therapeutic relationships, lack of other internal and external resources, serious Axis I comorbidity, serious Axis II comorbidity, poor attachment, and self-destruction. In addition, a set of 5 DID-specific items was constructed. The model is supportive of the current phase-oriented treatment model, emphasizing the strengthening of the therapeutic relationship and the patient's resources in the initial stabilization phase. Further research is needed to test the model's statistical and clinical validity.
Sul, Jae Hoon; Bilow, Michael; Yang, Wen-Yun; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar
2016-03-01
Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied and nor have there been methods designed to correct for population structure on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants.
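The confounding mechanism can be demonstrated in a few lines. One simplification to note: the paper proposes a mixed-model correction, while the sketch below instead uses genotype principal components and PC-by-environment covariates, a cruder but related adjustment, to show how ancestry-linked environment induces a spurious GEI signal at an unassociated SNP. All data are simulated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, n_snps = 1000, 200
pop = rng.integers(0, 2, n)                    # two subpopulations
freq = np.where(pop[:, None] == 0, 0.2, 0.6)   # allele frequencies differ by ancestry
G = rng.binomial(2, np.broadcast_to(freq, (n, n_snps)))
E = rng.normal(pop, 1.0)                       # environment correlated with ancestry
y = 0.5 * pop * E + rng.normal(0, 1, n)        # phenotype: ancestry-by-environment only

snp = G[:, 0]                                  # a SNP with no true effect
pcs = np.linalg.svd((G - G.mean(0)) / G.std(0), full_matrices=False)[0][:, :5]

naive = sm.OLS(y, sm.add_constant(np.column_stack([snp, E, snp * E]))).fit()
adj = sm.OLS(y, sm.add_constant(
    np.column_stack([snp, E, snp * E, pcs, pcs * E[:, None]]))).fit()
print("naive GEI p:", naive.pvalues[3], " adjusted GEI p:", adj.pvalues[3])
```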
Teaching MBA Statistics Online: A Pedagogically Sound Process Approach
ERIC Educational Resources Information Center
Grandzol, John R.
2004-01-01
Delivering MBA statistics in the online environment presents significant challenges to educators and students alike because of varying student preparedness levels, complexity of content, difficulty in assessing learning outcomes, and faculty availability and technological expertise. In this article, the author suggests a process model that…
NASA Astrophysics Data System (ADS)
Bouhaj, M.; von Estorff, O.; Peiffer, A.
2017-09-01
In the application of Statistical Energy Analysis "SEA" to complex assembled structures, a purely predictive model often exhibits errors. These errors are mainly due to a lack of accurate modelling of the power transmission mechanism described through the Coupling Loss Factors (CLF). Experimental SEA (ESEA) is used in practice by the automotive and aerospace industries to verify and update the model or to derive the CLFs for use in an SEA predictive model when analytical estimates cannot be made. This work is particularly motivated by the lack of procedures for estimating the variance and confidence intervals of the statistical quantities when using the ESEA technique. The aim of this paper is to introduce procedures enabling a statistical description of measured power input, vibration energies and the derived SEA parameters. Particular emphasis is placed on the identification of structural CLFs of complex built-up structures, comparing different methods. By adopting a Stochastic Energy Model (SEM), the ensemble average in ESEA is also addressed. For this purpose, expressions are obtained to randomly perturb the energy matrix elements and generate individual samples for the Monte Carlo (MC) technique applied to derive the ensemble averaged CLF. From the results of ESEA tests conducted on an aircraft fuselage section, the SEM approach provides better estimates of the CLFs than classical matrix inversion methods. The expected range of CLF values and the synthesized energy are used as quality criteria of the matrix inversion, allowing critical SEA subsystems to be assessed, which might require a more refined statistical description of the excitation and the response fields. Moreover, the impact of the variance of the normalized vibration energy on the uncertainty of the derived CLFs is outlined.
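The classical ESEA inversion with a Monte Carlo perturbation of the energy matrix can be sketched as follows; the sign convention (off-diagonal entries of the loss-factor matrix equal to minus the CLFs), the lognormal 5% measurement scatter, and the two-subsystem energies are illustrative assumptions, not values from the fuselage tests.

```python
import numpy as np

def esea_clf(E, P, omega, n_mc=1000, cv=0.05, seed=0):
    """Classical ESEA matrix inversion with Monte Carlo perturbation of energies.

    E[i, j]: energy of subsystem i when subsystem j is excited alone.
    P[j]:    input power in that test. Returns the mean and std of the
    loss-factor matrix L, defined by the power balance omega * L @ E = diag(P).
    """
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_mc):
        Ep = E * rng.lognormal(0.0, cv, E.shape)       # perturb measured energies
        L = np.diag(P) @ np.linalg.inv(Ep) / omega
        samples.append(L)
    samples = np.array(samples)
    return samples.mean(axis=0), samples.std(axis=0)

# hypothetical two-subsystem measurement (energies in J, powers in W)
E = np.array([[2.0e-3, 4.0e-4],
              [3.0e-4, 1.5e-3]])
P = np.array([1.0, 1.0])
mean_L, std_L = esea_clf(E, P, omega=2 * np.pi * 500)
# off-diagonal L entries are minus the CLFs in this convention
print("coupling loss factor eta_21 ~", -mean_L[0, 1], "+/-", std_L[0, 1])
```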
Statistical Surrogate Modeling of Atmospheric Dispersion Events Using Bayesian Adaptive Splines
NASA Astrophysics Data System (ADS)
Francom, D.; Sansó, B.; Bulaevskaya, V.; Lucas, D. D.
2016-12-01
Uncertainty in the inputs of complex computer models, including atmospheric dispersion and transport codes, is often assessed via statistical surrogate models. Surrogate models are computationally efficient statistical approximations of expensive computer models that enable uncertainty analysis. We introduce Bayesian adaptive spline methods for producing surrogate models that capture the major spatiotemporal patterns of the parent model, while satisfying all the necessities of flexibility, accuracy and computational feasibility. We present novel methodological and computational approaches motivated by a controlled atmospheric tracer release experiment conducted at the Diablo Canyon nuclear power plant in California. Traditional methods for building statistical surrogate models often do not scale well to experiments with large amounts of data. Our approach is well suited to experiments involving large numbers of model inputs, large numbers of simulations, and functional output for each simulation. Our approach allows us to perform global sensitivity analysis with ease. We also present an approach to calibration of simulators using field data.
NASA Astrophysics Data System (ADS)
Siddiqui, Maheen; Wedemann, Roseli S.; Jensen, Henrik Jeldtoft
2018-01-01
We explore statistical characteristics of avalanches associated with the dynamics of a complex-network model, where two modules corresponding to sensorial and symbolic memories interact, representing unconscious and conscious mental processes. The model illustrates Freud's ideas regarding the neuroses and the notion that consciousness is related to symbolic and linguistic memory activity in the brain. It incorporates the Stariolo-Tsallis generalization of the Boltzmann Machine in order to model memory retrieval and associativity. In the present work, we define and measure avalanche size distributions during memory retrieval, in order to gain insight regarding basic aspects of the functioning of these complex networks. The avalanche sizes defined for our model should be related to the time consumed and also to the size of the neuronal region which is activated during memory retrieval. This allows the qualitative comparison of the behaviour of the distribution of cluster sizes, obtained during fMRI measurements of the propagation of signals in the brain, with the distribution of avalanche sizes obtained in our simulation experiments. This comparison corroborates the indication that the Nonextensive Statistical Mechanics formalism may indeed be better suited to model the complex networks which constitute brain and mental structure.
A Complex Network Approach to Stylometry
Amancio, Diego Raphael
2015-01-01
Statistical methods have been widely employed to study the fundamental properties of language. In recent years, methods from complex and dynamical systems have proved useful in creating several language models. Despite the large number of studies devoted to representing texts with physical models, only a limited number of studies have shown how the properties of the underlying physical systems can be employed to improve the performance of natural language processing tasks. In this paper, I address this problem by devising complex networks methods that are able to improve the performance of current statistical methods. Using a fuzzy classification strategy, I show that the topological properties extracted from texts complement the traditional textual description. In several cases, the performance obtained with hybrid approaches outperformed the results obtained when only traditional or networked methods were used. Because the proposed model is generic, the framework devised here could be straightforwardly used to study similar textual applications where the topology plays a pivotal role in the description of the interacting agents. PMID:26313921
Family Environment and Cognitive Development: Twelve Analytic Models
ERIC Educational Resources Information Center
Walberg, Herbert J.; Marjoribanks, Kevin
1976-01-01
The review indicates that refined measures of the family environment and the use of complex statistical models increase the understanding of the relationships between socioeconomic status, sibling variables, family environment, and cognitive development. (RC)
A Statistical Dynamic Approach to Structural Evolution of Complex Capital Market Systems
NASA Astrophysics Data System (ADS)
Shao, Xiao; Chai, Li H.
As an important part of modern financial systems, the capital market has played a crucial role in diverse social resource allocations and economic exchanges. Moving beyond traditional models and theories based on neoclassical economics, and considering capital markets as typical complex open systems, this paper attempts to develop a new approach that overcomes some shortcomings of available research. By defining the generalized entropy of capital market systems, a theoretical model and a nonlinear dynamic equation for the operation of capital markets are proposed from a statistical dynamic perspective. The US security market from 1995 to 2001 is then simulated and analyzed as a typical case. Some instructive results are discussed and summarized.
Sun, Gang; Hoff, Steven J; Zelle, Brian C; Nelson, Minda A
2008-12-01
It is vital to forecast gas and particulate matter concentrations and emission rates (GPCER) from livestock production facilities to assess the impact of airborne pollutants on human health, the ecological environment, and global warming. Modeling source air quality is a complex process because of abundant nonlinear interactions between GPCER and other factors. The objective of this study was to introduce statistical methods and a radial basis function (RBF) neural network to predict daily source air quality in Iowa swine deep-pit finishing buildings. The results show that four variables (outdoor and indoor temperature, animal units, and ventilation rates) were identified as relatively important model inputs using statistical methods. It can be further demonstrated that only two factors, the environment factor and the animal factor, were capable of explaining more than 94% of the total variability after performing principal component analysis. The introduction of fewer uncorrelated variables to the neural network reduces the complexity of the model structure, minimizes computation cost, and eliminates model overfitting problems. The obtained results of RBF network prediction were in good agreement with the actual measurements, with values of the correlation coefficient between 0.741 and 0.995 and very low values of systemic performance indexes for all the models. These good results indicate that the RBF network can be trained to model these highly nonlinear relationships. Thus, the RBF neural network technology combined with multivariate statistical methods is a promising tool for air pollutant emissions modeling.
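A compact sketch of the PCA-plus-RBF pipeline the abstract describes, in plain numpy with synthetic data; the four correlated inputs, the 94% variance cutoff, and the GPCER-like target are placeholders, not the study's measurements. Principal components decorrelate and reduce the inputs; a Gaussian RBF layer with a ridge-regularised linear readout then fits the nonlinear response.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for correlated building inputs (e.g. outdoor/indoor
# temperature, animal units, ventilation rate) and a GPCER-like target.
n = 500
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, 4)) + 0.15 * rng.normal(size=(n, 4))
y = np.sin(latent[:, 0]) + 0.5 * latent[:, 1] + 0.1 * rng.normal(size=n)

# PCA: keep the leading components explaining >94% of total variance
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), 0.94)) + 1
Z = Xc @ Vt[:k].T                       # few, uncorrelated model inputs

# RBF network: Gaussian features on sampled centres, ridge-regularised readout
centres = Z[rng.choice(n, 30, replace=False)]
width = np.median(np.linalg.norm(Z[:, None] - centres[None], axis=2))
Phi = np.exp(-((Z[:, None] - centres[None]) ** 2).sum(axis=2) / (2 * width**2))
w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(30), Phi.T @ y)

print("components kept:", k)
print("training correlation:", np.corrcoef(Phi @ w, y)[0, 1])
```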
NASA Astrophysics Data System (ADS)
Donges, Jonathan F.; Heitzig, Jobst; Beronov, Boyan; Wiedermann, Marc; Runge, Jakob; Feng, Qing Yi; Tupikina, Liubov; Stolbova, Veronika; Donner, Reik V.; Marwan, Norbert; Dijkstra, Henk A.; Kurths, Jürgen
2015-11-01
We introduce the pyunicorn (Pythonic unified complex network and recurrence analysis toolbox) open source software package for applying and combining modern methods of data analysis and modeling from complex network theory and nonlinear time series analysis. pyunicorn is a fully object-oriented and easily parallelizable package written in the language Python. It allows for the construction of functional networks such as climate networks in climatology or functional brain networks in neuroscience, representing the structure of statistical interrelationships in large data sets of time series, and, subsequently, for investigating this structure using advanced methods of complex network theory such as measures and models for spatial networks, networks of interacting networks, node-weighted statistics, or network surrogates. Additionally, pyunicorn provides insights into the nonlinear dynamics of complex systems as recorded in uni- and multivariate time series from a non-traditional perspective by means of recurrence quantification analysis, recurrence networks, visibility graphs, and construction of surrogate time series. The range of possible applications of the library is outlined, drawing on several examples mainly from the field of climatology.
Statistical Accounting for Uncertainty in Modeling Transport in Environmental Systems
Models frequently are used to predict the future extent of ground-water contamination, given estimates of their input parameters and forcing functions. Although models have a well established scientific basis for understanding the interactions between complex phenomena and for g...
NASA Astrophysics Data System (ADS)
Salman Shahid, Syed; Bikson, Marom; Salman, Humaira; Wen, Peng; Ahfock, Tony
2014-06-01
Objectives. Computational methods are increasingly used to optimize transcranial direct current stimulation (tDCS) dose strategies, and yet complexities of existing approaches limit their clinical access. Since predictive modelling indicates the relevance of subject/pathology based data and hence the need for subject specific modelling, the incremental clinical value of increasingly complex modelling methods must be balanced against the computational and clinical time and costs. For example, the incorporation of multiple tissue layers and measured diffusion tensor (DTI) based conductivity estimates increases model precision, but at the cost of clinical and computational resources. Costs related to such complexities aggregate when considering individual optimization and the myriad of potential montages. Here, rather than considering if additional details change current-flow prediction, we consider when added complexities influence clinical decisions. Approach. Towards developing quantitative and qualitative metrics of value/cost associated with computational model complexity, we considered field distributions generated by two 4 × 1 high-definition montages (m1 = 4 × 1 HD montage with anode at C3 and m2 = 4 × 1 HD montage with anode at C1) and a single conventional (m3 = C3-Fp2) tDCS electrode montage. We evaluated statistical methods, including residual error (RE) and relative difference measure (RDM), to consider the clinical impact and utility of increased complexities, namely the influence of skull, muscle and brain anisotropic conductivities in a volume conductor model. Main results. Anisotropy modulated current flow in a montage- and region-dependent manner. However, significant statistical changes, produced within montage by anisotropy, did not change qualitative peak and topographic comparisons across montages. Thus, for the examples analysed, the clinical decision on which dose to select would not be altered by the omission of anisotropic brain conductivity. Significance. Results illustrate the need to rationally balance the role of model complexity, such as anisotropy, in detailed current-flow analysis against its value in clinical dose design. However, when extending our analysis to include axonal polarization, the results provide presumably clinically meaningful information. Hence the importance of model complexity may be more relevant with cellular level predictions of neuromodulation.
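The abstract leans on two scalar comparison metrics, RE and RDM. A small numpy sketch of commonly used definitions (assumed here, since the paper's exact normalisation is not given in the abstract) makes the distinction concrete: RE is sensitive to overall magnitude, while RDM responds only to the spatial pattern, which is why a mostly-scaling change can be statistically large yet clinically neutral.

```python
import numpy as np

def relative_error(e_ref, e_test):
    """RE: magnitude error of a field relative to a reference field."""
    return np.linalg.norm(e_ref - e_test) / np.linalg.norm(e_ref)

def rdm(e_ref, e_test):
    """RDM: topography error, insensitive to overall scaling (0 = same pattern)."""
    return np.linalg.norm(e_ref / np.linalg.norm(e_ref)
                          - e_test / np.linalg.norm(e_test))

# Toy current-density magnitudes at mesh nodes, isotropic vs anisotropic runs
rng = np.random.default_rng(3)
iso = rng.lognormal(0.0, 1.0, 10_000)
aniso = iso * 1.2 + rng.normal(0.0, 0.05, 10_000)   # mostly a scale change

print("RE :", relative_error(iso, aniso))   # clearly nonzero: magnitudes differ
print("RDM:", rdm(iso, aniso))              # near zero: topography barely changes
```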
Comparisons of non-Gaussian statistical models in DNA methylation analysis.
Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun
2014-06-16
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
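To make the bounded-support point concrete, here is a small scipy sketch comparing a single beta fit (support fixed to the unit interval) against a single Gaussian fit on simulated beta-values, scored by BIC. The shape parameters are arbitrary, and the paper's mixture-model machinery is more elaborate, but the direction of the comparison is the same.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
m = rng.beta(2, 10, 1000)          # toy methylation beta-values in (0, 1)

mu, sd = stats.norm.fit(m)
a, b, _, _ = stats.beta.fit(m, floc=0, fscale=1)   # support fixed to (0, 1)

def bic(loglik, k, n):
    """Bayesian Information Criterion; lower is better."""
    return k * np.log(n) - 2 * loglik

n = len(m)
print("BIC normal:", bic(stats.norm.logpdf(m, mu, sd).sum(), 2, n))
print("BIC beta  :", bic(stats.beta.logpdf(m, a, b).sum(), 2, n))
# The bounded-support beta model should come out clearly lower (better).
```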
Detection of crossover time scales in multifractal detrended fluctuation analysis
NASA Astrophysics Data System (ADS)
Ge, Erjia; Leung, Yee
2013-04-01
Fractal analysis is employed in this paper as a scale-based method for identifying the scaling behavior of time series. Many spatial and temporal processes exhibiting complex multi(mono)-scaling behaviors are fractals. One of the important concepts in fractals is the crossover time scale(s) that separates distinct regimes having different fractal scaling behaviors. A common method for characterizing such behaviors is multifractal detrended fluctuation analysis (MF-DFA). The detection of crossover time scale(s) is, however, relatively subjective, since it has been made without rigorous statistical procedures and has generally been determined by eyeballing or subjective observation. Crossover time scales so determined may be spurious and problematic, and may not reflect the genuine underlying scaling behavior of a time series. The purpose of this paper is to propose a statistical procedure to model complex fractal scaling behaviors and reliably identify the crossover time scales under MF-DFA. The scaling-identification regression model, grounded on a solid statistical foundation, is first proposed to describe multi-scaling behaviors of fractals. Through regression analysis and statistical inference, we can (1) identify crossover time scales that cannot be detected by eyeballing, (2) determine the number and locations of the genuine crossover time scales, (3) give confidence intervals for the crossover time scales, and (4) establish a statistically significant regression model depicting the underlying scaling behavior of a time series. To substantiate our argument, the regression model is applied to analyze the multi-scaling behaviors of avian-influenza outbreaks, water consumption, daily mean temperature, and rainfall in Hong Kong. Through the proposed model, we can gain a deeper understanding of fractals in general and a statistical approach to identifying multi-scaling behavior under MF-DFA in particular.
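The core of the proposed move, replacing eyeball crossover detection with regression, can be illustrated with a breakpoint scan over a two-segment fit of the log-log fluctuation function. The synthetic exponents and the minimum-SSE selection rule below are illustrative only; the paper adds formal inference and confidence intervals, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic log-log fluctuation function F(s) with a crossover at log10(s) = 2:
# scaling exponent 0.9 below the crossover, 0.5 above it, plus noise.
log_s = np.linspace(1.0, 3.5, 60)
cross = 2.0
log_F = np.where(log_s < cross,
                 0.9 * log_s,
                 0.5 * (log_s - cross) + 0.9 * cross) + rng.normal(0, 0.02, 60)

def fit_two_segments(x, y):
    """Scan candidate breakpoints; return the one minimising total SSE."""
    best = None
    for i in range(5, len(x) - 5):            # keep at least 5 points per segment
        sse = 0.0
        for xs, ys in ((x[:i], y[:i]), (x[i:], y[i:])):
            coef = np.polyfit(xs, ys, 1)
            sse += ((ys - np.polyval(coef, xs)) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, x[i])
    return best

sse, crossover = fit_two_segments(log_s, log_F)
print("estimated crossover at log10(s) =", crossover)   # should land near 2.0
```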
Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro
2016-01-01
The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence, and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets, and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant when using this kind of algorithm. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.
Scaling in geology: landforms and earthquakes.
Turcotte, D L
1995-01-01
Landforms and earthquakes appear to be extremely complex; yet, there is order in the complexity. Both satisfy fractal statistics in a variety of ways. A basic question is whether the fractal behavior is due to scale invariance or is the signature of a broadly applicable class of physical processes. Both landscape evolution and regional seismicity appear to be examples of self-organized critical phenomena. A variety of statistical models have been proposed to model landforms, including diffusion-limited aggregation, self-avoiding percolation, and cellular automata. Many authors have studied the behavior of multiple slider-block models, both in terms of the rupture of a fault to generate an earthquake and in terms of the interactions between faults associated with regional seismicity. The slider-block models exhibit a remarkably rich spectrum of behavior; two slider blocks can exhibit low-order chaotic behavior. Large numbers of slider blocks clearly exhibit self-organized critical behavior. PMID:11607562
Specifying and Refining a Complex Measurement Model.
ERIC Educational Resources Information Center
Levy, Roy; Mislevy, Robert J.
This paper aims to describe a Bayesian approach to modeling and estimating cognitive models both in terms of statistical machinery and actual instrument development. Such a method taps the knowledge of experts to provide initial estimates for the probabilistic relationships among the variables in a multivariate latent variable model and refines…
Bayesian models: A statistical primer for ecologists
Hobbs, N. Thompson; Hooten, Mevin B.
2015-01-01
Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods—in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals. This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management. The book presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticians; covers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and more; deemphasizes computer coding in favor of basic principles; and explains how to write out properly factored statistical expressions representing Bayesian models.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harlim, John, E-mail: jharlim@psu.edu; Mahdi, Adam, E-mail: amahdi@ncsu.edu; Majda, Andrew J., E-mail: jonjon@cims.nyu.edu
2014-01-15
A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partial noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model.
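The filtering idea, jointly estimating state and model coefficients by augmenting the ensemble, can be shown on a toy scalar problem. The sketch below is a bare-bones perturbed-observation ensemble Kalman filter for one state variable and one unknown damping coefficient, far simpler than the 57-mode application above; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

# Truth: damped scalar model x_{k+1} = x_k - theta*x_k*dt + noise, theta unknown
dt, theta_true, q, r = 0.1, 1.5, 0.2, 0.5
T = 400
x = np.zeros(T)
for k in range(1, T):
    x[k] = x[k-1] - theta_true * x[k-1] * dt + np.sqrt(dt) * q * rng.normal()
obs = x + np.sqrt(r) * rng.normal(size=T)        # noisy observations of the state

# Augmented ensemble: row 0 = state x, row 1 = parameter theta
Ne = 200
ens = np.vstack([rng.normal(0, 1, Ne), rng.uniform(0.1, 3.0, Ne)])

for k in range(1, T):
    # Forecast: propagate each member with its own (static) parameter value
    ens[0] = ens[0] - ens[1] * ens[0] * dt + np.sqrt(dt) * q * rng.normal(size=Ne)
    # Analysis: EnKF update with perturbed observations, obs operator H = [1, 0]
    A = ens - ens.mean(axis=1, keepdims=True)
    Cxy = A @ A[0] / (Ne - 1)                    # cov of (x, theta) with predicted obs
    Cyy = A[0] @ A[0] / (Ne - 1)
    K = Cxy / (Cyy + r)                          # Kalman gain
    innov = obs[k] + np.sqrt(r) * rng.normal(size=Ne) - ens[0]
    ens += np.outer(K, innov)
    # crude safeguard for the sketch: keep the parameter in a sensible range
    ens[1] = np.clip(ens[1], 0.05, 5.0)

print("estimated theta:", ens[1].mean(), "+/-", ens[1].std())  # should drift toward 1.5
```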
Protein and gene model inference based on statistical modeling in k-partite graphs.
Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter
2010-07-06
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
Statistics of multi-look AIRSAR imagery: A comparison of theory with measurements
NASA Technical Reports Server (NTRS)
Lee, J. S.; Hoppel, K. W.; Mango, S. A.
1993-01-01
The intensity and amplitude statistics of SAR images, such as L-band HH for SEASAT and SIR-B, and C-band VV for ERS-1, have been extensively investigated for various terrain, ground cover, and ocean surfaces. Less well known are the statistics between multiple channels of polarimetric or interferometric SARs, especially for multi-look processed data. In this paper, we investigate the probability density functions (PDFs) of phase differences, the magnitudes of complex products, and the amplitude ratios between polarization channels (i.e., HH, HV, and VV) using 1-look and 4-look AIRSAR polarimetric data. Measured histograms are compared with theoretical PDFs which were recently derived based on a complex Gaussian model.
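The single-look versus multi-look statistics being compared can be reproduced in a few lines of numpy. The sketch below simulates two correlated circular complex Gaussian channels (the coherence of 0.7 is an arbitrary choice, not a value from the paper) and shows the phase-difference distribution tightening from 1-look to 4-look processing, which is the qualitative behaviour of the PDFs examined above.

```python
import numpy as np

rng = np.random.default_rng(7)

def correlated_channels(n, rho):
    """Two complex circular-Gaussian channels with coherence rho (complex Gaussian model)."""
    z1 = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
    w = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
    z2 = rho * z1 + np.sqrt(1 - rho**2) * w
    return z1, z2

def multilook(z1, z2, L):
    """L-look average of the cross product; returns phase difference and magnitude."""
    c = (z1 * np.conj(z2)).reshape(-1, L).mean(axis=1)
    return np.angle(c), np.abs(c)

z1, z2 = correlated_channels(400_000, rho=0.7)
for L in (1, 4):
    phase, mag = multilook(z1, z2, L)
    print(f"{L}-look phase std: {phase.std():.3f} rad")  # narrows as looks increase
```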
Sparse intervertebral fence composition for 3D cervical vertebra segmentation
NASA Astrophysics Data System (ADS)
Liu, Xinxin; Yang, Jian; Song, Shuang; Cong, Weijian; Jiao, Peifeng; Song, Hong; Ai, Danni; Jiang, Yurong; Wang, Yongtian
2018-06-01
Statistical shape models are capable of extracting shape prior information, and are usually utilized to assist the task of segmentation of medical images. However, such models require large training datasets in the case of multi-object structures, and it is also difficult to achieve satisfactory results for complex shapes. This study proposed a novel statistical model for cervical vertebra segmentation, called sparse intervertebral fence composition (SiFC), which can reconstruct the boundary between adjacent vertebrae by modeling intervertebral fences. The complex shape of the cervical spine is replaced by a simple intervertebral fence, which considerably reduces the difficulty of cervical segmentation. The final segmentation results are obtained by using a 3D active contour deformation model without shape constraint, which substantially enhances the recognition capability of the proposed method for objects with complex shapes. The proposed segmentation framework is tested on a dataset of CT images from 20 patients. A quantitative comparison against corresponding reference vertebral segmentations yields an overall mean absolute surface distance of 0.70 mm and a Dice similarity index of 95.47% for cervical vertebral segmentation. The experimental results show that the SiFC method achieves competitive cervical vertebral segmentation performance, and completely eliminates inter-process overlap.
NASA Astrophysics Data System (ADS)
Martinez, Guillermo F.; Gupta, Hoshin V.
2011-12-01
Methods to select parsimonious and hydrologically consistent model structures are useful for evaluating dominance of hydrologic processes and representativeness of data. While information criteria (appropriately constrained to obey underlying statistical assumptions) can provide a basis for evaluating appropriate model complexity, it is not sufficient to rely upon the principle of maximum likelihood (ML) alone. We suggest that one must also call upon a "principle of hydrologic consistency," meaning that selected ML structures and parameter estimates must be constrained (as well as possible) to reproduce desired hydrological characteristics of the processes under investigation. This argument is demonstrated in the context of evaluating the suitability of candidate model structures for lumped water balance modeling across the continental United States, using data from 307 snow-free catchments. The models are constrained to satisfy several tests of hydrologic consistency, a flow space transformation is used to ensure better consistency with underlying statistical assumptions, and information criteria are used to evaluate model complexity relative to the data. The results clearly demonstrate that the principle of consistency provides a sensible basis for guiding selection of model structures and indicate strong spatial persistence of certain model structures across the continental United States. Further work to untangle reasons for model structure predominance can help to relate conceptual model structures to physical characteristics of the catchments, facilitating the task of prediction in ungaged basins.
Agur, Zvia; Elishmereni, Moran; Kheifetz, Yuri
2014-01-01
Despite its great promise, personalized oncology still faces many hurdles, and it is increasingly clear that targeted drugs and molecular biomarkers alone yield only modest clinical benefit. One reason is the complex relationships between biomarkers and the patient's response to drugs, obscuring the true weight of the biomarkers in the overall patient's response. This complexity can be disentangled by computational models that integrate the effects of personal biomarkers into a simulator of drug-patient dynamic interactions, for predicting the clinical outcomes. Several computational tools have been developed for personalized oncology, notably evidence-based tools for simulating pharmacokinetics, Bayesian-estimated tools for predicting survival, etc. We describe representative statistical and mathematical tools, and discuss their merits, shortcomings and preliminary clinical validation attesting to their potential. Yet, the individualization power of mathematical models alone, or statistical models alone, is limited. More accurate and versatile personalization tools can be constructed by a new application of the statistical/mathematical nonlinear mixed effects modeling (NLMEM) approach, which until recently has been used only in drug development. Using these advanced tools, clinical data from patient populations can be integrated with mechanistic models of disease and physiology, for generating personal mathematical models. Upon a more substantial validation in the clinic, this approach will hopefully be applied in personalized clinical trials, P-trials, hence aiding the establishment of personalized medicine within the mainstream of clinical oncology. © 2014 Wiley Periodicals, Inc.
Zevin, Jason D; Miller, Brett
Reading research is increasingly a multi-disciplinary endeavor involving more complex, team-based science approaches. These approaches offer the potential of capturing the complexity of reading development, the emergence of individual differences in reading performance over time, how these differences relate to the development of reading difficulties and disability, and more fully understanding the nature of skilled reading in adults. This special issue focuses on the potential opportunities and insights that early and richly integrated advanced statistical and computational modeling approaches can provide to our foundational (and translational) understanding of reading. The issue explores how computational and statistical modeling, using both observed and simulated data, can serve as a contact point among research domains and topics, complement other data sources and critically provide analytic advantages over current approaches.
Path statistics, memory, and coarse-graining of continuous-time random walks on networks
Kion-Crosby, Willow; Morozov, Alexandre V.
2015-01-01
Continuous-time random walks (CTRWs) on discrete state spaces, ranging from regular lattices to complex networks, are ubiquitous across physics, chemistry, and biology. Models with coarse-grained states (for example, those employed in studies of molecular kinetics) or spatial disorder can give rise to memory and non-exponential distributions of waiting times and first-passage statistics. However, existing methods for analyzing CTRWs on complex energy landscapes do not address these effects. Here we use statistical mechanics of the nonequilibrium path ensemble to characterize first-passage CTRWs on networks with arbitrary connectivity, energy landscape, and waiting time distributions. Our approach can be applied to calculating higher moments (beyond the mean) of path length, time, and action, as well as statistics of any conservative or non-conservative force along a path. For homogeneous networks, we derive exact relations between length and time moments, quantifying the validity of approximating a continuous-time process with its discrete-time projection. For more general models, we obtain recursion relations, reminiscent of transfer matrix and exact enumeration techniques, to efficiently calculate path statistics numerically. We have implemented our algorithm in PathMAN (Path Matrix Algorithm for Networks), a Python script that users can apply to their model of choice. We demonstrate the algorithm on a few representative examples which underscore the importance of non-exponential distributions, memory, and coarse-graining in CTRWs. PMID:26646868
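PathMAN itself computes these moments by recursion; a brute-force Monte Carlo stand-in (not PathMAN's algorithm) already illustrates the abstract's point that waiting-time distributions with the same mean can yield very different first-passage statistics. The ring network and both waiting-time laws below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(8)

# Small ring network; the walker hops to a uniformly chosen neighbour after a
# random waiting time, starting at node 0 and absorbing at node 5.
n_nodes, target = 10, 5
neighbours = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}

def first_passage(wait, n_walks=20_000):
    """Sample first-passage times for a given waiting-time sampler `wait`."""
    times = np.empty(n_walks)
    for w in range(n_walks):
        node, t = 0, 0.0
        while node != target:
            t += wait()
            node = neighbours[node][rng.integers(2)]
        times[w] = t
    return times

exp_t = first_passage(lambda: rng.exponential(1.0))
# Pareto waits scaled so the mean is also 1.0 (alpha = 2.2, x_m = 6/11)
par_t = first_passage(lambda: (6 / 11) * (rng.pareto(2.2) + 1.0))

for name, t in (("exponential", exp_t), ("pareto     ", par_t)):
    print(f"{name}: mean={t.mean():.1f}, std={t.std():.1f}, "
          f"99th pct={np.percentile(t, 99):.1f}")
# Matched means, but the heavy-tailed waits inflate the higher moments.
```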
Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, e...
Data Mining and Complex Problems: Case Study in Composite Materials
NASA Technical Reports Server (NTRS)
Rabelo, Luis; Marin, Mario
2009-01-01
Data mining is defined as the discovery of useful, possibly unexpected, patterns and relationships in data using statistical and non-statistical techniques in order to develop schemes for decision and policy making. Data mining can be used to discover the sources and causes of problems in complex systems. In addition, data mining can support simulation strategies by finding the different constants and parameters to be used in the development of simulation models. This paper introduces a framework for data mining and its application to complex problems. To further explain some of the concepts outlined in this paper, the potential application to the NASA Shuttle Reinforced Carbon-Carbon structures and genetic programming is used as an illustration.
Methods of Information Geometry to model complex shapes
NASA Astrophysics Data System (ADS)
De Sanctis, A.; Gattone, S. A.
2016-09-01
In this paper, a new statistical method to model patterns emerging in complex systems is proposed. A framework for shape analysis of 2-dimensional landmark data is introduced, in which each landmark is represented by a bivariate Gaussian distribution. From Information Geometry we know that the Fisher-Rao metric endows the statistical manifold of the parameters of a family of probability distributions with a Riemannian metric. This approach thus allows the reconstruction of the intermediate steps in the evolution between observed shapes by computing the geodesic, with respect to the Fisher-Rao metric, between the corresponding distributions. Furthermore, the geodesic path can be used for shape predictions. As an application, we study the evolution of the rat skull shape. A future application in ophthalmology is introduced.
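For univariate Gaussians the Fisher-Rao geodesic distance has a closed form through the isometry with the Poincaré half-plane; the sketch below uses that as a one-dimensional simplification of the paper's bivariate landmark model (the simplification is mine) to show the metric's key property: the same mean shift costs less between high-variance distributions.

```python
import numpy as np

def fisher_rao_gaussian(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao distance between two univariate Gaussians.

    Uses the isometry of the Gaussian statistical manifold with the Poincare
    half-plane via x = mu / sqrt(2), y = sigma, so the distance is sqrt(2)
    times the hyperbolic distance between the mapped points.
    """
    x1, y1 = mu1 / np.sqrt(2), sigma1
    x2, y2 = mu2 / np.sqrt(2), sigma2
    cosh_d = 1 + ((x1 - x2) ** 2 + (y1 - y2) ** 2) / (2 * y1 * y2)
    return np.sqrt(2) * np.arccosh(cosh_d)

# Identical mean shift, different spreads: the geodesic is shorter when the
# distributions are wider, unlike the flat Euclidean distance between means.
print(fisher_rao_gaussian(0.0, 0.5, 1.0, 0.5))   # ~1.86
print(fisher_rao_gaussian(0.0, 2.0, 1.0, 2.0))   # ~0.50
```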
Soft Mixer Assignment in a Hierarchical Generative Model of Natural Scene Statistics
Schwartz, Odelia; Sejnowski, Terrence J.; Dayan, Peter
2010-01-01
Gaussian scale mixture models offer a top-down description of signal generation that captures key bottom-up statistical characteristics of filter responses to images. However, the pattern of dependence among the filters for this class of models is prespecified. We propose a novel extension to the gaussian scale mixture model that learns the pattern of dependence from observed inputs and thereby induces a hierarchical representation of these inputs. Specifically, we propose that inputs are generated by gaussian variables (modeling local filter structure), multiplied by a mixer variable that is assigned probabilistically to each input from a set of possible mixers. We demonstrate inference of both components of the generative model, for synthesized data and for different classes of natural images, such as a generic ensemble and faces. For natural images, the mixer variable assignments show invariances resembling those of complex cells in visual cortex; the statistics of the gaussian components of the model are in accord with the outputs of divisive normalization models. We also show how our model helps interrelate a wide range of models of image statistics and cortical processing. PMID:16999575
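The generative step is compact enough to write down directly. Below is a toy sampler for a single gaussian scale mixture; the lognormal mixer is my choice, not the paper's learned assignment. It shows the heavy tails that make GSMs fit filter responses, and that dividing out the mixer restores gaussian statistics, which is the intuition behind the divisive-normalization connection mentioned above.

```python
import numpy as np

rng = np.random.default_rng(9)

# Gaussian scale mixture: x = sqrt(v) * g, with g gaussian "filter responses"
# and v a shared positive mixer. The product is heavy-tailed, like wavelet
# coefficients of natural images.
n = 200_000
g = rng.normal(size=n)
v = rng.lognormal(0.0, 0.8, n)        # mixer variable (illustrative choice)
x = np.sqrt(v) * g

def excess_kurtosis(a):
    a = a - a.mean()
    return np.mean(a**4) / np.mean(a**2) ** 2 - 3.0

print("gaussian kurtosis :", excess_kurtosis(g))               # ~0
print("GSM kurtosis      :", excess_kurtosis(x))               # > 0: heavy tails
print("mixer divided out :", excess_kurtosis(x / np.sqrt(v)))  # ~0 again
```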
NASA Astrophysics Data System (ADS)
Friedel, M. J.; Daughney, C.
2016-12-01
The development of a successful surface-groundwater management strategy depends on the quality of data provided for analysis. This study evaluates the statistical robustness when using a modified self-organizing map (MSOM) technique to estimate missing values for three hypersurface models: synoptic groundwater-surface water hydrochemistry, time-series of groundwater-surface water hydrochemistry, and mixed-survey (combination of groundwater-surface water hydrochemistry and lithologies) hydrostratigraphic unit data. These models of increasing complexity are developed and validated based on observations from the Southland region of New Zealand. In each case, the estimation method is sufficiently robust to cope with groundwater-surface water hydrochemistry vagaries due to sample size and extreme data insufficiency, even when >80% of the data are missing. The estimation of surface water hydrochemistry time series values enabled the evaluation of seasonal variation, and the imputation of lithologies facilitated the evaluation of hydrostratigraphic controls on groundwater-surface water interaction. The robust statistical results for groundwater-surface water models of increasing data complexity provide justification to apply the MSOM technique in other regions of New Zealand and abroad.
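A minimal self-organizing-map imputation loop conveys the idea; this is a from-scratch toy, not the authors' MSOM, and the data, map size, and training schedules are arbitrary. Units are matched to each record using only its observed columns, and missing entries are then read off the best-matching unit's codebook vector.

```python
import numpy as np

rng = np.random.default_rng(10)

# Toy correlated data table (stand-in for hydrochemistry surveys), 50% missing
n, d = 400, 6
latent = rng.normal(size=(n, 2))
X_full = latent @ rng.normal(size=(2, d)) + 0.1 * rng.normal(size=(n, d))
mask = rng.random((n, d)) < 0.5                     # True = observed
X = np.where(mask, X_full, np.nan)

# Minimal 1-D self-organizing map trained with masked distances
K = 25
W = rng.normal(size=(K, d))
grid = np.arange(K)
n_epochs = 30
for epoch in range(n_epochs):
    lr = 0.5 * (1 - epoch / n_epochs)
    radius = max(1.0, (K / 2) * (1 - epoch / n_epochs))
    for i in rng.permutation(n):
        obs = mask[i]
        if not obs.any():
            continue
        bmu = np.argmin(((W[:, obs] - X[i, obs]) ** 2).sum(axis=1))
        h = np.exp(-((grid - bmu) ** 2) / (2 * radius**2))
        W[:, obs] += lr * h[:, None] * (X[i, obs] - W[:, obs])

# Impute each missing entry from the best-matching unit's codebook vector
X_imp = X.copy()
for i in range(n):
    obs = mask[i]
    bmu = 0 if not obs.any() else np.argmin(((W[:, obs] - X[i, obs]) ** 2).sum(axis=1))
    X_imp[i, ~obs] = W[bmu, ~obs]

rmse = np.sqrt(np.mean((X_imp[~mask] - X_full[~mask]) ** 2))
print("imputation RMSE:", rmse)              # typically well below the data spread
print("data std       :", X_full.std())
```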
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Xiaoying; Liu, Chongxuan; Hu, Bill X.
The additivity model assumes that field-scale reaction properties in a sediment, including surface area, reactive site concentration, and reaction rate, can be predicted from the field-scale grain-size distribution by linearly adding the reaction properties estimated in the laboratory for individual grain-size fractions. This study evaluated the additivity model in scaling mass transfer-limited, multi-rate uranyl (U(VI)) surface complexation reactions in a contaminated sediment. Experimental data on rate-limited U(VI) desorption in a stirred flow-cell reactor were used to estimate the statistical properties of the rate constants for individual grain-size fractions, which were then used to predict rate-limited U(VI) desorption in the composite sediment. The results indicated that the additivity model with respect to the rate of U(VI) desorption provided a good prediction of U(VI) desorption in the composite sediment. However, the rate constants were not directly scalable using the additivity model. An approximate additivity model for directly scaling rate constants was subsequently proposed and evaluated. The approximate model was found to provide a good prediction of the experimental results within statistical uncertainty. This study also found that a gravel-size fraction (2 to 8 mm), which is often ignored in modeling U(VI) sorption and desorption, contributes statistically significantly to U(VI) desorption in the sediment.
Statistical Field Estimation for Complex Coastal Regions and Archipelagos (PREPRINT)
2011-04-09
… and study the computational properties of these schemes. Specifically, we extend a multiscale Objective Analysis (OA) approach to complex coastal regions and archipelagos. The multiscale free-surface code builds on the primitive-equation model of the Harvard Ocean Prediction System (HOPS, Haley et al. (2009)). Additionally…
Reducing the Complexity of an Agent-Based Local Heroin Market Model
Heard, Daniel; Bobashev, Georgiy V.; Morris, Robert J.
2014-01-01
This project explores techniques for reducing the complexity of an agent-based model (ABM). The analysis involved a model developed from the ethnographic research of Dr. Lee Hoffer in the Larimer area heroin market, which involved drug users, drug sellers, homeless individuals and police. The authors used statistical techniques to create a reduced version of the original model which maintained simulation fidelity while reducing computational complexity. This involved identifying key summary quantities of individual customer behavior as well as overall market activity and replacing some agents with probability distributions and regressions. The model was then extended to allow external market interventions in the form of police busts. Extensions of this research perspective, as well as its strengths and limitations, are discussed. PMID:25025132
Modeling Smoke Plume-Rise and Dispersion from Southern United States Prescribed Burns with Daysmoke
G L Achtemeier; S L Goodrick; Y Liu; F Garcia-Menendez; Y Hu; M. Odman
2011-01-01
We present Daysmoke, an empirical-statistical plume rise and dispersion model for simulating smoke from prescribed burns. Prescribed fires are characterized by complex plume structure, including multiple-core updrafts, which makes modeling with simple plume models difficult. Daysmoke accounts for plume structure in a three-dimensional veering/sheering atmospheric...
Statistical methodology for the analysis of dye-switch microarray experiments
Mary-Huard, Tristan; Aubert, Julie; Mansouri-Attia, Nadera; Sandra, Olivier; Daudin, Jean-Jacques
2008-01-01
Background In individually dye-balanced microarray designs, each biological sample is hybridized on two different slides, once with Cy3 and once with Cy5. While this strategy ensures an automatic correction of the gene-specific labelling bias, it also induces dependencies between log-ratio measurements that must be taken into account in the statistical analysis. Results We present two original statistical procedures for the statistical analysis of individually balanced designs. These procedures are compared with the usual ML and REML mixed model procedures proposed in most statistical toolboxes, on both simulated and real data. Conclusion The UP procedure we propose as an alternative to usual mixed model procedures is more efficient and significantly faster to compute. This result provides some useful guidelines for the analysis of complex designs. PMID:18271965
Complex groundwater flow systems as traveling agent models
Padilla, Pablo; Escolero, Oscar; González, Tomas; Morales-Casique, Eric; Osorio-Olvera, Luis
2014-01-01
Analyzing field data from pumping tests, we show that, as with many other natural phenomena, groundwater flow exhibits complex dynamics described by a 1/f power spectrum. This result is theoretically studied from an agent perspective. Using a traveling agent model, we prove that this statistical behavior emerges when the medium is complex. Some heuristic reasoning is provided to justify both spatial and dynamic complexity, as the result of the superposition of an infinite number of stochastic processes. Moreover, we show that this implies that non-Kolmogorovian probability is needed for its study, and provide a set of new partial differential equations for groundwater flow. PMID:25337455
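The 1/f signature reported here is straightforward to test on any record: compute a periodogram and fit the log-log slope. The sketch below does this on a synthetic pink-noise series (the synthesis step merely provides data with a known exponent; a real pumping-test record would take its place).

```python
import numpy as np

rng = np.random.default_rng(11)

# Synthesize a 1/f ("pink") signal by shaping white noise in Fourier space
n = 2**14
freqs = np.fft.rfftfreq(n, d=1.0)
spectrum = rng.normal(size=freqs.size) + 1j * rng.normal(size=freqs.size)
spectrum[1:] /= np.sqrt(freqs[1:])           # amplitude ~ f^{-1/2}, so power ~ 1/f
spectrum[0] = 0.0
signal = np.fft.irfft(spectrum)

# Periodogram and log-log slope estimate of the spectral exponent
power = np.abs(np.fft.rfft(signal)) ** 2
f, p = freqs[1:], power[1:]
slope, _ = np.polyfit(np.log(f), np.log(p), 1)
print("spectral exponent beta ~", -slope)    # expect ~1 for 1/f dynamics
```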
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
Averaging Models: Parameters Estimation with the R-Average Procedure
ERIC Educational Resources Information Center
Vidotto, G.; Massidda, D.; Noventa, S.
2010-01-01
The Functional Measurement approach, proposed within the theoretical framework of Information Integration Theory (Anderson, 1981, 1982), can be a useful multi-attribute analysis tool. Compared to the majority of statistical models, the averaging model can account for interaction effects without adding complexity. The R-Average method (Vidotto &…
Unifying Complexity and Information
NASA Astrophysics Data System (ADS)
Ke, Da-Guan
2013-04-01
Complex systems, arising in many contexts in the computer, life, social, and physical sciences, have not shared a generally accepted complexity measure playing a role as fundamental as that of the Shannon entropy H in statistical mechanics. Superficially conflicting criteria of complexity measurement, i.e. complexity-randomness (C-R) relations, have given rise to a special measure intrinsically adaptable to more than one criterion. However, the deep causes of the conflict and of the adaptability are not entirely clear. Here I trace the root of each representative or adaptable measure to its particular universal data-generating or -regenerating model (UDGM or UDRM). A representative measure for deterministic dynamical systems is found as a counterpart of the H for random processes, clearly redefining the boundary between different criteria. And a specific UDRM achieving the intrinsic adaptability enables a general information measure that ultimately resolves all major disputes. This work encourages a single framework covering deterministic systems, statistical mechanics, and real-world living organisms.
Climate and dengue transmission: evidence and implications.
Morin, Cory W; Comrie, Andrew C; Ernst, Kacey
2013-01-01
Climate influences dengue ecology by affecting vector dynamics, agent development, and mosquito/human interactions. Although these relationships are known, the impact climate change will have on transmission is unclear. Climate-driven statistical and process-based models are being used to refine our knowledge of these relationships and predict the effects of projected climate change on dengue fever occurrence, but results have been inconsistent. We sought to identify major climatic influences on dengue virus ecology and to evaluate the ability of climate-based dengue models to describe associations between climate and dengue, simulate outbreaks, and project the impacts of climate change. We reviewed the evidence for direct and indirect relationships between climate and dengue generated from laboratory studies, field studies, and statistical analyses of associations between vectors, dengue fever incidence, and climate conditions. We assessed the potential contribution of climate-driven, process-based dengue models and provide suggestions to improve their performance. Relationships between climate variables and factors that influence dengue transmission are complex. A climate variable may increase dengue transmission potential through one aspect of the system while simultaneously decreasing transmission potential through another. This complexity may at least partly explain inconsistencies in statistical associations between dengue and climate. Process-based models can account for the complex dynamics but often omit important aspects of dengue ecology, notably virus development and host-species interactions. Synthesizing and applying current knowledge of climatic effects on all aspects of dengue virus ecology will help direct future research and enable better projections of climate change effects on dengue incidence.
A model for family-based case-control studies of genetic imprinting and epistasis.
Li, Xin; Sui, Yihan; Liu, Tian; Wang, Jianxin; Li, Yongci; Lin, Zhenwu; Hegarty, John; Koltun, Walter A; Wang, Zuoheng; Wu, Rongling
2014-11-01
Genetic imprinting, also known as the parent-of-origin effect, has been recognized to play an important role in the formation and pathogenesis of human diseases. Although the epigenetic mechanisms that establish genetic imprinting have been a focus of many genetic studies, our knowledge about the number of imprinted genes and their chromosomal locations and interactions with other genes is still scarce, limiting precise inference of the genetic architecture of complex diseases. In this article, we present a statistical model for testing and estimating the effects of genetic imprinting on complex diseases using a commonly used case-control design with family structure. For each subject sampled from a case and control population, we not only genotype its own single nucleotide polymorphisms (SNPs) but also collect its parents' genotypes. By tracing the transmission pattern of SNP alleles from the parental to the offspring generation, the model allows the characterization of genetic imprinting effects based on Pearson tests of a 2 × 2 contingency table. The model is expanded to test the interactions between imprinting effects and additive, dominant and epistatic effects in a complex web of genetic interactions. Statistical properties of the model are investigated, and its practical usefulness is validated by a real data analysis. The model will provide a useful tool for genome-wide association studies aimed to elucidate the picture of genetic control over complex human diseases. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
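The core test is a Pearson chi-square on a 2 × 2 transmission table. A scipy sketch with made-up counts (the numbers below are purely illustrative) shows the mechanics: heterozygous offspring are classified by the parental origin of the transmitted allele, and the table is compared across cases and controls.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts of heterozygous offspring classified by the parental
# origin of the risk allele (rows) in cases vs controls (columns).
#                  cases  controls
table = np.array([[  90,      60],    # maternally transmitted
                  [  50,      70]])   # paternally transmitted

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
# A small p-value suggests transmission depends on parental origin,
# i.e. an imprinting-like effect at this SNP.
```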
Narayan, Manjari; Allen, Genevera I.
2016-01-01
Many complex brain disorders, such as autism spectrum disorders, exhibit a wide range of symptoms and disability. To understand how brain communication is impaired in such conditions, functional connectivity studies seek to understand individual differences in brain network structure in terms of covariates that measure symptom severity. In practice, however, functional connectivity is not observed but estimated from complex and noisy neural activity measurements. Imperfect subject network estimates can compromise subsequent efforts to detect covariate effects on network structure. We address this problem in the case of Gaussian graphical models of functional connectivity, by proposing novel two-level models that treat both subject level networks and population level covariate effects as unknown parameters. To account for imperfectly estimated subject level networks when fitting these models, we propose two related approaches—R2 based on resampling and random effects test statistics, and R3 that additionally employs random adaptive penalization. Simulation studies using realistic graph structures reveal that R2 and R3 have superior statistical power to detect covariate effects compared to existing approaches, particularly when the number of within subject observations is comparable to the size of subject networks. Using our novel models and methods to study parts of the ABIDE dataset, we find evidence of hypoconnectivity associated with symptom severity in autism spectrum disorders, in frontoparietal and limbic systems as well as in anterior and posterior cingulate cortices. PMID:27147940
Becker, Betsy Jane; Aloe, Ariel M; Duvendack, Maren; Stanley, T D; Valentine, Jeffrey C; Fretheim, Atle; Tugwell, Peter
2017-09-01
To outline issues of importance to analytic approaches to the synthesis of quasi-experiments (QEs) and to provide a statistical model for use in analysis. We drew on studies of statistics, epidemiology, and social-science methodology to outline methods for synthesis of QE studies. The design and conduct of QEs, effect sizes from QEs, and moderator variables for the analysis of those effect sizes were discussed. Biases, confounding, design complexities, and comparisons across designs offer serious challenges to syntheses of QEs. Key components of meta-analyses of QEs were identified, including the aspects of QE study design to be coded and analyzed. Of utmost importance are the design and statistical controls implemented in the QEs. Such controls and any potential sources of bias and confounding must be modeled in analyses, along with aspects of the interventions and populations studied. Because of such controls, effect sizes from QEs are more complex than those from randomized experiments. A statistical meta-regression model that incorporates important features of the QEs under review was presented. Meta-analyses of QEs provide particular challenges, but thorough coding of intervention characteristics and study methods, along with careful analysis, should allow for sound inferences. Copyright © 2017 Elsevier Inc. All rights reserved.
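The proposed statistical model is, at its core, a meta-regression of QE effect sizes on design moderators. The numpy sketch below fits a bare fixed-effect (inverse-variance weighted) version with a single hypothetical moderator; the paper's model would add further terms for bias and confounding and, typically, random effects. All effect sizes, variances, and the moderator coding are invented for illustration.

```python
import numpy as np

# Hypothetical effect sizes d_i from quasi-experiments, their variances v_i,
# and a study-level moderator coding a design feature (1 = matched controls).
d = np.array([0.30, 0.45, 0.10, 0.60, 0.25, 0.52])
v = np.array([0.02, 0.05, 0.01, 0.08, 0.03, 0.04])
matched = np.array([1, 0, 1, 0, 1, 0])

# Fixed-effect meta-regression: weighted least squares with weights 1/v_i
X = np.column_stack([np.ones_like(d), matched])
W = np.diag(1.0 / v)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ d)
se = np.sqrt(np.diag(np.linalg.inv(X.T @ W @ X)))

print("intercept (unmatched designs):", beta[0], "+/-", se[0])
print("matched-design shift         :", beta[1], "+/-", se[1])
```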
Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T
2016-05-01
Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations, and more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely, a Crank-Nicholson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively.
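The abstract reports skill scores relative to propagation in a homogeneous atmosphere over rigid ground but does not spell out the formula, so the sketch below assumes a common mean-squared-error definition to show how near-perfect and rough models separate on such a scale. The synthetic "models" are placeholders, not the ISO, Harmonoise, Nord2000, or machine-learning models themselves.

```python
import numpy as np

def skill_score(pred, truth, baseline):
    """Percent reduction in squared error relative to a baseline prediction
    (an assumed definition; the paper may normalise differently)."""
    mse_model = np.mean((pred - truth) ** 2)
    mse_base = np.mean((baseline - truth) ** 2)
    return 100.0 * (1.0 - mse_model / mse_base)

rng = np.random.default_rng(12)
truth = rng.normal(0, 10, 1000)                  # benchmark transmission loss [dB]
baseline = np.zeros_like(truth)                  # homogeneous-atmosphere reference
good = truth + rng.normal(0, 1, 1000)            # statistical-learning-like model
rough = 0.8 * truth + rng.normal(0, 5, 1000)     # engineering-like approximation

print("good model skill :", skill_score(good, truth, baseline))   # near 99%
print("rough model skill:", skill_score(rough, truth, baseline))  # substantially lower
```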
Hoyle, R H
1991-02-01
Indirect measures of psychological constructs are vital to clinical research. On occasion, however, the meaning of indirect measures of psychological constructs is obfuscated by statistical procedures that do not account for the complex relations between items and latent variables and among latent variables. Covariance structure analysis (CSA) is a statistical procedure for testing hypotheses about the relations among items that indirectly measure a psychological construct and relations among psychological constructs. This article introduces clinical researchers to the strengths and limitations of CSA as a statistical procedure for conceiving and testing structural hypotheses that are not tested adequately with other statistical procedures. The article is organized around two empirical examples that illustrate the use of CSA for evaluating measurement models with correlated error terms, higher-order factors, and measured and latent variables.
The role of shape complexity in the detection of closed contours.
Wilder, John; Feldman, Jacob; Singh, Manish
2016-09-01
The detection of contours in noise has been extensively studied, but the detection of closed contours, such as the boundaries of whole objects, has received relatively little attention. Closed contours pose substantial challenges not present in the simple (open) case, because they form the outlines of whole shapes and thus take on a range of potentially important configural properties. In this paper we consider the detection of closed contours in noise as a probabilistic decision problem. Previous work on open contours suggests that contour complexity, quantified as the negative log probability (Description Length, DL) of the contour under a suitably chosen statistical model, impairs contour detectability; more complex (statistically surprising) contours are harder to detect. In this study we extended this result to closed contours, developing a suitable probabilistic model of whole shapes that gives rise to several distinct though interrelated measures of shape complexity. We asked subjects to detect either natural shapes (Exp. 1) or experimentally manipulated shapes (Exp. 2) embedded in noise fields. We found systematic effects of global shape complexity on detection performance, demonstrating how aspects of global shape and form influence the basic process of object detection. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Lomakina, N. Ya.
2017-11-01
The work presents the results of an applied climatic division of the Siberian region into districts, based on an objective classification of atmospheric boundary layer climates by the "temperature-moisture-wind" complex, realized using the method of principal components together with special similarity criteria for average profiles and the eigenvalues of correlation matrices. On the territory of Siberia, 14 homogeneous regions were identified for the winter season and 10 for summer. Local statistical models were constructed for each region; these include vertical profiles of mean values, mean square deviations, and matrices of interlevel correlation of temperature, specific humidity, and zonal and meridional wind velocity. The advantage of the obtained local statistical models over the regional models is shown.
Computational algebraic geometry for statistical modeling FY09Q2 progress.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thompson, David C.; Rojas, Joseph Maurice; Pebay, Philippe Pierre
2009-03-01
This is a progress report on polynomial system solving for statistical modeling. This quarter we have developed our first model of shock response data and an algorithm for identifying the chamber cone containing a polynomial system in n variables with n+k terms within polynomial time - a significant improvement over previous algorithms, all having exponential worst-case complexity. We have implemented and verified the chamber cone algorithm for n+3 and are working to extend the implementation to handle arbitrary k. Later sections of this report explain chamber cones in more detail; the next section provides an overview of the project and how the current progress fits into it.
Patch-Based Generative Shape Model and MDL Model Selection for Statistical Analysis of Archipelagos
NASA Astrophysics Data System (ADS)
Ganz, Melanie; Nielsen, Mads; Brandt, Sami
We propose a statistical generative shape model for archipelago-like structures. These kinds of structures occur, for instance, in medical images, where our intention is to model the appearance and shapes of calcifications in x-ray radiographs. The generative model is constructed by (1) learning a patch-based dictionary for possible shapes, (2) building up a time-homogeneous Markov model to capture the neighbourhood correlations between the patches, and (3) automatically selecting the model complexity by the minimum description length principle. The generative shape model is proposed as a probability distribution of a binary image and is intended to facilitate sequential simulation. Our results show that a relatively simple model is able to generate structures visually similar to calcifications. Furthermore, we used the shape model as a shape prior in the statistical segmentation of calcifications, where the area overlap with the ground-truth shapes improved significantly compared to the case where the prior was not used.
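To make step (3) above concrete, the sketch below selects model complexity by minimizing an information criterion. It is an illustrative analogy only: BIC stands in for description length, and random vectors stand in for shape patches; it is not the authors' implementation.

```python
# Illustrative MDL-style model selection: pick the dictionary size (number of
# mixture components) that minimizes BIC, a proxy for description length.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic "patch features": three underlying clusters
patches = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in (0.0, 2.0, 4.0)])

scores = {}
for k in range(1, 8):  # candidate dictionary sizes
    gm = GaussianMixture(n_components=k, random_state=0).fit(patches)
    scores[k] = gm.bic(patches)  # lower BIC ~ shorter description length

best_k = min(scores, key=scores.get)
print("selected model complexity:", best_k)
```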
NASA Astrophysics Data System (ADS)
Yan, Wang-Ji; Ren, Wei-Xin
2018-01-01
This study applies the theoretical findings on the circularly-symmetric complex normal ratio distribution of Yan and Ren (2016) [1,2] to transmissibility-based modal analysis from a statistical viewpoint. A probabilistic model of the transmissibility function in the vicinity of the resonant frequency is formulated in the modal domain, and some insightful comments are offered. It theoretically reveals that the statistics of the transmissibility function around the resonant frequency are solely dependent on the 'noise-to-signal' ratio and the mode shapes. As a sequel to the development of the probabilistic model of the transmissibility function in the modal domain, this study poses the process of modal identification within a Bayesian framework by borrowing a novel paradigm. Implementation issues unique to the proposed approach are resolved by a Lagrange multiplier approach. This study also explores the possibility of applying Bayesian analysis to distinguishing harmonic components from structural ones. The approaches are verified against both simulated data and experimental test data. The uncertainty behavior due to variation of different factors is also discussed in detail.
Rohrmeier, Martin A; Cross, Ian
2014-07-01
Humans rapidly learn complex structures in various domains. Findings of above-chance performance of some untrained control groups in artificial grammar learning studies raise questions about the extent to which learning can occur in an untrained, unsupervised testing situation with both correct and incorrect structures. The plausibility of unsupervised online-learning effects was modelled with n-gram, chunking and simple recurrent network models. A novel evaluation framework was applied, which alternates forced binary grammaticality judgments and subsequent learning of the same stimulus. Our results indicate a strong online learning effect for n-gram and chunking models and a weaker effect for simple recurrent network models. Such findings suggest that online learning is a plausible effect of statistical chunk learning that is possible when ungrammatical sequences contain a large proportion of grammatical chunks. Such common effects of continuous statistical learning may underlie statistical and implicit learning paradigms and raise implications for study design and testing methodologies. Copyright © 2014 Elsevier Inc. All rights reserved.
Inferring general relations between network characteristics from specific network ensembles.
Cardanobile, Stefano; Pernice, Volker; Deger, Moritz; Rotter, Stefan
2012-01-01
Different network models have been suggested for the topology underlying complex interactions in natural systems. These models are aimed at replicating specific statistical features encountered in real-world networks. However, it is rarely considered to which degree the results obtained for one particular network class can be extrapolated to real-world networks. We address this issue by comparing different classical and more recently developed network models with respect to their ability to generate networks with large structural variability. In particular, we consider the statistical constraints which the respective construction scheme imposes on the generated networks. After having identified the most variable networks, we address the issue of which constraints are common to all network classes and are thus suitable candidates for being generic statistical laws of complex networks. In fact, we find that generic, not model-related dependencies between different network characteristics do exist. This makes it possible to infer global features from local ones using regression models trained on networks with high generalization power. Our results confirm and extend previous findings regarding the synchronization properties of neural networks. Our method seems especially relevant for large networks, which are difficult to map completely, like the neural networks in the brain. The structure of such large networks cannot be fully sampled with the present technology. Our approach provides a method to estimate global properties of under-sampled networks in good approximation. Finally, we demonstrate on three different data sets (C. elegans neuronal network, R. prowazekii metabolic network, and a network of synonyms extracted from Roget's Thesaurus) that real-world networks have statistical relations compatible with those obtained using regression models.
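The regression idea described above can be sketched on stand-in model networks: local statistics (degree moments, clustering) are used to predict a global property (average shortest path length). All choices here (network model, features, regressor) are assumptions for illustration, not the authors' setup.

```python
# Hedged sketch: learn a mapping from local to global network characteristics
# on an ensemble of random graphs, then test generalization on held-out graphs.
import numpy as np
import networkx as nx
from sklearn.ensemble import RandomForestRegressor

def local_features(G):
    degs = [d for _, d in G.degree()]
    return [np.mean(degs), np.var(degs), nx.average_clustering(G)]

X, y = [], []
rng = np.random.default_rng(1)
for _ in range(200):
    p = rng.uniform(0.04, 0.15)
    G = nx.erdos_renyi_graph(100, p, seed=int(rng.integers(1_000_000)))
    G = G.subgraph(max(nx.connected_components(G), key=len)).copy()
    X.append(local_features(G))
    y.append(nx.average_shortest_path_length(G))  # global characteristic

model = RandomForestRegressor(random_state=0).fit(X[:150], y[:150])
print("held-out R^2:", model.score(X[150:], y[150:]))
```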
Spectral Entropies as Information-Theoretic Tools for Complex Network Comparison
NASA Astrophysics Data System (ADS)
De Domenico, Manlio; Biamonte, Jacob
2016-10-01
Any physical system can be viewed from the perspective that information is implicitly represented in its state. However, the quantification of this information when it comes to complex networks has remained largely elusive. In this work, we use techniques inspired by quantum statistical mechanics to define an entropy measure for complex networks and to develop a set of information-theoretic tools, based on network spectral properties, such as Rényi q entropy, generalized Kullback-Leibler and Jensen-Shannon divergences, the latter allowing us to define a natural distance measure between complex networks. First, we show that by minimizing the Kullback-Leibler divergence between an observed network and a parametric network model, inference of model parameter(s) by means of maximum-likelihood estimation can be achieved and model selection can be performed with appropriate information criteria. Second, we show that the information-theoretic metric quantifies the distance between pairs of networks and we can use it, for instance, to cluster the layers of a multilayer system. By applying this framework to networks corresponding to sites of the human microbiome, we perform hierarchical cluster analysis and recover with high accuracy existing community-based associations. Our results imply that spectral-based statistical inference in complex networks results in demonstrably superior performance as well as a conceptual backbone, filling a gap towards a network information theory.
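A minimal sketch of the spectral entropy construction described above, using a density matrix of the form rho = exp(-beta*L)/Z built from the graph Laplacian; the inverse temperature beta and the example graph are illustrative assumptions rather than the paper's settings.

```python
# Von Neumann-type spectral entropy of a network: S = -Tr(rho log rho), where
# rho = exp(-beta*L)/Z and L is the graph Laplacian. Since rho shares
# eigenvectors with L, we work directly with the Laplacian spectrum.
import numpy as np
import networkx as nx

def spectral_entropy(G, beta=1.0):
    lam = np.linalg.eigvalsh(nx.laplacian_matrix(G).toarray().astype(float))
    w = np.exp(-beta * lam)
    p = w / w.sum()          # eigenvalues of the density matrix
    p = p[p > 1e-15]
    return -np.sum(p * np.log(p))

print(spectral_entropy(nx.karate_club_graph()))
```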
Air-flow distortion and turbulence statistics near an animal facility
NASA Astrophysics Data System (ADS)
Prueger, J. H.; Eichinger, W. E.; Hipps, L. E.; Hatfield, J. L.; Cooper, D. I.
The emission and dispersion of particulates and gases from concentrated animal feeding operations (CAFO) at local to regional scales is a current issue in science and society. The transport of particulates, odors and toxic chemical species from the source into the local and eventually regional atmosphere is largely determined by turbulence. Any model that attempts to simulate the dispersion of particles must either specify or assume various statistical properties of the turbulence field. Statistical properties of turbulence are well documented for idealized boundary layers above uniform surfaces. However, an animal production facility is a complex surface with structures that act as bluff bodies that distort the turbulence intensity near the buildings. As a result, the initial release and subsequent dispersion of effluents in the region near a facility will be affected by the complex nature of the surface. Previous Lidar studies of plume dispersion over the facility used in this study indicated that plumes move in complex yet organized patterns that cannot be explained by the properties of turbulence generally assumed in models. The objective of this study was to characterize the near-surface turbulence statistics in the flow field around an array of animal confinement buildings. Eddy covariance towers were erected upwind, within the building array, and downwind of the flow field. Substantial changes in turbulence intensity statistics and turbulence kinetic energy (TKE) were observed as the mean wind flow encountered the building structures. Spectral analysis demonstrated a unique distribution of spectral energy in the vertical profile above the buildings.
A superstatistical model of metastasis and cancer survival
NASA Astrophysics Data System (ADS)
Leon Chen, L.; Beck, Christian
2008-05-01
We introduce a superstatistical model for the progression statistics of malignant cancer cells. The metastatic cascade is modeled as a complex nonequilibrium system with several macroscopic pathways and inverse-chi-square distributed parameters of the underlying Poisson processes. The predictions of the model are in excellent agreement with observed survival-time probability distributions of breast cancer patients.
NASA Astrophysics Data System (ADS)
Alexandridis, Konstantinos T.
This dissertation adopts a holistic and detailed approach to modeling spatially explicit agent-based artificial intelligent systems, using the Multi Agent-based Behavioral Economic Landscape (MABEL) model. The research questions it addresses stem from the need to understand and analyze the real-world patterns and dynamics of land use change from a coupled human-environmental systems perspective. It describes the systemic, mathematical, statistical, socio-economic and spatial dynamics of the MABEL modeling framework, and provides a wide array of cross-disciplinary modeling applications within the research, decision-making and policy domains. It establishes the symbolic properties of the MABEL model as a Markov decision process, analyzes the decision-theoretic utility and optimization attributes of agents towards composing statistically and spatially optimal policies and actions, and explores the probabilistic character of the agents' decision-making and inference mechanisms via Bayesian belief and decision networks. It develops and describes a Monte Carlo methodology for experimental replications of agents' decisions regarding complex spatial parcel acquisition and learning. It recognizes the gap in spatially explicit accuracy-assessment techniques for complex spatial models and proposes an ensemble of statistical tools designed to address this problem. Advanced information assessment techniques such as the receiver operating characteristic curve, the impurity entropy and Gini functions, and Bayesian classification functions are proposed. The theoretical foundation for modular Bayesian inference in spatially explicit multi-agent artificial intelligent systems is provided, along with the ensembles of cognitive and scenario assessment modular tools built for the MABEL model. The dissertation emphasizes modularity and robustness as valuable qualitative modeling attributes and examines the role of robust intelligent modeling as a tool for improving policy decisions related to land use change. Finally, the major contributions to science are presented along with valuable directions for future research.
Hierarchical Bayesian spatial models for multispecies conservation planning and monitoring
Carlos Carroll; Devin S. Johnson; Jeffrey R. Dunk; William J. Zielinski
2010-01-01
Biologists who develop and apply habitat models are often familiar with the statistical challenges posed by their data's spatial structure but are unsure of whether the use of complex spatial models will increase the utility of model results in planning. We compared the relative performance of nonspatial and hierarchical Bayesian spatial models for three vertebrate and...
Neighbor effect in complexation of a conjugated polymer.
Sosorev, Andrey; Zapunidi, Sergey
2013-09-19
Charge-transfer complex (CTC) formation between a conjugated polymer and a low-molecular-weight organic acceptor is proposed to be driven by the neighbor effect: formation of a CTC on the polymer chain results in an increased probability of new CTC formation near the existing one. We present an analytical model for the CTC distribution considering the neighbor effect, based on the principles of statistical mechanics. This model explains the experimentally observed threshold-like dependence of the CTC concentration on the acceptor content in a polymer:acceptor blend. It also allows us to evaluate the binding energies of the complexes.
Combining Statistics and Physics to Improve Climate Downscaling
NASA Astrophysics Data System (ADS)
Gutmann, E. D.; Eidhammer, T.; Arnold, J.; Nowak, K.; Clark, M. P.
2017-12-01
Getting useful information from climate models is an ongoing problem that has plagued climate science and hydrologic prediction for decades. While it is possible to develop statistical corrections for climate models that mimic current climate almost perfectly, this does not necessarily guarantee that future changes are portrayed correctly. In contrast, convection permitting regional climate models (RCMs) have begun to provide an excellent representation of the regional climate system purely from first principles, providing greater confidence in their change signal. However, the computational cost of such RCMs prohibits the generation of ensembles of simulations or long time periods, thus limiting their applicability for hydrologic applications. Here we discuss a new approach combining statistical corrections with physical relationships for a modest computational cost. We have developed the Intermediate Complexity Atmospheric Research model (ICAR) to provide a climate and weather downscaling option that is based primarily on physics for a fraction of the computational requirements of a traditional regional climate model. ICAR also enables the incorporation of statistical adjustments directly within the model. We demonstrate that applying even simple corrections to precipitation while the model is running can improve the simulation of land atmosphere feedbacks in ICAR. For example, by incorporating statistical corrections earlier in the modeling chain, we permit the model physics to better represent the effect of mountain snowpack on air temperature changes.
Power Analysis for Complex Mediational Designs Using Monte Carlo Methods
ERIC Educational Resources Information Center
Thoemmes, Felix; MacKinnon, David P.; Reiser, Mark R.
2010-01-01
Applied researchers often include mediation effects in applications of advanced methods such as latent variable models and linear growth curve models. Guidance on how to estimate statistical power to detect mediation for these models has not yet been addressed in the literature. We describe a general framework for power analyses for complex…
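The general Monte Carlo idea for mediation power can be sketched as follows: simulate the X -> M -> Y model many times and record how often both paths are significant (a joint-significance test). Effect sizes, sample size, and alpha are illustrative assumptions, and the b-path regression omits X for brevity.

```python
# Hedged sketch of Monte Carlo power analysis for a simple mediation model.
import numpy as np
from scipy import stats

def power_ab(n, a=0.3, b=0.3, reps=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        m = a * x + rng.normal(size=n)   # mediator
        y = b * m + rng.normal(size=n)   # outcome
        pa = stats.linregress(x, m).pvalue
        pb = stats.linregress(m, y).pvalue   # simplification: X not partialled out
        hits += (pa < alpha) and (pb < alpha)  # joint-significance criterion
    return hits / reps

print("estimated power at n=100:", power_ab(100))
```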
Modeling Conditional Probabilities in Complex Educational Assessments. CSE Technical Report.
ERIC Educational Resources Information Center
Mislevy, Robert J.; Almond, Russell; Dibello, Lou; Jenkins, Frank; Steinberg, Linda; Yan, Duanli; Senturk, Deniz
An active area in psychometric research is coordinated task design and statistical analysis built around cognitive models. Compared with classical test theory and item response theory, there is often less information from observed data about the measurement-model parameters. On the other hand, there is more information from the grounding…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marzouk, Youssef
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computational expense. This project intends to make rigorous predictive modeling *feasible* in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale sequential data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.
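As a toy illustration of the inference problem described above (not the project's accelerated algorithms), the sketch below infers a single parameter from noisy data with random-walk Metropolis-Hastings; the model, prior, and proposal scale are all assumptions.

```python
# Minimal Bayesian parameter inference from noisy data via random-walk
# Metropolis-Hastings: Gaussian likelihood, standard normal prior.
import numpy as np

rng = np.random.default_rng(0)
data = 2.5 + rng.normal(0, 1.0, size=50)   # noisy observations of theta

def log_post(theta):
    return -0.5 * np.sum((data - theta) ** 2) - 0.5 * theta ** 2

theta, chain = 0.0, []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.3)          # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                            # accept
    chain.append(theta)

print("posterior mean:", np.mean(chain[1000:]))  # discard burn-in
```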
Complex Adaptive System Models and the Genetic Analysis of Plasma HDL-Cholesterol Concentration
Rea, Thomas J.; Brown, Christine M.; Sing, Charles F.
2006-01-01
Despite remarkable advances in diagnosis and therapy, ischemic heart disease (IHD) remains a leading cause of morbidity and mortality in industrialized countries. Recent efforts to estimate the influence of genetic variation on IHD risk have focused on predicting individual plasma high-density lipoprotein cholesterol (HDL-C) concentration. Plasma HDL-C concentration (mg/dl), a quantitative risk factor for IHD, has a complex multifactorial etiology that involves the actions of many genes. Single gene variations may be necessary but are not individually sufficient to predict a statistically significant increase in risk of disease. The complexity of phenotype-genotype-environment relationships involved in determining plasma HDL-C concentration has challenged commonly held assumptions about genetic causation and has led to the question of which combination of variations, in which subset of genes, in which environmental strata of a particular population significantly improves our ability to predict high or low risk phenotypes. We document the limitations of inferences from genetic research based on commonly accepted biological models, consider how evidence for real-world dynamical interactions between HDL-C determinants challenges the simplifying assumptions implicit in traditional linear statistical genetic models, and conclude by considering research options for evaluating the utility of genetic information in predicting traits with complex etiologies. PMID:17146134
Comparison of AERMOD and CALPUFF models for simulating SO2 concentrations in a gas refinery.
Atabi, Farideh; Jafarigol, Farzaneh; Moattar, Faramarz; Nouri, Jafar
2016-09-01
In this study, concentration of SO2 from a gas refinery located in complex terrain was calculated by the steady-state, AERMOD model, and nonsteady-state CALPUFF model. First, in four seasons, SO2 concentrations emitted from 16 refinery stacks, in nine receptors, were obtained by field measurements, and then the performance of both models was evaluated. Then, the simulated results for SO2 ambient concentrations made by each model were compared with the results of the observed concentrations, and model results were compared among themselves. The evaluation of the two models to simulate SO2 concentrations was based on the statistical analysis and Q-Q plots. Review of statistical parameters and Q-Q plots has shown that, according to the evaluation of estimations made, performance of both models to simulate the concentration of SO2 in the region can be considered acceptable. The results showed the AERMOD composite ratio between simulated values made by models and the observed values in various receptors for all four average times is 0.72, whereas CALPUFF's ratio is 0.89. However, in the complex conditions of topography, CALPUFF offers better agreement with the observed concentrations.
Statistical Physics of Cascading Failures in Complex Networks
NASA Astrophysics Data System (ADS)
Panduranga, Nagendra Kumar
Systems such as the power grid, world wide web (WWW), and internet are categorized as complex systems because of the presence of a large number of interacting elements. For example, the WWW is estimated to have a billion webpages and understanding the dynamics of such a large number of individual agents (whose individual interactions might not be fully known) is a challenging task. Complex network representations of these systems have proved to be of great utility. Statistical physics is the study of emergence of macroscopic properties of systems from the characteristics of the interactions between individual molecules. Hence, statistical physics of complex networks has been an effective approach to study these systems. In this dissertation, I have used statistical physics to study two distinct phenomena in complex systems: i) Cascading failures and ii) Shortest paths in complex networks. Understanding cascading failures is considered to be one of the "holy grails" in the study of complex systems such as the power grid, transportation networks, and economic systems. Studying failures of these systems as percolation on complex networks has proved to be insightful. Previously, cascading failures have been studied extensively using two different models: k-core percolation and interdependent networks. The first part of this work combines the two models into a general model, solves it analytically, and validates the theoretical predictions through extensive computer simulations. The phase diagram of the percolation transition has been systematically studied as one varies the average local k-core threshold and the coupling between networks. The phase diagram of the combined processes is very rich and includes novel features that do not appear in the models which study each of the processes separately. For example, the phase diagram consists of first- and second-order transition regions separated by two tricritical lines that merge together and enclose a two-stage transition region. In the two-stage transition, the size of the giant component undergoes a first-order jump at a certain occupation probability followed by a continuous second-order transition at a smaller occupation probability. Furthermore, at certain fixed interdependencies, the percolation transition cycles from first-order to second-order to two-stage to first-order as the k-core threshold is increased. We set up the analytical equations describing the phase boundaries of the two-stage transition region and we derive the critical exponents for each type of transition. Understanding the shortest paths between individual elements in systems like communication networks and social media networks is important in the study of information cascades in these systems. Often, large heterogeneity can be present in the connections between nodes in these networks. Certain sets of nodes can be more highly connected among themselves than with the nodes from other sets. These sets of nodes are often referred to as 'communities'. The second part of this work studies the effect of the presence of communities on the distribution of shortest paths in a network using a modular Erdős-Rényi network model. In this model, the number of communities and the degree of modularity of the network can be tuned using the parameters of the model.
We find that the model undergoes a percolation transition as the degree of modularity is tuned, and that the distribution of shortest paths in the network can be used as an indicator of how the communities are connected.
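A rough simulation sketch of the k-core percolation ingredient discussed above: nodes are occupied with probability p, the k-core of the occupied subgraph is extracted, and the relative size of its giant component is tracked. Network size, mean degree, and threshold k are illustrative assumptions, not the dissertation's parameters.

```python
# k-core percolation sketch on an Erdos-Renyi graph: occupy nodes with
# probability p, then take the k-core (iteratively pruning degree < k nodes).
import numpy as np
import networkx as nx

def kcore_giant_fraction(G, p, k, rng):
    keep = [v for v in G if rng.random() < p]
    core = nx.k_core(G.subgraph(keep).copy(), k)
    comps = list(nx.connected_components(core))
    return max((len(c) for c in comps), default=0) / G.number_of_nodes()

rng = np.random.default_rng(0)
G = nx.erdos_renyi_graph(10000, 6 / 10000, seed=1)  # mean degree ~ 6
for p in (0.4, 0.6, 0.8, 1.0):
    print(f"p = {p:3.1f}  giant 3-core fraction = "
          f"{kcore_giant_fraction(G, p, k=3, rng=rng):.3f}")
```

The abrupt appearance of the giant 3-core as p grows reflects the discontinuous (first-order) character of k-core percolation discussed in the abstract.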
Multilevel Modeling for Research in Group Work
ERIC Educational Resources Information Center
Selig, James P.; Trott, Arianna; Lemberger, Matthew E.
2017-01-01
Researchers in group counseling often encounter complex data from individual clients who are members of a group. Clients in the same group may be more similar than clients from different groups and this can lead to violations of statistical assumptions. The complexity of the data also means that predictors and outcomes can be measured at both the…
The Problem of Auto-Correlation in Parasitology
Pollitt, Laura C.; Reece, Sarah E.; Mideo, Nicole; Nussey, Daniel H.; Colegrave, Nick
2012-01-01
Explaining the contribution of host and pathogen factors in driving infection dynamics is a major ambition in parasitology. There is increasing recognition that analyses based on single summary measures of an infection (e.g., peak parasitaemia) do not adequately capture infection dynamics, and so the appropriate use of statistical techniques to analyse dynamics is necessary to understand infections and, ultimately, control parasites. However, the complexities of within-host environments mean that tracking and analysing pathogen dynamics within infections and among hosts poses considerable statistical challenges. Simple statistical models make assumptions that will rarely be satisfied in data collected on host and parasite parameters. In particular, model residuals (unexplained variance in the data) should not be correlated in time or space. Here we demonstrate how failure to account for such correlations can result in incorrect biological inference from statistical analysis. We then show how mixed effects models can be used as a powerful tool to analyse such repeated measures data, in the hope that this will encourage better statistical practices in parasitology. PMID:22511865
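The kind of mixed-effects analysis advocated above can be sketched on simulated repeated-measures data: a random intercept per host absorbs the within-host correlation that a simple regression would ignore. Variable names and effect sizes below are hypothetical.

```python
# Random-intercept mixed model for repeated measures within hosts.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
hosts, days = 20, 10
df = pd.DataFrame({
    "host": np.repeat(np.arange(hosts), days),
    "day": np.tile(np.arange(days), hosts),
})
host_effect = rng.normal(0, 2, hosts)[df["host"]]   # host-level variation
df["parasitaemia"] = 5 + 0.8 * df["day"] + host_effect + rng.normal(0, 1, len(df))

# groups=host gives each host its own random intercept
result = smf.mixedlm("parasitaemia ~ day", df, groups=df["host"]).fit()
print(result.summary())
```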
A novel approach to simulate gene-environment interactions in complex diseases.
Amato, Roberto; Pinelli, Michele; D'Andrea, Daniel; Miele, Gennaro; Nicodemi, Mario; Raiconi, Giancarlo; Cocozza, Sergio
2010-01-05
Complex diseases are multifactorial traits caused by both genetic and environmental factors. They represent the major part of human diseases and include those with the largest prevalence and mortality (cancer, heart disease, obesity, etc.). Despite a large amount of information that has been collected about both genetic and environmental risk factors, there are few examples of studies on their interactions in the epidemiological literature. One reason can be the incomplete knowledge of the power of statistical methods designed to search for risk factors and their interactions in these data sets. An improvement in this direction would lead to a better understanding and description of gene-environment interactions. To this aim, a possible strategy is to challenge the different statistical methods against data sets where the underlying phenomenon is completely known and fully controllable, for example simulated ones. We present a mathematical approach that models gene-environment interactions. By this method it is possible to generate simulated populations having gene-environment interactions of any form, involving any number of genetic and environmental factors, and also allowing non-linear interactions such as epistasis. In particular, we implemented a simple version of this model in a Gene-Environment iNteraction Simulator (GENS), a tool designed to simulate case-control data sets where a one gene-one environment interaction influences the disease risk. The main aim has been to allow the input of population characteristics by using standard epidemiological measures and to implement constraints to make the simulator behaviour biologically meaningful. Using the multi-logistic model implemented in GENS, it is possible to simulate case-control samples of complex diseases where gene-environment interactions influence the disease risk. The user has full control of the main characteristics of the simulated population, and a Monte Carlo process allows random variability. A knowledge-based approach reduces the complexity of the mathematical model by using reasonable biological constraints and makes the simulation more understandable in biological terms. Simulated data sets can be used for the assessment of novel statistical methods or for the evaluation of statistical power when designing a study.
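A stripped-down sketch of the simulation idea follows (one gene, one environment, a multiplicative interaction on the logit scale); the parameters are arbitrary illustrations, not GENS's knowledge-based constraints.

```python
# Simulate a population where disease risk depends on genotype, exposure, and
# their interaction via a logistic model, then tabulate cases and controls.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
g = rng.binomial(2, 0.3, n)   # genotype: number of risk alleles (0/1/2)
e = rng.binomial(1, 0.4, n)   # binary environmental exposure

logit = -2.0 + 0.2 * g + 0.3 * e + 0.6 * g * e   # GxE interaction term
risk = 1 / (1 + np.exp(-logit))
case = rng.uniform(size=n) < risk

print("prevalence:", case.mean())
print("(genotype, exposure): cases / controls")
for gg in range(3):
    for ee in range(2):
        m = (g == gg) & (e == ee)
        print((gg, ee), int(case[m].sum()), "/", int((~case[m]).sum()))
```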
Artificial neural network study on organ-targeting peptides
NASA Astrophysics Data System (ADS)
Jung, Eunkyoung; Kim, Junhyoung; Choi, Seung-Hoon; Kim, Minkyoung; Rhee, Hokyoung; Shin, Jae-Min; Choi, Kihang; Kang, Sang-Kee; Lee, Nam Kyung; Choi, Yun-Jaie; Jung, Dong Hyun
2010-01-01
We report a new approach to studying organ targeting of peptides on the basis of peptide sequence information. The positive control data sets consist of organ-targeting peptide sequences identified by the peroral phage-display technique for four organs, and the negative control data are prepared from random sequences. The capacity of our models to make appropriate predictions is validated by statistical indicators including sensitivity, specificity, enrichment curve, and the area under the receiver operating characteristic (ROC) curve (the ROC score). The VHSE descriptor produces statistically significant training models, and the models with simple neural network architectures show slightly greater predictive power than those with complex ones. The training and test set statistics indicate that our models could discriminate between organ-targeting and random sequences. We anticipate that our models will be applicable to the selection of organ-targeting peptides for generating peptide drugs or peptidomimetics.
NASA Astrophysics Data System (ADS)
Ozrin, V. D.; Subbotin, M. V.; Nikitin, S. M.
2004-04-01
We have developed PLASS (Protein-Ligand Affinity Statistical Score), a pair-wise potential of mean force for rapid estimation of the binding affinity of a ligand molecule to a protein active site. This scoring function is derived from the frequency of occurrence of atom-type pairs in crystallographic complexes taken from the Protein Data Bank (PDB). Statistical distributions are converted into distance-dependent contributions to the Gibbs free interaction energy for 10 atomic types using the Boltzmann hypothesis, with only one adjustable parameter. For a representative set of 72 protein-ligand structures, PLASS scores correlate well with the experimentally measured dissociation constants: a correlation coefficient R of 0.82 and an RMS error of 2.0 kcal/mol. Such high accuracy results from our novel treatment of the volume correction term, which takes into account the inhomogeneous properties of protein-ligand complexes. PLASS is able to rank reliably the affinities of complexes as diverse as those in the PDB.
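The core Boltzmann-inversion step behind such statistical potentials can be sketched as follows; the histogram counts are synthetic stand-ins for PDB-derived atom-pair frequencies, and the simple normalization here omits the paper's volume correction term.

```python
# Convert observed pair-distance frequencies into a distance-dependent
# free-energy contribution: dG(r) = -kT * ln( p_obs(r) / p_ref(r) ).
import numpy as np

kT = 0.593  # kcal/mol at ~298 K
bins = np.linspace(2.0, 8.0, 13)            # distance bins in Angstroms
r = 0.5 * (bins[:-1] + bins[1:])            # bin centers

# Hypothetical counts for one atom-type pair vs. a reference distribution
obs = np.array([1, 4, 20, 55, 80, 70, 50, 40, 35, 33, 32, 31], float)
ref = np.array([2, 8, 25, 45, 60, 62, 58, 52, 48, 45, 44, 43], float)

dG = -kT * np.log((obs / obs.sum()) / (ref / ref.sum()))
for ri, gi in zip(r, dG):
    print(f"r = {ri:4.2f} A   dG = {gi:6.3f} kcal/mol")
```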
Cavalcanti, Paulo Ernando Ferraz; Sá, Michel Pompeu Barros de Oliveira; Santos, Cecília Andrade dos; Esmeraldo, Isaac Melo; Chaves, Mariana Leal; Lins, Ricardo Felipe de Albuquerque; Lima, Ricardo de Carvalho
2015-01-01
To determine whether stratification-of-complexity models in congenital heart surgery (RACHS-1, Aristotle basic score, and STS-EACTS mortality score) fit our center, and to determine the best method of discriminating hospital mortality. Surgical procedures for congenital heart disease in patients under 18 years of age were allocated to the categories proposed by the currently available stratification-of-complexity methods. The outcome hospital mortality was calculated for each category of the three models. Statistical analysis was performed to verify whether the categories presented different mortalities. The discriminatory ability of the models was determined by calculating the area under the ROC curve, and the curves of the three models were compared. A total of 360 patients were allocated according to the three methods. There was a statistically significant difference in mortality between categories: RACHS-1 (1) 1.3%, (2) 11.4%, (3) 27.3%, (4) 50% (P<0.001); Aristotle basic score (1) 1.1%, (2) 12.2%, (3) 34%, (4) 64.7% (P<0.001); and STS-EACTS mortality score (1) 5.5%, (2) 13.6%, (3) 18.7%, (4) 35.8% (P<0.001). The three models had similar accuracy by area under the ROC curve: RACHS-1, 0.738; STS-EACTS, 0.739; Aristotle, 0.766. The three stratification-of-complexity models currently available in the literature are useful, with different mortalities between the proposed categories and similar discriminatory capacity for hospital mortality.
Network model of bilateral power markets based on complex networks
NASA Astrophysics Data System (ADS)
Wu, Yang; Liu, Junyong; Li, Furong; Yan, Zhanxin; Zhang, Li
2014-06-01
The bilateral power transaction (BPT) mode has become a typical market organization with the restructuring of the electric power industry, and a proper model that captures its characteristics is urgently needed. However, such a model has been lacking because of this market organization's complexity. As a promising approach to modeling complex systems, complex networks could provide a sound theoretical framework for developing a proper simulation model. In this paper, a complex network model of the BPT market is proposed. In this model, a price advantage mechanism is a precondition. Unlike general commodity transactions, both the financial layer and the physical layer are considered in the model. Through simulation analysis, the feasibility and validity of the model are verified. At the same time, some typical statistical features of the BPT network are identified. Namely, the degree distribution follows a power law, the clustering coefficient is low, and the average path length is somewhat long. Moreover, the topological stability of the BPT network is tested. The results show that the network displays topological robustness to random member failures while it is fragile against deliberate attacks, and the network can resist cascading failure to some extent. These features are helpful for decision-making and risk management in BPT markets.
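The statistical features reported above (power-law degree distribution, low clustering, longer path length) can be computed with standard tools; the sketch below does so on a stand-in scale-free graph rather than the authors' simulated BPT network.

```python
# Compute the three statistics named in the abstract on a scale-free stand-in.
import collections
import networkx as nx

G = nx.barabasi_albert_graph(2000, 2, seed=0)   # power-law degree stand-in

deg_counts = collections.Counter(d for _, d in G.degree())
print("degree distribution (degree: count, low end):",
      dict(sorted(deg_counts.items())[:10]))
print("average clustering coefficient:", nx.average_clustering(G))
print("average shortest path length:", nx.average_shortest_path_length(G))
```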
ERIC Educational Resources Information Center
Kunina-Habenicht, Olga; Rupp, Andre A.; Wilhelm, Oliver
2012-01-01
Using a complex simulation study we investigated parameter recovery, classification accuracy, and performance of two item-fit statistics for correct and misspecified diagnostic classification models within a log-linear modeling framework. The basic manipulated test design factors included the number of respondents (1,000 vs. 10,000), attributes (3…
NASA Astrophysics Data System (ADS)
Rusu-Anghel, S.
2017-01-01
Analytical modeling of the cement manufacturing process is difficult because of its complexity and has not resulted in sufficiently precise mathematical models. In this paper, based on a statistical model of the process and using the knowledge of human experts, a fuzzy system for automatic control of the clinkering process was designed.
Lu, Qiongshi; Li, Boyang; Ou, Derek; Erlendsdottir, Margret; Powles, Ryan L; Jiang, Tony; Hu, Yiming; Chang, David; Jin, Chentian; Dai, Wei; He, Qidu; Liu, Zefeng; Mukherjee, Shubhabrata; Crane, Paul K; Zhao, Hongyu
2017-12-07
Despite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits' genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses, we demonstrate that our method provides accurate covariance estimates, thereby enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (N total ≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD's correlation with cognitive traits and hints at an autoimmune component for ALS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Hunt, R.J.; Anderson, M.P.; Kelson, V.A.
1998-01-01
This paper demonstrates that analytic element models have potential as powerful screening tools that can facilitate or improve calibration of more complicated finite-difference and finite-element models. We demonstrate how a two-dimensional analytic element model was used to identify errors in a complex three-dimensional finite-difference model caused by incorrect specification of boundary conditions. An improved finite-difference model was developed using boundary conditions developed from a far-field analytic element model. Calibration of a revised finite-difference model was achieved using fewer zones of hydraulic conductivity and lake bed conductance than the original finite-difference model. Calibration statistics were also improved in that simulated base-flows were much closer to measured values. The improved calibration is due mainly to improved specification of the boundary conditions made possible by first solving the far-field problem with an analytic element model.
An example of complex modelling in dentistry using Markov chain Monte Carlo (MCMC) simulation.
Helfenstein, Ulrich; Menghini, Giorgio; Steiner, Marcel; Murati, Francesca
2002-09-01
In the usual regression setting one regression line is computed for a whole data set. In a more complex situation, each person may be observed, for example, at several points in time, and thus a regression line might be calculated for each person. Additional complexities, such as various forms of errors in covariables, may make a straightforward statistical evaluation difficult or even impossible. During recent years methods have been developed allowing convenient analysis of problems where the data and the corresponding models show these and many other forms of complexity. The methodology makes use of a Bayesian approach and Markov chain Monte Carlo (MCMC) simulations. The methods allow the construction of increasingly elaborate models by building them up from local sub-models. The essential structure of the models can be represented visually by directed acyclic graphs (DAG). This attractive property allows communication and discussion of the essential structure and the substantial meaning of a complex model without needing algebra. After presenting the statistical methods, an example from dentistry is given in order to demonstrate their application and use. The dataset of the example had a complex structure; each of a set of children was followed up over several years. The number of new fillings in permanent teeth had been recorded at several ages. The dependent variables were markedly different from the normal distribution and could not be transformed to normality. In addition, explanatory variables were assumed to be measured with different forms of error. It is illustrated how the corresponding models can be estimated conveniently via MCMC simulation, in particular 'Gibbs sampling', using the freely available software BUGS. In addition, how the measurement error may influence the estimates of the corresponding coefficients is explored. It is demonstrated that the effect of the independent variable on the dependent variable may be markedly underestimated if the measurement error is not taken into account ('regression dilution bias'). Markov chain Monte Carlo methods may be of great value to dentists in allowing analysis of data sets which exhibit a wide range of different forms of complexity.
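The regression dilution bias mentioned above is easy to reproduce numerically: adding measurement error to the covariate attenuates the fitted slope by roughly var(x)/(var(x) + error variance). The sketch below (synthetic data, not from the paper) demonstrates this.

```python
# Demonstrate regression dilution bias: noise in the covariate shrinks the
# estimated slope toward zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, true_slope = 2000, 1.0
x_true = rng.normal(0, 1, n)
y = true_slope * x_true + rng.normal(0, 0.5, n)

for err_sd in (0.0, 0.5, 1.0):
    x_obs = x_true + rng.normal(0, err_sd, n)   # covariate measured with error
    slope = stats.linregress(x_obs, y).slope
    # expected attenuation factor: 1 / (1 + err_sd**2) for unit-variance x
    print(f"error sd {err_sd:3.1f}: fitted slope {slope:5.3f}")
```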
Exploiting Complexity Information for Brain Activation Detection
Zhang, Yan; Liang, Jiali; Lin, Qiang; Hu, Zhenghui
2016-01-01
We present a complexity-based approach for the analysis of fMRI time series, in which sample entropy (SampEn) is introduced as a quantification of voxel complexity. Under this hypothesis the voxel complexity could be modulated in pertinent cognitive tasks, and it changes across experimental paradigms. We calculate the complexity of sequential fMRI data for each voxel in two distinct experimental paradigms and use a nonparametric statistical strategy, the Wilcoxon signed rank test, to evaluate the difference in complexity between them. The results are compared with the well-known general linear model based Statistical Parametric Mapping package (SPM12), and a clear difference is observed. This is because the data-driven SampEn method detects changes in brain complexity between the two experimental conditions, evaluating only the complexity of the sequential fMRI data themselves. Moreover, larger and smaller SampEn values carry different meanings, and the neutral-blank design produces higher predictability than the threat-neutral design. Complexity information can be considered a complementary method to existing fMRI analysis strategies, and it may help improve the understanding of human brain functions from a different perspective. PMID:27045838
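For reference, a minimal sample entropy implementation of the kind used as the complexity measure above; m = 2 and r = 0.2·SD are conventional defaults and not necessarily the paper's settings.

```python
# Sample entropy: -ln(A/B), where B counts template matches of length m and
# A counts matches of length m+1, within tolerance tol (Chebyshev distance).
import numpy as np

def sampen(x, m=2, r=0.2):
    x = np.asarray(x, float)
    tol = r * x.std()
    def match_count(mm):
        templ = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        c = 0
        for i in range(len(templ)):
            d = np.max(np.abs(templ[i + 1:] - templ[i]), axis=1)
            c += np.sum(d <= tol)          # pairs, self-matches excluded
        return c
    B, A = match_count(m), match_count(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

rng = np.random.default_rng(0)
print("white noise:", sampen(rng.normal(size=300)))           # high entropy
print("sine wave  :", sampen(np.sin(np.linspace(0, 20 * np.pi, 300))))  # low
```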
Statistical label fusion with hierarchical performance models
Asman, Andrew J.; Dagley, Alexander S.; Landman, Bennett A.
2014-01-01
Label fusion is a critical step in many image segmentation frameworks (e.g., multi-atlas segmentation) as it provides a mechanism for generalizing a collection of labeled examples into a single estimate of the underlying segmentation. In the multi-label case, typical label fusion algorithms treat all labels equally – fully neglecting the known, yet complex, anatomical relationships exhibited in the data. To address this problem, we propose a generalized statistical fusion framework using hierarchical models of rater performance. Building on the seminal work in statistical fusion, we reformulate the traditional rater performance model from a multi-tiered hierarchical perspective. This new approach provides a natural framework for leveraging known anatomical relationships and accurately modeling the types of errors that raters (or atlases) make within a hierarchically consistent formulation. Herein, we describe several contributions. First, we derive a theoretical advancement to the statistical fusion framework that enables the simultaneous estimation of multiple (hierarchical) performance models within the statistical fusion context. Second, we demonstrate that the proposed hierarchical formulation is highly amenable to the state-of-the-art advancements that have been made to the statistical fusion framework. Lastly, in an empirical whole-brain segmentation task we demonstrate substantial qualitative and significant quantitative improvement in overall segmentation accuracy. PMID:24817809
ERIC Educational Resources Information Center
Cheung, Mike W. L.; Chan, Wai
2009-01-01
Structural equation modeling (SEM) is widely used as a statistical framework to test complex models in behavioral and social sciences. When the number of publications increases, there is a need to systematically synthesize them. Methodology of synthesizing findings in the context of SEM is known as meta-analytic SEM (MASEM). Although correlation…
Analyzing the causation of a railway accident based on a complex network
NASA Astrophysics Data System (ADS)
Ma, Xin; Li, Ke-Ping; Luo, Zi-Yan; Zhou, Jin
2014-02-01
In this paper, a new model is constructed for the causation analysis of railway accidents based on complex network theory. In the model, the nodes are defined as various manifest or latent accident causal factors. By employing complex network theory, especially its statistical indicators, a railway accident as well as its key causations can be analyzed from an overall perspective. As a case study, the "7.23" China-Yongwen railway accident is analyzed with this model. The results show that the inspection of signals and the checking of line conditions before trains run played an important role in this railway accident. In conclusion, the constructed model gives a theoretical clue for railway accident prediction and, hence, for reducing the occurrence of railway accidents.
NASA Astrophysics Data System (ADS)
Jajcay, N.; Kravtsov, S.; Tsonis, A.; Palus, M.
2017-12-01
A better understanding of dynamics in complex systems, such as the Earth's climate, is one of the key challenges for contemporary science and society. A large amount of experimental data requires new mathematical and computational approaches. Natural complex systems vary on many temporal and spatial scales, often exhibiting recurring patterns and quasi-oscillatory phenomena. The statistical inference of causal interactions and synchronization between dynamical phenomena evolving on different temporal scales is of vital importance for better understanding of underlying mechanisms and is a key to modeling and prediction of such systems. This study introduces and applies information-theoretic diagnostics to the phase and amplitude time series of different wavelet components of the observed data that characterize El Niño. A suite of significant interactions between processes operating on different time scales was detected, and intermittent synchronization among different time scales has been associated with extreme El Niño events. The mechanisms of these nonlinear interactions were further studied in conceptual low-order and state-of-the-art dynamical, as well as statistical, climate models. Observed and simulated interactions exhibit substantial discrepancies, whose understanding may be the key to improved prediction. Moreover, the statistical framework we apply here is suitable for directly inferring cross-scale interactions in nonlinear time series from complex systems such as the terrestrial magnetosphere, solar-terrestrial interactions, seismic activity, or even human brain dynamics.
Complex network theory for the identification and assessment of candidate protein targets.
McGarry, Ken; McDonald, Sharon
2018-06-01
In this work we use complex network theory to provide a statistical model of the connectivity patterns of human proteins and their interaction partners. Our intention is to identify important proteins that may be predisposed to be potential candidates as drug targets for therapeutic interventions. Target proteins usually have more interaction partners than non-target proteins, but there are no hard-and-fast rules for defining the actual number of interactions. We devise a statistical measure for identifying hub proteins and score our target proteins with gene ontology annotations. The important druggable protein targets are likely to have similar biological functions that can be assessed for their potential therapeutic value. Our system provides a statistical analysis of the local and distant neighborhood protein interactions of the potential targets using complex network measures. This approach builds a more accurate model of drug-to-target activity and therefore the likely impact on treating diseases. We integrate high-quality protein interaction data from the HINT database and disease-associated proteins from the DrugTarget database. Other sources include biological knowledge from Gene Ontology and drug information from DrugBank. The problem is a very challenging one since the data is highly imbalanced between target proteins and the more numerous nontargets. We use undersampling on the training data and build Random Forest classifier models which are used to identify previously unclassified target proteins. We validate and corroborate these findings against the available literature. Copyright © 2018 Elsevier Ltd. All rights reserved.
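A schematic version of the classification pipeline described above: undersample the majority (non-target) class in the training split, then fit a Random Forest. The synthetic features stand in for the network and ontology measures used in the paper.

```python
# Undersampling + Random Forest on an imbalanced binary problem.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

# Undersample the majority class in the training set to match the minority
rng = np.random.default_rng(0)
minority = np.where(ytr == 1)[0]
majority = rng.choice(np.where(ytr == 0)[0], size=len(minority), replace=False)
idx = np.concatenate([minority, majority])

clf = RandomForestClassifier(random_state=0).fit(Xtr[idx], ytr[idx])
print(classification_report(yte, clf.predict(Xte)))
```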
Statistical assessment on a combined analysis of GRYN-ROMN-UCBN upland vegetation vital signs
Irvine, Kathryn M.; Rodhouse, Thomas J.
2014-01-01
As of 2013, Rocky Mountain and Upper Columbia Basin Inventory and Monitoring Networks have multiple years of vegetation data, Greater Yellowstone Network has three years of vegetation data, and monitoring is ongoing in all three networks. Our primary objective is to assess whether a combined analysis of these data aimed at exploring correlations with climate and weather data is feasible. We summarize the core survey design elements across protocols and point out the major statistical challenges for a combined analysis at present. The dissimilarity in response designs between ROMN and UCBN-GRYN network protocols presents a statistical challenge that has not been resolved yet. However, the UCBN and GRYN data are compatible as they implement a similar response design; therefore, a combined analysis is feasible and will be pursued in future. When data collected by different networks are combined, the survey design describing the merged dataset is (likely) a complex survey design. A complex survey design is the result of combining datasets from different sampling designs. A complex survey design is characterized by unequal probability sampling, varying stratification, and clustering (see Lohr 2010, Chapter 7 for a general overview). Statistical analysis of complex survey data requires modifications to standard methods, one of which is to include survey design weights within a statistical model. We focus on this issue for a combined analysis of upland vegetation from these networks, leaving other topics for future research. We conduct a simulation study on the possible effects of equal versus unequal probability selection of points on parameter estimates of temporal trend using available packages within the R statistical computing environment. We find that, as written, using lmer or lm for trend detection in a continuous response and clm and clmm for visually estimated cover classes with "raw" GRTS design weights specified for the weight argument leads to substantially different results and/or computational instability. However, when only fixed effects are of interest, the survey package (svyglm and svyolr) may be suitable for a model-assisted analysis for trend. We provide possible directions for future research into combined analysis for ordinal and continuous vital sign indicators.
Virtual Model Validation of Complex Multiscale Systems: Applications to Nonlinear Elastostatics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oden, John Tinsley; Prudencio, Ernest E.; Bauman, Paul T.
We propose a virtual statistical validation process as an aid to the design of experiments for the validation of phenomenological models of the behavior of material bodies, with focus on those cases in which knowledge of the fabrication process used to manufacture the body can provide information on the micro-molecular-scale properties underlying macroscale behavior. One example is given by models of elastomeric solids fabricated using polymerization processes. We describe a framework for model validation that involves Bayesian updates of parameters in statistical calibration and validation phases. The process enables the quantification of uncertainty in quantities of interest (QoIs) and the determination of model consistency using tools of statistical information theory. We assert that microscale information drawn from molecular models of the fabrication of the body provides a valuable source of prior information on parameters as well as a means for estimating model bias and designing virtual validation experiments to provide information gain over calibration posteriors.
Avoiding the ensemble decorrelation problem using member-by-member post-processing
NASA Astrophysics Data System (ADS)
Van Schaeybroeck, Bert; Vannitsem, Stéphane
2014-05-01
Forecast calibration or post-processing has become a standard tool in atmospheric and climatological science due to the presence of systematic initial-condition and model errors. For ensemble forecasts the most competitive methods derive from the assumption of a fixed ensemble distribution. However, when independently applying such 'statistical' methods at different locations, lead times, or for multiple variables, the correlation structure of individual ensemble members is destroyed. Rather than re-establishing the correlation structure as in Schefzik et al. (2013), we propose a calibration method that avoids the problem by correcting each ensemble member individually. Moreover, we analyse the fundamental mechanisms by which the probabilistic ensemble skill can be enhanced. In terms of the continuous ranked probability score, our member-by-member approach yields a skill gain that extends to lead times far beyond the error doubling time and that matches the most competitive statistical approach, non-homogeneous Gaussian regression (Gneiting et al. 2005). Besides the conservation of the correlation structure, additional benefits arise, including the fact that higher-order ensemble moments like kurtosis and skewness are inherited from the uncorrected forecasts. Our detailed analysis is performed in the context of the Kuramoto-Sivashinsky equation and different simple models, but the results extend successfully to the ensemble forecasts of the European Centre for Medium-Range Weather Forecasts (Van Schaeybroeck and Vannitsem, 2013, 2014). References [1] Gneiting, T., Raftery, A. E., Westveld, A., Goldman, T., 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098-1118. [2] Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. To appear in Statistical Science 28. [3] Van Schaeybroeck, B., and S. Vannitsem, 2013: Reliable probabilities through statistical post-processing of ensemble forecasts. Proceedings of the European Conference on Complex Systems 2012, Springer Proceedings in Complexity, XVI, p. 347-352. [4] Van Schaeybroeck, B., and S. Vannitsem, 2014: Ensemble post-processing using member-by-member approaches: theoretical aspects, under review.
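A schematic member-by-member correction in the spirit described above: shift and scale each member around the ensemble mean so that the member-wise correlation structure is preserved. The coefficients (a, b, tau) are fixed here for illustration; in practice they would be estimated on training data, e.g. by CRPS minimization.

```python
# Member-by-member post-processing sketch: corrected = a + b*mean + tau*(member - mean).
import numpy as np

def mbm_correct(ens, a, b, tau):
    """ens: array of shape (n_members, n_forecasts)."""
    mean = ens.mean(axis=0)
    return a + b * mean + tau * (ens - mean)

rng = np.random.default_rng(0)
truth = rng.normal(size=200)
ens = truth + 0.5 + 0.3 * rng.normal(size=(20, 200))  # biased, underdispersive

corrected = mbm_correct(ens, a=-0.5, b=1.0, tau=2.0)
print("raw bias        :", (ens.mean(axis=0) - truth).mean())
print("corrected bias  :", (corrected.mean(axis=0) - truth).mean())
print("raw spread      :", ens.std(axis=0).mean())
print("corrected spread:", corrected.std(axis=0).mean())
```

Because every member receives the same affine map at each forecast instance, rank ordering and inter-variable correlations of the members survive the correction, which is the property the abstract emphasizes.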
Kernel methods and flexible inference for complex stochastic dynamics
NASA Astrophysics Data System (ADS)
Capobianco, Enrico
2008-07-01
Approximation theory suggests that series expansions and projections represent standard tools for random process applications from both numerical and statistical standpoints. Such instruments emphasize the role of both sparsity and smoothness for compression purposes, the decorrelation power achieved in the expansion coefficients space compared to the signal space, and the reproducing kernel property when some special conditions are met. We consider these three aspects central to the discussion in this paper, and attempt to analyze the characteristics of some known approximation instruments employed in a complex application domain such as financial market time series. Volatility models are often built ad hoc, parametrically and through very sophisticated methodologies. But they can hardly deal with stochastic processes with regard to non-Gaussianity, covariance non-stationarity or complex dependence without paying a big price in terms of either model mis-specification or computational efficiency. It is thus a good idea to look at other more flexible inference tools; hence the strategy of combining greedy approximation and space dimensionality reduction techniques, which are less dependent on distributional assumptions and more targeted to achieve computationally efficient performances. Advantages and limitations of their use will be evaluated by looking at algorithmic and model building strategies, and by reporting statistical diagnostics.
What do we gain from simplicity versus complexity in species distribution models?
Merow, Cory; Smith, Matthew J.; Edwards, Thomas C.; Guisan, Antoine; McMahon, Sean M.; Normand, Signe; Thuiller, Wilfried; Wuest, Rafael O.; Zimmermann, Niklaus E.; Elith, Jane
2014-01-01
Species distribution models (SDMs) are widely used to explain and predict species ranges and environmental niches. They are most commonly constructed by inferring species' occurrence–environment relationships using statistical and machine-learning methods. The variety of methods that can be used to construct SDMs (e.g. generalized linear/additive models, tree-based models, maximum entropy, etc.), and the variety of ways that such models can be implemented, permits substantial flexibility in SDM complexity. Building models with an appropriate amount of complexity for the study objectives is critical for robust inference. We characterize complexity as the shape of the inferred occurrence–environment relationships and the number of parameters used to describe them, and search for insights into whether additional complexity is informative or superfluous. By building ‘under fit’ models, having insufficient flexibility to describe observed occurrence–environment relationships, we risk misunderstanding the factors shaping species distributions. By building ‘over fit’ models, with excessive flexibility, we risk inadvertently ascribing pattern to noise or building opaque models. However, model selection can be challenging, especially when comparing models constructed under different modeling approaches. Here we argue for a more pragmatic approach: researchers should constrain the complexity of their models based on study objective, attributes of the data, and an understanding of how these interact with the underlying biological processes. We discuss guidelines for balancing under fitting with over fitting and consequently how complexity affects decisions made during model building. Although some generalities are possible, our discussion reflects differences in opinions that favor simpler versus more complex models. We conclude that combining insights from both simple and complex SDM building approaches best advances our knowledge of current and future species ranges.
Reciprocity in directed networks
NASA Astrophysics Data System (ADS)
Yin, Mei; Zhu, Lingjiong
2016-04-01
Reciprocity is an important characteristic of directed networks and has been widely used in the modeling of World Wide Web, email, social, and other complex networks. In this paper, we take a statistical physics point of view and study the limiting entropy and free energy densities from the microcanonical ensemble, the canonical ensemble, and the grand canonical ensemble whose sufficient statistics are given by edge and reciprocal densities. The sparse case is also studied for the grand canonical ensemble. Extensions to more general reciprocal models including reciprocal triangle and star densities will likewise be discussed.
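For concreteness, the reciprocal density used as a sufficient statistic can be computed directly; a small sketch with networkx on an arbitrary random directed graph:

```python
import networkx as nx

# Reciprocity: the fraction of directed edges whose reverse edge also exists.
G = nx.gnp_random_graph(50, 0.1, seed=1, directed=True)
r = sum(1 for u, v in G.edges if G.has_edge(v, u)) / G.number_of_edges()
print(f"edge density: {nx.density(G):.3f}, reciprocity: {r:.3f}")
```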
Vernon, Ian; Liu, Junli; Goldstein, Michael; Rowe, James; Topping, Jen; Lindsey, Keith
2018-01-02
Many mathematical models have now been employed across every area of systems biology. These models increasingly involve large numbers of unknown parameters, have complex structure which can result in substantial evaluation time relative to the needs of the analysis, and need to be compared to observed data of various forms. The correct analysis of such models usually requires a global parameter search, over a high dimensional parameter space, that incorporates and respects the most important sources of uncertainty. This can be an extremely difficult task, but it is essential for any meaningful inference or prediction to be made about any biological system. It hence represents a fundamental challenge for the whole of systems biology. Bayesian statistical methodology for the uncertainty analysis of complex models is introduced, which is designed to address the high dimensional global parameter search problem. Bayesian emulators that mimic the systems biology model but which are extremely fast to evaluate are embedded within an iterative history match: an efficient method to search high dimensional spaces within a more formal statistical setting, while incorporating major sources of uncertainty. The approach is demonstrated via application to a model of hormonal crosstalk in Arabidopsis root development, which has 32 rate parameters, for which we identify the sets of rate parameter values that lead to acceptable matches between model output and observed trend data. The multiple insights into the model's structure that this analysis provides are discussed. The methodology is applied to a second related model, and the biological consequences of the resulting comparison, including the evaluation of gene functions, are described. Bayesian uncertainty analysis for complex models using both emulators and history matching is shown to be a powerful technique that can greatly aid the study of a large class of systems biology models. It both provides insight into model behaviour and identifies the sets of rate parameters of interest.
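A toy sketch of the emulation-plus-history-matching loop described above, assuming a one-dimensional parameter, a Gaussian-process emulator from scikit-learn, and the conventional implausibility cutoff of 3; the simulator f, the design, and the observation are all hypothetical stand-ins:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

f = lambda x: np.sin(3 * x) + 0.5 * x           # stand-in for a slow simulator
X_design = np.linspace(0, 2, 8).reshape(-1, 1)  # small space-filling design
gp = GaussianProcessRegressor(kernel=RBF(0.5)).fit(X_design, f(X_design).ravel())

z, var_obs = 1.2, 0.01                          # observed datum and its variance
x = np.linspace(0, 2, 200).reshape(-1, 1)
mean, sd = gp.predict(x, return_std=True)
I = np.abs(z - mean) / np.sqrt(var_obs + sd**2) # implausibility at each x
print(f"non-implausible fraction of parameter space: {(I < 3).mean():.2f}")
```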
Phenomenological model to fit complex permittivity data of water from radio to optical frequencies.
Shubitidze, Fridon; Osterberg, Ulf
2007-04-01
A general factorized form of the dielectric function together with a fractional model-based parameter estimation method is used to provide an accurate analytical formula for the complex refractive index in water for the frequency range 10^8-10^16 Hz. The analytical formula is derived using a combination of a microscopic frequency-dependent rational function for adjusting zeros and poles of the dielectric dispersion together with the macroscopic statistical Fermi-Dirac distribution to provide a description of both the real and imaginary parts of the complex permittivity for water. The Fermi-Dirac distribution allows us to model the dramatic reduction in the imaginary part of the permittivity in the visible window of the water spectrum.
Statistical genetics concepts and approaches in schizophrenia and related neuropsychiatric research.
Schork, Nicholas J; Greenwood, Tiffany A; Braff, David L
2007-01-01
Statistical genetics is a research field that focuses on mathematical models and statistical inference methodologies that relate genetic variations (i.e., naturally occurring human DNA sequence variations or "polymorphisms") to particular traits or diseases (phenotypes), usually from data collected on large samples of families or individuals. The ultimate goal of such analysis is the identification of genes and genetic variations that influence disease susceptibility. Although of extreme interest and importance, the fact that many genes and environmental factors contribute to neuropsychiatric diseases of public health importance (e.g., schizophrenia, bipolar disorder, and depression) complicates relevant studies and suggests that very sophisticated mathematical and statistical modeling may be required. In addition, large-scale contemporary human DNA sequencing and related projects, such as the Human Genome Project and the International HapMap Project, as well as the development of high-throughput DNA sequencing and genotyping technologies, have provided statistical geneticists with a great deal of very relevant and appropriate information and resources. Unfortunately, the use of these resources and their interpretation are not straightforward when applied to complex, multifactorial diseases such as schizophrenia. In this brief and largely nonmathematical review of the field of statistical genetics, we describe many of the main concepts, definitions, and issues that motivate contemporary research. We also provide a discussion of the most pressing contemporary problems that demand further research if progress is to be made in the identification of genes and genetic variations that predispose to complex neuropsychiatric diseases.
Analysis and meta-analysis of single-case designs: an introduction.
Shadish, William R
2014-04-01
The last 10 years have seen great progress in the analysis and meta-analysis of single-case designs (SCDs). This special issue includes five articles that provide an overview of current work on that topic, including standardized mean difference statistics, multilevel models, Bayesian statistics, and generalized additive models. Each article analyzes a common example across articles and presents syntax or macros for how to do them. These articles are followed by commentaries from single-case design researchers and journal editors. This introduction briefly describes each article and then discusses several issues that must be addressed before we can know what analyses will eventually be best to use in SCD research. These issues include modeling trend, modeling error covariances, computing standardized effect size estimates, assessing statistical power, incorporating more accurate models of outcome distributions, exploring whether Bayesian statistics can improve estimation given the small samples common in SCDs, and the need for annotated syntax and graphical user interfaces that make complex statistics accessible to SCD researchers. The article then discusses reasons why SCD researchers are likely to incorporate statistical analyses into their research more often in the future, including changing expectations and contingencies regarding SCD research from outside SCD communities, changes and diversity within SCD communities, corrections of erroneous beliefs about the relationship between SCD research and statistics, and demonstrations of how statistics can help SCD researchers better meet their goals.
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. The text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of the basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. Exercises and answers are included to give the student practice with nearly all the methods presented for modeling and statistical analysis. Three computer programs implement the more complex methods: a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium; a program to calculate a measure of model nonlinearity with respect to the regression parameters; and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Ignjatović, Aleksandra; Stojanović, Miodrag; Milošević, Zoran; Anđelković Apostolović, Marija
2017-12-02
Developing risk models in medicine is not only appealing but also beset by obstacles at every stage of predictive model development. Initially, the association of one or more markers with a specific outcome was established through statistical significance, but novel and demanding questions have required the development of new and more complex statistical techniques. The progress of statistical analysis in biomedical research is best observed through the history of the Framingham study and the development of the Framingham score. Evaluation of predictive models rests on a combination of several metrics. When using logistic regression or Cox proportional hazards regression, calibration testing and ROC curve analysis should be mandatory and eliminatory, with newer statistical techniques taking a central place. To obtain complete information about a new marker in a model, it is now recommended to use reclassification tables, calculating the net reclassification index and the integrated discrimination improvement. Decision curve analysis is a novel method for evaluating the clinical usefulness of a predictive model. It may be noted that customizing and fine-tuning the Framingham risk score initiated much of this development of statistical analysis. A clinically applicable predictive model should be a trade-off among all the abovementioned statistical metrics: between calibration and discrimination, accuracy and decision-making, costs and benefits, and quality and quantity of a patient's life.
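As an illustration of one metric mentioned above, a hedged sketch of the category-free (continuous) net reclassification index; the risk vectors are hypothetical inputs:

```python
import numpy as np

def nri(p_old, p_new, y):
    """Continuous net reclassification index for adding a marker to a model.
    p_old, p_new: predicted risks without/with the marker; y: 0/1 outcomes."""
    up, down = p_new > p_old, p_new < p_old
    ev, ne = (y == 1), (y == 0)
    return (up[ev].mean() - down[ev].mean()) + (down[ne].mean() - up[ne].mean())
```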
NASA Astrophysics Data System (ADS)
Havens, Timothy C.; Cummings, Ian; Botts, Jonathan; Summers, Jason E.
2017-05-01
The linear ordered statistic (LOS) is a parameterized ordered statistic (OS) that is a weighted average of a rank-ordered sample. LOS operators are useful generalizations of aggregation as they can represent any linear aggregation, from minimum to maximum, including conventional aggregations, such as mean and median. In the fuzzy logic field, these aggregations are called ordered weighted averages (OWAs). Here, we present a method for learning LOS operators from training data, viz., data for which you know the output of the desired LOS. We then extend the learning process with regularization, such that a lower complexity or sparse LOS can be learned. Hence, we discuss what 'lower complexity' means in this context and how to represent that in the optimization procedure. Finally, we apply our learning methods to the well-known constant-false-alarm-rate (CFAR) detection problem, specifically for the case of background levels modeled by long-tailed distributions, such as the K-distribution. These backgrounds arise in several pertinent imaging problems, including the modeling of clutter in synthetic aperture radar and sonar (SAR and SAS) and in wireless communications.
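A minimal sketch of the LOS/OWA operator itself (the paper's contribution, learning and regularizing the weights from training data, is not shown): sorting the sample and weighting it recovers the familiar aggregations as special cases.

```python
import numpy as np

def owa(x, w):
    """Ordered weighted average: sort descending, then dot with weights
    (weights are non-negative and sum to 1)."""
    return np.sort(x)[::-1] @ np.asarray(w)

x = np.array([3.0, 7.0, 1.0, 5.0])
print(owa(x, [1, 0, 0, 0]))       # maximum
print(owa(x, [0.25] * 4))         # mean
print(owa(x, [0, 0.5, 0.5, 0]))   # median of four values
```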
Functional constraints on tooth morphology in carnivorous mammals
2012-01-01
Background The range of potential morphologies resulting from evolution is limited by complex interacting processes, ranging from development to function. Quantifying these interactions is important for understanding adaptation and convergent evolution. Using three-dimensional reconstructions of carnivoran and dasyuromorph tooth rows, we compared statistical models of the relationship between tooth row shape and the opposing tooth row, a static feature, as well as measures of mandibular motion during chewing (occlusion), which are kinetic features. This is a new approach to quantifying functional integration because we use measures of movement and displacement, such as the amount the mandible translates laterally during occlusion, as opposed to conventional morphological measures, such as mandible length and geometric landmarks. By sampling two distantly related groups of ecologically similar mammals, we study carnivorous mammals in general rather than a specific group of mammals. Results Statistical model comparisons demonstrate that the best performing models always include some measure of mandibular motion, indicating that functional and statistical models of tooth shape as purely a function of the opposing tooth row are too simple and that increased model complexity provides a better understanding of tooth form. The predictors of the best performing models always included the opposing tooth row shape and a relative linear measure of mandibular motion. Conclusions Our results provide quantitative support of long-standing hypotheses of tooth row shape as being influenced by mandibular motion in addition to the opposing tooth row. Additionally, this study illustrates the utility and necessity of including kinetic features in analyses of morphological integration. PMID:22899809
NASA Astrophysics Data System (ADS)
Kolokythas, Kostantinos; Vasileios, Salamalikis; Athanassios, Argiriou; Kazantzidis, Andreas
2015-04-01
The wind is the result of complex interactions among numerous mechanisms acting on small or large scales, so better knowledge of its behavior is essential in a variety of applications, especially in the field of power production from wind turbines. The literature contains a considerable number of models, physical or statistical, dealing with the simulation and prediction of wind speed. Among others, Artificial Neural Networks (ANNs) are widely used for wind forecasting and, in the great majority of cases, outperform conventional statistical models. In this study, a number of ANNs with different architectures, created and applied to a dataset of wind time series, are compared with Auto-Regressive Integrated Moving Average (ARIMA) statistical models. The data consist of mean hourly wind speeds from a wind farm in a hilly Greek region and cover a period of one year (2013). The main goal is to evaluate the models' ability to simulate successfully the wind speed at a significant point (target). Goodness-of-fit statistics are computed for the comparison of the different methods. In general, the ANNs showed the best performance in the estimation of wind speed, prevailing over the ARIMA models.
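As a hedged baseline sketch (synthetic data; the ARIMA order is an arbitrary choice, not the study's), statsmodels fits the comparison model in a few lines:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
wind = 8 + np.cumsum(rng.normal(0, 0.3, 500))  # stand-in for hourly wind speeds

model = ARIMA(wind[:400], order=(2, 1, 1)).fit()
fc = model.forecast(steps=100)                 # hold out the last 100 hours
rmse = np.sqrt(np.mean((fc - wind[400:]) ** 2))
print(f"ARIMA(2,1,1) hold-out RMSE: {rmse:.3f} m/s")
```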
An operational GLS model for hydrologic regression
Tasker, Gary D.; Stedinger, J.R.
1989-01-01
Recent Monte Carlo studies have documented the value of generalized least squares (GLS) procedures to estimate empirical relationships between streamflow statistics and physiographic basin characteristics. This paper presents a number of extensions of the GLS method that deal with realities and complexities of regional hydrologic data sets that were not addressed in the simulation studies. These extensions include: (1) a more realistic model of the underlying model errors; (2) smoothed estimates of cross correlation of flows; (3) procedures for including historical flow data; (4) diagnostic statistics describing leverage and influence for GLS regression; and (5) the formulation of a mathematical program for evaluating future gaging activities.
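The core GLS estimator underlying these extensions, sketched without the paper's substantive part, the construction of the error covariance matrix from smoothed cross-correlations and record lengths:

```python
import numpy as np

def gls(X, y, Sigma):
    """Generalized least squares: beta = (X' S^-1 X)^-1 X' S^-1 y, where
    Sigma models sampling error and cross-correlation of the flow records."""
    Si = np.linalg.inv(Sigma)
    return np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
```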
NASA Astrophysics Data System (ADS)
Sandfeld, Stefan; Budrikis, Zoe; Zapperi, Stefano; Fernandez Castellanos, David
2015-02-01
Crystalline plasticity is strongly interlinked with dislocation mechanics and is nowadays relatively well understood. Concepts and physical models of plastic deformation in amorphous materials, where the concept of linear lattice defects is not applicable, are still lagging behind. We introduce an eigenstrain-based finite element lattice model for simulations of shear band formation and strain avalanches. Our model allows us to study the influence of surfaces and finite size effects on the statistics of avalanches. We find that even with relatively complex loading conditions and open boundary conditions, critical exponents describing avalanche statistics are unchanged, which validates the use of simpler scalar lattice-based models to study these phenomena.
Symmetry Transition Preserving Chirality in QCD: A Versatile Random Matrix Model
NASA Astrophysics Data System (ADS)
Kanazawa, Takuya; Kieburg, Mario
2018-06-01
We consider a random matrix model which interpolates between the chiral Gaussian unitary ensemble and the Gaussian unitary ensemble while preserving chiral symmetry. This ensemble describes flavor symmetry breaking for staggered fermions in 3D QCD as well as in 4D QCD at high temperature or in 3D QCD at a finite isospin chemical potential. Our model is an Osborn-type two-matrix model which is equivalent to the elliptic ensemble but we consider the singular value statistics rather than the complex eigenvalue statistics. We report on exact results for the partition function and the microscopic level density of the Dirac operator in the ɛ regime of QCD. We compare these analytical results with Monte Carlo simulations of the matrix model.
The concept and use of elasticity in population viability models [Exercise 13
Carolyn Hull Sieg; Rudy M. King; Fred Van Dyke
2003-01-01
As you have seen in exercise 12, plants, such as the western prairie fringed orchid, typically have distinct life stages and complex life cycles that require the matrix analyses associated with a stage-based population model. Some statistics that can be generated from such matrix analyses can be very informative in determining which variables in the model have the...
Building out a Measurement Model to Incorporate Complexities of Testing in the Language Domain
ERIC Educational Resources Information Center
Wilson, Mark; Moore, Stephen
2011-01-01
This paper provides a summary of a novel and integrated way to think about the item response models (most often used in measurement applications in social science areas such as psychology, education, and especially testing of various kinds) from the viewpoint of the statistical theory of generalized linear and nonlinear mixed models. In addition,…
Graphene growth process modeling: a physical-statistical approach
NASA Astrophysics Data System (ADS)
Wu, Jian; Huang, Qiang
2014-09-01
As a zero-bandgap semiconductor, graphene is an attractive material for a wide variety of applications such as optoelectronics. Among the various techniques developed for graphene synthesis, chemical vapor deposition on copper foils shows high potential for producing few-layer, large-area graphene. Because fabrication of high-quality graphene sheets requires an understanding of growth mechanisms, together with methods to characterize and control the grain size of graphene flakes, analytical modeling of the graphene growth process is essential for controlled fabrication. The growth process starts with randomly nucleated islands that gradually develop into complex shapes, grow in size, and eventually connect together to cover the copper foil. To model this complex process, we develop a physical-statistical approach under the assumption of self-similarity during graphene growth. The growth kinetics is uncovered by separating island shapes from the area growth rate. We propose to characterize the area growth velocity using a confined exponential model, which not only has a clear physical explanation, but also fits the real data well. For the shape modeling, we develop a parametric shape model which can be well explained by the angular-dependent growth rate. This work can provide useful information for the control and optimization of the graphene growth process on Cu foil.
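A sketch of fitting a saturating area-growth law of the kind described, assuming the confined exponential takes the form A(t) = A_max (1 - exp(-kt)); the data and parameter values are synthetic:

```python
import numpy as np
from scipy.optimize import curve_fit

def confined_exponential(t, A_max, k):
    """Island area growth that saturates as the copper foil is covered."""
    return A_max * (1.0 - np.exp(-k * t))

t = np.linspace(0, 30, 15)                    # growth time (e.g. minutes)
area = confined_exponential(t, 120.0, 0.15)   # synthetic island areas
area += np.random.default_rng(1).normal(0, 2, t.size)

popt, _ = curve_fit(confined_exponential, t, area, p0=[100.0, 0.1])
print(f"A_max = {popt[0]:.1f}, rate k = {popt[1]:.3f}")
```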
ERIC Educational Resources Information Center
Goldstein, Harvey; Bonnet, Gerard; Rocher, Thierry
2007-01-01
The Programme for International Student Assessment comparative study of reading performance among 15-year-olds is reanalyzed using statistical procedures that allow the full complexity of the data structures to be explored. The article extends existing multilevel factor analysis and structural equation models and shows how this can extract richer…
Accelerated failure time models for semi-competing risks data in the presence of complex censoring.
Lee, Kyu Ha; Rondeau, Virginie; Haneuse, Sebastien
2017-12-01
Statistical analyses that investigate risk factors for Alzheimer's disease (AD) are often subject to a number of challenges. Some of these challenges arise due to practical considerations regarding data collection such that the observation of AD events is subject to complex censoring including left-truncation and either interval or right-censoring. Additional challenges arise due to the fact that study participants under investigation are often subject to competing forces, most notably death, that may not be independent of AD. Towards resolving the latter, researchers may choose to embed the study of AD within the "semi-competing risks" framework for which the recent statistical literature has seen a number of advances including for the so-called illness-death model. To the best of our knowledge, however, the semi-competing risks literature has not fully considered analyses in contexts with complex censoring, as in studies of AD. This is particularly the case when interest lies with the accelerated failure time (AFT) model, an alternative to the traditional multiplicative Cox model that places emphasis away from the hazard function. In this article, we outline a new Bayesian framework for estimation/inference of an AFT illness-death model for semi-competing risks data subject to complex censoring. An efficient computational algorithm that gives researchers the flexibility to adopt either a fully parametric or a semi-parametric model specification is developed and implemented. The proposed methods are motivated by and illustrated with an analysis of data from the Adult Changes in Thought study, an on-going community-based prospective study of incident AD in western Washington State. © 2017, The International Biometric Society.
Cavalcanti, Paulo Ernando Ferraz; Sá, Michel Pompeu Barros de Oliveira; dos Santos, Cecília Andrade; Esmeraldo, Isaac Melo; Chaves, Mariana Leal; Lins, Ricardo Felipe de Albuquerque; Lima, Ricardo de Carvalho
2015-01-01
Objective: To determine whether the stratification-of-complexity models in congenital heart surgery (RACHS-1, Aristotle basic score, and STS-EACTS mortality score) fit our center, and to determine the best method for discriminating hospital mortality. Methods: Surgical procedures for congenital heart disease in patients under 18 years of age were allocated to the categories proposed by the currently available stratification-of-complexity methods. Hospital mortality was calculated for each category of the three models. Statistical analysis was performed to verify whether the categories presented different mortalities. The discriminatory ability of the models was determined by calculating the area under the ROC curve, and the curves of the three models were compared. Results: 360 patients were allocated according to the three methods. There was a statistically significant difference between the mortality categories: RACHS-1 (1) 1.3%, (2) 11.4%, (3) 27.3%, (4) 50% (P<0.001); Aristotle basic score (1) 1.1%, (2) 12.2%, (3) 34%, (4) 64.7% (P<0.001); and STS-EACTS mortality score (1) 5.5%, (2) 13.6%, (3) 18.7%, (4) 35.8% (P<0.001). The three models had similar accuracy by area under the ROC curve: RACHS-1, 0.738; STS-EACTS, 0.739; Aristotle, 0.766. Conclusion: The three stratification-of-complexity models currently available in the literature are useful, with different mortalities across the proposed categories and similar discriminatory capacity for hospital mortality. PMID:26107445
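For the discrimination step, a hedged sketch of computing and comparing areas under the ROC curve with scikit-learn; the scores and outcomes below are simulated stand-ins, not study data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
y = rng.binomial(1, 0.12, 360)                # hospital mortality (0/1)
for name in ("RACHS-1", "Aristotle", "STS-EACTS"):
    score = rng.integers(1, 5, 360) + 2 * y   # fake, mildly informative categories
    print(name, f"AUC = {roc_auc_score(y, score):.3f}")
```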
Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P
1999-01-01
Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview over some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149
Statistical complexity measure of pseudorandom bit generators
NASA Astrophysics Data System (ADS)
González, C. M.; Larrondo, H. A.; Rosso, O. A.
2005-08-01
Pseudorandom number generators (PRNGs) are extensively used in Monte Carlo simulations, gambling machines and cryptography as substitutes for ideal random number generators (RNGs). Each application imposes different statistical requirements on PRNGs. As L’Ecuyer clearly states, “the main goal for Monte Carlo methods is to reproduce the statistical properties on which these methods are based whereas for gambling machines and cryptology, observing the sequence of output values for some time should provide no practical advantage for predicting the forthcoming numbers better than by just guessing at random”. Accordingly, several statistical test suites have been developed to analyze the sequences generated by PRNGs. In a recent paper a new statistical complexity measure [Phys. Lett. A 311 (2003) 126] was defined. Here we propose this measure as a randomness quantifier for PRNGs. The test is applied to three well-known and widely tested PRNGs available in the literature, all based on mathematical algorithms. A further PRNG based on the Lorenz 3D chaotic dynamical system is also analyzed. PRNGs based on chaos may be considered as models for physical noise sources, and important new results have recently been reported. All the design steps of this PRNG are described, and each stage increases the PRNG's randomness using a different strategy. It is shown that the MPR statistical complexity measure is capable of quantifying this randomness improvement. The PRNG based on the chaotic 3D Lorenz dynamical system is also evaluated using traditional digital signal processing tools for comparison.
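A compact sketch of an MPR-style complexity, under one common reading in which the probability distribution comes from Bandt-Pompe ordinal patterns: the normalized Shannon entropy H is multiplied by the normalized Jensen-Shannon disequilibrium with respect to the uniform distribution. The embedding dimension and test sequence are arbitrary choices.

```python
import numpy as np
from itertools import permutations
from math import factorial, log

def ordinal_dist(x, d=4):
    """Bandt-Pompe distribution of ordinal patterns of embedding dimension d."""
    pats = {p: 0 for p in permutations(range(d))}
    for i in range(len(x) - d + 1):
        pats[tuple(np.argsort(x[i:i + d]))] += 1
    p = np.array(list(pats.values()), dtype=float)
    return p / p.sum()

def shannon(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mpr_complexity(x, d=4):
    """Return (H, C): normalized entropy and statistical complexity C = Q_J * H."""
    p, n = ordinal_dist(x, d), factorial(d)
    u = np.full(n, 1.0 / n)
    H = shannon(p) / log(n)
    js = shannon((p + u) / 2) - 0.5 * shannon(p) - 0.5 * shannon(u)
    js_max = -0.5 * ((n + 1) / n * log(n + 1) - 2 * log(2 * n) + log(n))
    return H, (js / js_max) * H

x = np.random.default_rng(3).random(10000)   # surrogate for an ideal RNG
print("H = %.3f, C = %.3f" % mpr_complexity(x))
```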
Chung, Dongjun; Kim, Hang J; Zhao, Hongyu
2017-02-01
Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients with novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging as they are often affected by many genetic variants with small or moderate effects. There has been accumulating evidence suggesting that different complex traits share common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through a joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods were shown to improve statistical power for association mapping compared to separate analyses, they are still limited in the number of phenotypes that can be integrated. In order to address this challenge, in this paper, we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared to statistical methods based on smaller number of GWAS datasets. In addition, graph-GPA also promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.
Gender Integration on U.S. Navy Submarines: Views of the First Wave
2015-06-01
Previous studies have attempted to build statistical models based on surface fleet data to forecast female sustainability in the submarine fleet, yet ... their integration? Such questions cannot be answered by collecting the type of quantitative data that can be analyzed using statistical methods. Complex ...
Modeling Complex Phenomena Using Multiscale Time Sequences
2009-08-24
... different scales and how these scales relate to each other. This can be done by combining a set of statistical fractal measures based on Hurst and Hölder exponents, auto-regressive methods, and Fourier and wavelet decomposition methods. The applications for this technology ...
Ritchie, Marylyn D.; Hahn, Lance W.; Roodi, Nady; Bailey, L. Renee; Dupont, William D.; Parl, Fritz F.; Moore, Jason H.
2001-01-01
One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common complex multifactorial human diseases. This challenge is partly due to the limitations of parametric-statistical methods for detection of gene effects that are dependent solely or partially on interactions with other genes and with environmental exposures. We introduce multifactor-dimensionality reduction (MDR) as a method for reducing the dimensionality of multilocus information, to improve the identification of polymorphism combinations associated with disease risk. The MDR method is nonparametric (i.e., no hypothesis about the value of a statistical parameter is made), is model-free (i.e., it assumes no particular inheritance model), and is directly applicable to case-control and discordant-sib-pair studies. Using simulated case-control data, we demonstrate that MDR has reasonable power to identify interactions among two or more loci in relatively small samples. When it was applied to a sporadic breast cancer case-control data set, in the absence of any statistically significant independent main effects, MDR identified a statistically significant high-order interaction among four polymorphisms from three different estrogen-metabolism genes. To our knowledge, this is the first report of a four-locus interaction associated with a common complex multifactorial disease. PMID:11404819
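A minimal two-locus sketch of the MDR pooling step (full MDR scores candidate models by cross-validation, omitted here; genotype codes, labels, and the risk threshold are assumptions):

```python
import numpy as np
from itertools import combinations

def mdr_best_pair(genos, y, threshold=1.0):
    """Pool each two-locus genotype combination into 'high risk' when its
    case:control ratio exceeds `threshold`, then score the pooled 1-D
    classifier by training accuracy. genos: (n, n_loci) codes; y: 0/1."""
    best = (None, -1.0)
    for i, j in combinations(range(genos.shape[1]), 2):
        cells = {}
        for g1, g2, yy in zip(genos[:, i], genos[:, j], y):
            cells.setdefault((g1, g2), [0, 0])[yy] += 1
        high = {k for k, (ctrl, case) in cells.items() if case > threshold * ctrl}
        pred = np.array([(g1, g2) in high
                         for g1, g2 in zip(genos[:, i], genos[:, j])])
        acc = np.mean(pred == y)
        if acc > best[1]:
            best = ((i, j), acc)
    return best
```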
Statistics and Informatics in Space Astrophysics
NASA Astrophysics Data System (ADS)
Feigelson, E.
2017-12-01
The interest in statistical and computational methodology has seen rapid growth in space-based astrophysics, parallel to the growth seen in Earth remote sensing. There is widespread agreement that scientific interpretation of the cosmic microwave background, discovery of exoplanets, and classifying multiwavelength surveys is too complex to be accomplished with traditional techniques. NASA operates several well-functioning Science Archive Research Centers providing 0.5 PBy datasets to the research community. These databases are integrated with full-text journal articles in the NASA Astrophysics Data System (200K pageviews/day). Data products use interoperable formats and protocols established by the International Virtual Observatory Alliance. NASA supercomputers also support complex astrophysical models of systems such as accretion disks and planet formation. Academic researcher interest in methodology has significantly grown in areas such as Bayesian inference and machine learning, and statistical research is underway to treat problems such as irregularly spaced time series and astrophysical model uncertainties. Several scholarly societies have created interest groups in astrostatistics and astroinformatics. Improvements are needed on several fronts. Community education in advanced methodology is not sufficiently rapid to meet the research needs. Statistical procedures within NASA science analysis software are sometimes not optimal, and pipeline development may not use modern software engineering techniques. NASA offers few grant opportunities supporting research in astroinformatics and astrostatistics.
NASA Astrophysics Data System (ADS)
Knuth, K. H.
2001-05-01
We consider the application of Bayesian inference to the study of self-organized structures in complex adaptive systems. In particular, we examine the distribution of elements, agents, or processes in systems dominated by hierarchical structure. We demonstrate that results obtained by Caianiello [1] on Hierarchical Modular Systems (HMS) can be found by applying Jaynes' Principle of Group Invariance [2] to a few key assumptions about our knowledge of hierarchical organization. Subsequent application of the Principle of Maximum Entropy allows inferences to be made about specific systems. The utility of the Bayesian method is considered by examining both successes and failures of the hierarchical model. We discuss how Caianiello's original statements suffer from the Mind Projection Fallacy [3] and we restate his assumptions thus widening the applicability of the HMS model. The relationship between inference and statistical physics, described by Jaynes [4], is reiterated with the expectation that this realization will aid the field of complex systems research by moving away from often inappropriate direct application of statistical mechanics to a more encompassing inferential methodology.
Using iMCFA to Perform the CFA, Multilevel CFA, and Maximum Model for Analyzing Complex Survey Data.
Wu, Jiun-Yu; Lee, Yuan-Hsuan; Lin, John J H
2018-01-01
To construct CFA, MCFA, and maximum MCFA with LISREL v.8 and below, we provide iMCFA (integrated Multilevel Confirmatory Analysis) to examine the potential multilevel factorial structure in complex survey data. Modeling multilevel structure for complex survey data is complicated because building a multilevel model is not an infallible statistical strategy unless the hypothesized model is close to the real data structure. Methodologists have suggested using different modeling techniques to investigate the potential multilevel structure of survey data. Using iMCFA, researchers can visually set the between- and within-level factorial structure to fit MCFA, CFA and/or MAX MCFA models for complex survey data. iMCFA can then yield between- and within-level variance-covariance matrices, calculate intraclass correlations, perform the analyses and generate the outputs for the respective models. The summary of the analytical outputs from LISREL is gathered and tabulated for further model comparison and interpretation. iMCFA also provides LISREL syntax of the different models for researchers' future use. An empirical and a simulated multilevel dataset, with complex and simple structures at the within or between level, were used to illustrate the usability and effectiveness of the iMCFA procedure for analyzing complex survey data. The analytic results of iMCFA using Muthén's limited information estimator were compared with those of Mplus using Full Information Maximum Likelihood regarding the effectiveness of different estimation methods.
Statistical Model of Dynamic Markers of the Alzheimer's Pathological Cascade.
Balsis, Steve; Geraci, Lisa; Benge, Jared; Lowe, Deborah A; Choudhury, Tabina K; Tirso, Robert; Doody, Rachelle S
2018-05-05
Alzheimer's disease (AD) is a progressive disease reflected in markers across assessment modalities, including neuroimaging, cognitive testing, and evaluation of adaptive function. Identifying a single continuum of decline across assessment modalities in a single sample is statistically challenging because of the multivariate nature of the data. To address this challenge, we implemented advanced statistical analyses designed specifically to model complex data across a single continuum. We analyzed data from the Alzheimer's Disease Neuroimaging Initiative (ADNI; N = 1,056), focusing on indicators from the assessments of magnetic resonance imaging (MRI) volume, fluorodeoxyglucose positron emission tomography (FDG-PET) metabolic activity, cognitive performance, and adaptive function. Item response theory was used to identify the continuum of decline. Then, through a process of statistical scaling, indicators across all modalities were linked to that continuum and analyzed. Findings revealed that measures of MRI volume, FDG-PET metabolic activity, and adaptive function added measurement precision beyond that provided by cognitive measures, particularly in the relatively mild range of disease severity. More specifically, MRI volume, and FDG-PET metabolic activity become compromised in the very mild range of severity, followed by cognitive performance and finally adaptive function. Our statistically derived models of the AD pathological cascade are consistent with existing theoretical models.
Statistical assessment of the learning curves of health technologies.
Ramsay, C R; Grant, A M; Wallace, S A; Garthwaite, P H; Monk, A F; Russell, I T
2001-01-01
(1) To describe systematically studies that directly assessed the learning curve effect of health technologies. (2) Systematically to identify 'novel' statistical techniques applied to learning curve data in other fields, such as psychology and manufacturing. (3) To test these statistical techniques in data sets from studies of varying designs to assess health technologies in which learning curve effects are known to exist. METHODS - STUDY SELECTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): For a study to be included, it had to include a formal analysis of the learning curve of a health technology using a graphical, tabular or statistical technique. METHODS - STUDY SELECTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): For a study to be included, it had to include a formal assessment of a learning curve using a statistical technique that had not been identified in the previous search. METHODS - DATA SOURCES: Six clinical and 16 non-clinical biomedical databases were searched. A limited amount of handsearching and scanning of reference lists was also undertaken. METHODS - DATA EXTRACTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): A number of study characteristics were abstracted from the papers such as study design, study size, number of operators and the statistical method used. METHODS - DATA EXTRACTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): The new statistical techniques identified were categorised into four subgroups of increasing complexity: exploratory data analysis; simple series data analysis; complex data structure analysis, generic techniques. METHODS - TESTING OF STATISTICAL METHODS: Some of the statistical methods identified in the systematic searches for single (simple) operator series data and for multiple (complex) operator series data were illustrated and explored using three data sets. The first was a case series of 190 consecutive laparoscopic fundoplication procedures performed by a single surgeon; the second was a case series of consecutive laparoscopic cholecystectomy procedures performed by ten surgeons; the third was randomised trial data derived from the laparoscopic procedure arm of a multicentre trial of groin hernia repair, supplemented by data from non-randomised operations performed during the trial. RESULTS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: Of 4571 abstracts identified, 272 (6%) were later included in the study after review of the full paper. Some 51% of studies assessed a surgical minimal access technique and 95% were case series. The statistical method used most often (60%) was splitting the data into consecutive parts (such as halves or thirds), with only 14% attempting a more formal statistical analysis. The reporting of the studies was poor, with 31% giving no details of data collection methods. RESULTS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: Of 9431 abstracts assessed, 115 (1%) were deemed appropriate for further investigation and, of these, 18 were included in the study. All of the methods for complex data sets were identified in the non-clinical literature. These were discriminant analysis, two-stage estimation of learning rates, generalised estimating equations, multilevel models, latent curve models, time series models and stochastic parameter models. In addition, eight new shapes of learning curves were identified. RESULTS - TESTING OF STATISTICAL METHODS: No one particular shape of learning curve performed significantly better than another. 
The performance of 'operation time' as a proxy for learning differed between the three procedures. Multilevel modelling using the laparoscopic cholecystectomy data demonstrated and measured surgeon-specific and confounding effects. The inclusion of non-randomised cases, despite the possible limitations of the method, enhanced the interpretation of learning effects. CONCLUSIONS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: The statistical methods used for assessing learning effects in health technology assessment have been crude and the reporting of studies poor. CONCLUSIONS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: A number of statistical methods for assessing learning effects were identified that had not hitherto been used in health technology assessment. There was a hierarchy of methods for the identification and measurement of learning, and the more sophisticated methods for both have had little if any use in health technology assessment. This demonstrated the value of considering fields outside clinical research when addressing methodological issues in health technology assessment. CONCLUSIONS - TESTING OF STATISTICAL METHODS: It has been demonstrated that the portfolio of techniques identified can enhance investigations of learning curve effects. (ABSTRACT TRUNCATED)
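As one example of the 'complex data structure' methods discussed, a hedged sketch of a random-intercept multilevel model of operation time against log case number, fitted with statsmodels on synthetic multi-surgeon data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame({"surgeon": np.repeat(list("ABCDEFGHIJ"), 40),
                   "case": np.tile(np.arange(1, 41), 10)})
df["op_time"] = (90 - 15 * np.log(df["case"])            # learning effect
                 + np.repeat(rng.normal(0, 10, 10), 40)  # surgeon effects
                 + rng.normal(0, 8, len(df)))            # residual noise

fit = smf.mixedlm("op_time ~ np.log(case)", df, groups="surgeon").fit()
print(fit.params)   # negative slope on np.log(case) reflects learning
```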
Advanced functional network analysis in the geosciences: The pyunicorn package
NASA Astrophysics Data System (ADS)
Donges, Jonathan F.; Heitzig, Jobst; Runge, Jakob; Schultz, Hanna C. H.; Wiedermann, Marc; Zech, Alraune; Feldhoff, Jan; Rheinwalt, Aljoscha; Kutza, Hannes; Radebach, Alexander; Marwan, Norbert; Kurths, Jürgen
2013-04-01
Functional networks are a powerful tool for analyzing large geoscientific datasets such as global fields of climate time series originating from observations or model simulations. pyunicorn (pythonic unified complex network and recurrence analysis toolbox) is an open-source, fully object-oriented and easily parallelizable package written in the language Python. It allows for constructing functional networks (aka climate networks) representing the structure of statistical interrelationships in large datasets and, subsequently, investigating this structure using advanced methods of complex network theory such as measures for networks of interacting networks, node-weighted statistics or network surrogates. Additionally, pyunicorn makes it possible to study the complex dynamics of geoscientific systems, as recorded by time series, by means of recurrence networks and visibility graphs. The range of possible applications of the package is outlined, drawing on several examples from climatology.
Complex emergence patterns in a bark beetle predator
John D. Reeve
2000-01-01
The emergence pattern of Thanasimus dubius (F.) (Coleoptera: Cleridae), a common predator of the southern pine beetle, Dendroctonus frontalis Zimmermann (Coleoptera: Scolytidae), was studied under field conditions across different seasons. A simple statistical model was then developed...
Reproducible research in vadose zone sciences
USDA-ARS?s Scientific Manuscript database
A significant portion of present-day soil and Earth science research is computational, involving complex data analysis pipelines, advanced mathematical and statistical models, and sophisticated computer codes. Opportunities for scientific progress are greatly diminished if reproducing and building o...
Massive parallelization of serial inference algorithms for a complex generalized linear model
Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David
2014-01-01
Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
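A serial sketch of the cyclic coordinate descent updates that the paper parallelizes, written here for dense, ridge-penalized logistic regression; the GPU batching and sparse bookkeeping that produce the reported speedups are not shown:

```python
import numpy as np

def ccd_logistic(X, y, n_sweeps=50, ridge=1.0):
    """Update one coefficient at a time with a 1-D Newton step, holding the
    others fixed; the linear predictor eta is kept in sync incrementally."""
    n, p = X.shape
    beta, eta = np.zeros(p), np.zeros(n)
    for _ in range(n_sweeps):
        for j in range(p):
            mu = 1.0 / (1.0 + np.exp(-eta))
            g = X[:, j] @ (mu - y) + ridge * beta[j]    # gradient wrt beta_j
            h = X[:, j] ** 2 @ (mu * (1 - mu)) + ridge  # curvature wrt beta_j
            step = g / h
            beta[j] -= step
            eta -= step * X[:, j]
    return beta
```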
Statistical Analysis of Large Simulated Yield Datasets for Studying Climate Effects
NASA Technical Reports Server (NTRS)
Makowski, David; Asseng, Senthold; Ewert, Frank; Bassu, Simona; Durand, Jean-Louis; Martre, Pierre; Adam, Myriam; Aggarwal, Pramod K.; Angulo, Carlos; Baron, Chritian;
2015-01-01
Many studies have been carried out during the last decade to study the effect of climate change on crop yields and other key crop characteristics. In these studies, one or several crop models were used to simulate crop growth and development for different climate scenarios that correspond to different projections of atmospheric CO2 concentration, temperature, and rainfall changes (Semenov et al., 1996; Tubiello and Ewert, 2002; White et al., 2011). The Agricultural Model Intercomparison and Improvement Project (AgMIP; Rosenzweig et al., 2013) builds on these studies with the goal of using an ensemble of multiple crop models in order to assess effects of climate change scenarios for several crops in contrasting environments. These studies generate large datasets, including thousands of simulated crop yield data. They include series of yield values obtained by combining several crop models with different climate scenarios that are defined by several climatic variables (temperature, CO2, rainfall, etc.). Such datasets potentially provide useful information on the possible effects of different climate change scenarios on crop yields. However, it is sometimes difficult to analyze these datasets and to summarize them in a useful way due to their structural complexity; simulated yield data can differ among contrasting climate scenarios, sites, and crop models. Another issue is that it is not straightforward to extrapolate the results obtained for the scenarios to alternative climate change scenarios not initially included in the simulation protocols. Additional dynamic crop model simulations for new climate change scenarios are an option but this approach is costly, especially when a large number of crop models are used to generate the simulated data, as in AgMIP. Statistical models have been used to analyze responses of measured yield data to climate variables in past studies (Lobell et al., 2011), but the use of a statistical model to analyze yields simulated by complex process-based crop models is a rather new idea. We demonstrate herewith that statistical methods can play an important role in analyzing simulated yield data sets obtained from the ensembles of process-based crop models. Formal statistical analysis is helpful to estimate the effects of different climatic variables on yield, and to describe the between-model variability of these effects.
Datamining approaches for modeling tumor control probability.
Naqa, Issam El; Deasy, Joseph O; Mu, Yi; Huang, Ellen; Hope, Andrew J; Lindsay, Patricia E; Apte, Aditya; Alaly, James; Bradley, Jeffrey D
2010-11-01
Tumor control probability (TCP) to radiotherapy is determined by complex interactions between tumor biology, tumor microenvironment, radiation dosimetry, and patient-related variables. The complexity of these heterogeneous variable interactions constitutes a challenge for building predictive models for routine clinical practice. We describe a datamining framework that can unravel the higher order relationships among dosimetric dose-volume prognostic variables, interrogate various radiobiological processes, and generalize to unseen data when applied prospectively. Several datamining approaches are discussed, including dose-volume metrics, equivalent uniform dose, the mechanistic Poisson model, and model building methods using statistical regression and machine learning techniques. Institutional datasets of non-small cell lung cancer (NSCLC) patients are used to demonstrate these methods. The performance of the different methods was evaluated using bivariate Spearman rank correlations (rs). Over-fitting was controlled via resampling methods. Using a dataset of 56 patients with primary NSCLC tumors and 23 candidate variables, we estimated GTV volume and V75 to be the best model parameters for predicting TCP using statistical resampling and a logistic model. Using these variables, the support vector machine (SVM) kernel method provided superior performance for TCP prediction with an rs=0.68 on leave-one-out testing compared to logistic regression (rs=0.4), Poisson-based TCP (rs=0.33), and the cell kill equivalent uniform dose model (rs=0.17). The prediction of treatment response can be improved by utilizing datamining approaches, which are able to unravel important non-linear complex interactions among model variables and have the capacity to predict on unseen data for prospective clinical applications.
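A hedged sketch of the evaluation protocol described, leave-one-out prediction scored by Spearman rank correlation, using simulated stand-ins for the dose-volume features and outcomes:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.svm import SVR
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(4)
X = rng.normal(size=(56, 2))                   # stand-ins for GTV volume, V75
y = (X @ [0.7, -0.4] + 0.3 * rng.normal(size=56) > 0).astype(float)

preds = np.empty_like(y)
for train, test in LeaveOneOut().split(X):
    preds[test] = SVR(kernel="rbf").fit(X[train], y[train]).predict(X[test])
print(f"leave-one-out Spearman rs = {spearmanr(preds, y).correlation:.2f}")
```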
Statistical strategy for anisotropic adventitia modelling in IVUS.
Gil, Debora; Hernández, Aura; Rodriguez, Oriol; Mauri, Josepa; Radeva, Petia
2006-06-01
Vessel plaque assessment by analysis of intravascular ultrasound sequences is a useful tool for cardiac disease diagnosis and intervention. Manual detection of luminal (inner) and media-adventitia (external) vessel borders is the main activity of physicians in the process of lumen narrowing (plaque) quantification. The difficulty of defining vessel border descriptors, together with shades, artifacts, and blurred signal response due to ultrasound physical properties, hinders automated adventitia segmentation. In order to approach such a complex problem efficiently, we propose blending advanced anisotropic filtering operators and statistical classification techniques into a vessel border modelling strategy. Our systematic statistical analysis shows that the reported adventitia detection achieves an accuracy in the range of interobserver variability regardless of plaque nature, vessel geometry, and incomplete vessel borders.
NASA Astrophysics Data System (ADS)
Bae, Minja; Park, Jihyun; Kim, Jongju; Xue, Dandan; Park, Kyu-Chil; Yoon, Jong Rak
2016-07-01
The bit error rate of an underwater acoustic communication system is related to multipath fading statistics, which determine the signal-to-noise ratio. The amplitude and delay of each path depend on sea surface roughness, propagation medium properties, and source-to-receiver range as a function of frequency. Therefore, received signals will show frequency-dependent fading. A shallow-water acoustic communication channel generally shows a few strong multipaths that interfere with each other, and the resulting interference affects the fading statistics model. In this study, frequency-selective fading statistics are modeled on the basis of the phasor representation of the complex path amplitude. The fading statistics distribution is parameterized by the frequency-dependent constructive or destructive interference of multipaths. At a 16 m depth with a muddy bottom, a wave height of 0.2 m, and source-to-receiver ranges of 100 and 400 m, fading statistics tend to show a Rayleigh distribution at a destructive interference frequency, but a Rice distribution at a constructive interference frequency. The theoretical fading statistics matched the experimental ones well.
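The Rayleigh-versus-Rice discrimination described above can be illustrated numerically. This is a minimal sketch, not the authors' code: the envelope of a dominant path plus diffuse scatter is simulated, and both candidate distributions are fitted and compared by log-likelihood; all parameter values are illustrative assumptions.

```python
# Minimal sketch: fit Rayleigh and Rice models to a simulated fading envelope.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Complex path sum: a dominant (specular) path plus diffuse multipath scatter.
# With a strong dominant path the envelope is Rice-like; with dominant = 0.0
# it becomes Rayleigh-like.
n = 5000
dominant = 1.0
diffuse = rng.normal(0, 0.3, n) + 1j * rng.normal(0, 0.3, n)
envelope = np.abs(dominant + diffuse)

# Fit both candidate distributions (location fixed at 0) and compare
# by log-likelihood; higher is better.
for name, dist in [("Rayleigh", stats.rayleigh), ("Rice", stats.rice)]:
    params = dist.fit(envelope, floc=0)
    ll = np.sum(dist.logpdf(envelope, *params))
    print(f"{name}: log-likelihood = {ll:.1f}")
```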
General Blending Models for Data From Mixture Experiments
Brown, L.; Donev, A. N.; Bissett, A. C.
2015-01-01
We propose a new class of models providing a powerful unification and extension of existing statistical methodology for analysis of data obtained in mixture experiments. These models, which integrate models proposed by Scheffé and Becker, extend considerably the range of mixture component effects that may be described. They become complex when the studied phenomenon requires it, but remain simple whenever possible. This article has supplementary material online. PMID:26681812
Pseudo-Boltzmann model for modeling the junctionless transistors
NASA Astrophysics Data System (ADS)
Avila-Herrera, F.; Cerdeira, A.; Roldan, J. B.; Sánchez-Moreno, P.; Tienda-Luna, I. M.; Iñiguez, B.
2014-05-01
Calculation of the carrier concentrations in semiconductors using the Fermi-Dirac integral requires complex numerical calculations; in this context, practically all analytical device models are based on Boltzmann statistics, even though it is known that this leads to an over-estimation of carrier densities for high doping concentrations. In this paper, a new approximation to the Fermi-Dirac integral, called the Pseudo-Boltzmann model, is presented for modeling junctionless transistors with high doping concentrations.
NASA Astrophysics Data System (ADS)
Vallianatos, F.; Tzanis, A.; Michas, G.; Papadakis, G.
2012-04-01
Since the middle of summer 2011, an increase in the seismicity rates of the volcanic complex system of Santorini Island, Greece, was observed. In the present work, the temporal distribution of seismicity, as well as the magnitude distribution of earthquakes, has been studied using the concept of Non-Extensive Statistical Physics (NESP; Tsallis, 2009) along with the evolution of Shannon entropy H (also called information entropy). The analysis is based on the earthquake catalogue of the Geodynamic Institute of the National Observatory of Athens for the period July 2011-January 2012 (http://www.gein.noa.gr/). Non-Extensive Statistical Physics, which is a generalization of Boltzmann-Gibbs statistical physics, seems to be a suitable framework for studying complex systems. The observed distributions of seismicity rates at Santorini can be fitted exceptionally well with NESP models. This implies the inherent complexity of the Santorini volcanic seismicity, the applicability of NESP concepts to volcanic earthquake activity, and the usefulness of NESP in investigating phenomena exhibiting multifractality and long-range coupling effects. Acknowledgments. This work was supported in part by the THALES Program of the Ministry of Education of Greece and the European Union in the framework of the project entitled "Integrated understanding of Seismicity, using innovative Methodologies of Fracture mechanics along with Earthquake and non extensive statistical physics - Application to the geodynamic system of the Hellenic Arc. SEISMO FEAR HELLARC". GM and GP wish to acknowledge the partial support of the Greek State Scholarships Foundation (ΙΚΥ).
Are weather models better than gridded observations for precipitation in the mountains? (Invited)
NASA Astrophysics Data System (ADS)
Gutmann, E. D.; Rasmussen, R.; Liu, C.; Ikeda, K.; Clark, M. P.; Brekke, L. D.; Arnold, J.; Raff, D. A.
2013-12-01
Mountain snowpack is a critical storage component in the water cycle, and it provides drinking water for tens of millions of people in the Western US alone. This water store is susceptible to climate change both because warming temperatures are likely to lead to earlier melt and a temporal shift of the hydrograph, and because changing atmospheric conditions are likely to change the precipitation patterns that produce the snowpack. Current measurements of snowfall in complex terrain are limited in number, due in part to the logistics of installing equipment in such terrain. We show that this limitation leads to statistical artifacts in gridded observations of the current climate, including errors in seasonal precipitation totals of a factor of two or more, increases in wet day fraction, and decreases in storm intensity. In contrast, a high-resolution numerical weather model (WRF) is able to reproduce observed precipitation patterns, leading to confidence in its predictions for areas without measurements; new observations support this. Running WRF for a future climate scenario shows substantial changes in the spatial patterns of precipitation in the mountains, related to the physics of hydrometeor production and detrainment, that are not captured by statistical downscaling products. The stationarity in statistical downscaling products is likely to lead to important errors in our estimation of future precipitation in complex terrain.
Modeling Cross-Situational Word–Referent Learning: Prior Questions
Yu, Chen; Smith, Linda B.
2013-01-01
Both adults and young children possess powerful statistical computation capabilities—they can infer the referent of a word from highly ambiguous contexts involving many words and many referents by aggregating cross-situational statistical information across contexts. This ability has been explained by models of hypothesis testing and by models of associative learning. This article describes a series of simulation studies and analyses designed to understand the different learning mechanisms posited by the 2 classes of models and their relation to each other. Variants of a hypothesis-testing model and a simple or dumb associative mechanism were examined under different specifications of information selection, computation, and decision. Critically, these 3 components of the models interact in complex ways. The models illustrate a fundamental tradeoff between amount of data input and powerful computations: With the selection of more information, dumb associative models can mimic the powerful learning that is accomplished by hypothesis-testing models with fewer data. However, because of the interactions among the component parts of the models, the associative model can mimic various hypothesis-testing models, producing the same learning patterns but through different internal components. The simulations argue for the importance of a compositional approach to human statistical learning: the experimental decomposition of the processes that contribute to statistical learning in human learners and models with the internal components that can be evaluated independently and together. PMID:22229490
Applying the multivariate time-rescaling theorem to neural population models
Gerhard, Felipe; Haslinger, Robert; Pipa, Gordon
2011-01-01
Statistical models of neural activity are integral to modern neuroscience. Recently, interest has grown in modeling the spiking activity of populations of simultaneously recorded neurons to study the effects of correlations and functional connectivity on neural information processing. However, any statistical model must be validated by an appropriate goodness-of-fit test. Kolmogorov-Smirnov tests based upon the time-rescaling theorem have proven to be useful for evaluating point-process-based statistical models of single-neuron spike trains. Here we discuss the extension of the time-rescaling theorem to the multivariate (neural population) case. We show that even in the presence of strong correlations between spike trains, models which neglect couplings between neurons can be erroneously passed by the univariate time-rescaling test. We present the multivariate version of the time-rescaling theorem and provide a practical step-by-step procedure for applying it to test the sufficiency of neural population models. Using several simple analytically tractable models as well as more complex simulated and real data sets, we demonstrate that important features of the population activity can only be detected using the multivariate extension of the test. PMID:21395436
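For readers unfamiliar with the univariate building block, here is a minimal sketch of the time-rescaling KS test (the single-neuron case only, not the authors' multivariate procedure; the intensity function and simulation settings are assumptions): if spikes are generated with conditional intensity lambda(t), the rescaled inter-spike quantities z = 1 - exp(-integral of lambda between spikes) are uniform on [0, 1] under a correct model.

```python
# Minimal sketch of the univariate time-rescaling KS test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulate an inhomogeneous Poisson spike train by thinning.
T, lam_max = 100.0, 5.0
lam = lambda t: 2.0 + 1.5 * np.sin(0.3 * t)      # assumed "true" intensity
cand = np.cumsum(rng.exponential(1 / lam_max, 2000))
cand = cand[cand < T]
spikes = cand[rng.uniform(0, lam_max, cand.size) < lam(cand)]

# Cumulative intensity on a fine grid, evaluated at the spike times.
grid = np.linspace(0, T, 100001)
Lam = np.concatenate(([0.0], np.cumsum(lam(grid[:-1]) * np.diff(grid))))
Lam_at = np.interp(spikes, grid, Lam)

z = 1.0 - np.exp(-np.diff(Lam_at))               # should be Uniform(0, 1)
print(stats.kstest(z, "uniform"))                # small p-value => model rejected
```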
NASA Astrophysics Data System (ADS)
Guadagnini, A.; Riva, M.; Dell'Oca, A.
2017-12-01
We propose to ground sensitivity of uncertain parameters of environmental models on a set of indices based on the main (statistical) moments, i.e., mean, variance, skewness and kurtosis, of the probability density function (pdf) of a target model output. This enables us to perform Global Sensitivity Analysis (GSA) of a model in terms of multiple statistical moments and yields a quantification of the impact of model parameters on features driving the shape of the pdf of model output. Our GSA approach includes the possibility of being coupled with the construction of a reduced complexity model that allows approximating the full model response at a reduced computational cost. We demonstrate our approach through a variety of test cases. These include a commonly used analytical benchmark, a simplified model representing pumping in a coastal aquifer, a laboratory-scale tracer experiment, and the migration of fracturing fluid through a naturally fractured reservoir (source) to reach an overlying formation (target). Our strategy allows discriminating the relative importance of model parameters to the four statistical moments considered. We also provide an appraisal of the error associated with the evaluation of our sensitivity metrics by replacing the original system model through the selected surrogate model. Our results suggest that one might need to construct a surrogate model with increasing level of accuracy depending on the statistical moment considered in the GSA. The methodological framework we propose can assist the development of analysis techniques targeted to model calibration, design of experiment, uncertainty quantification and risk assessment.
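A crude numerical illustration of moment-based sensitivity follows. This is a simplified stand-in for the paper's indices, assuming a toy model, Monte Carlo sampling, and quantile-bin conditioning; it only conveys the idea of measuring how conditioning on a parameter shifts each statistical moment of the output.

```python
# Simplified moment-based sensitivity: average relative shift of each output
# moment when conditioning on each parameter (toy model, illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def model(x):  # toy model standing in for the full system model
    return x[:, 0] ** 2 + 0.5 * x[:, 1] + 0.1 * x[:, 0] * x[:, 2]

X = rng.uniform(-1, 1, size=(20000, 3))
Y = model(X)
moments = lambda y: np.array([y.mean(), y.var(), stats.skew(y), stats.kurtosis(y)])
m0 = moments(Y)  # note: normalizing by m0 assumes the unconditional moments are nonzero

n_bins = 20
for i in range(X.shape[1]):
    bins = np.quantile(X[:, i], np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(X[:, i], bins) - 1, 0, n_bins - 1)
    cond = np.array([moments(Y[idx == b]) for b in range(n_bins)])
    index = np.mean(np.abs(1 - cond / m0), axis=0)
    print(f"x{i}: mean/var/skew/kurt indices = {np.round(index, 2)}")
```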
Extended q-Gaussian and q-exponential distributions from gamma random variables
NASA Astrophysics Data System (ADS)
Budini, Adrián A.
2015-05-01
The family of q-Gaussian and q-exponential probability densities fit the statistical behavior of diverse complex self-similar nonequilibrium systems. These distributions, independently of the underlying dynamics, can rigorously be obtained by maximizing Tsallis "nonextensive" entropy under appropriate constraints, as well as from superstatistical models. In this paper we provide an alternative and complementary scheme for deriving these objects. We show that q-Gaussian and q-exponential random variables can always be expressed as a function of two statistically independent gamma random variables with the same scale parameter. Their shape index determines the complexity q parameter. This result also allows us to define an extended family of asymmetric q-Gaussian and modified q-exponential densities, which reduce to the standard ones when the shape parameters are the same. Furthermore, we demonstrate that a simple change of variables always allows relating any of these distributions with a beta stochastic variable. The extended distributions are applied in the statistical description of different complex dynamics such as log-return signals in financial markets and motion of point defects in a fluid flow.
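The gamma representation can be checked numerically. The sketch below follows one standard route consistent with the stated result, though not necessarily the paper's exact parameterization: for 1 < q < 3, a q-Gaussian coincides with a rescaled Student-t with nu = (3 - q)/(q - 1) degrees of freedom, and such a variable can be written from two independent gamma variables X, Y with a common scale as T = sqrt(nu) (X - Y) / (2 sqrt(XY)).

```python
# Numerical check: two same-scale gamma variables yield a Student-t,
# i.e. a q-Gaussian for 1 < q < 3 (construction assumed, see lead-in).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
q = 1.5
nu = (3 - q) / (q - 1)                   # here nu = 3

X = rng.gamma(nu / 2, 1.0, 100000)      # the common scale parameter cancels
Y = rng.gamma(nu / 2, 1.0, 100000)
T = np.sqrt(nu) * (X - Y) / (2 * np.sqrt(X * Y))

# Compare against scipy's Student-t: the KS statistic should be small.
print(stats.kstest(T, "t", args=(nu,)))
```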
Towards physical principles of biological evolution
NASA Astrophysics Data System (ADS)
Katsnelson, Mikhail I.; Wolf, Yuri I.; Koonin, Eugene V.
2018-03-01
Biological systems reach organizational complexity that far exceeds the complexity of any known inanimate objects. Biological entities undoubtedly obey the laws of quantum physics and statistical mechanics. However, is modern physics sufficient to adequately describe, model and explain the evolution of biological complexity? Detailed parallels have been drawn between statistical thermodynamics and the population-genetic theory of biological evolution. Based on these parallels, we outline new perspectives on biological innovation and major transitions in evolution, and introduce a biological equivalent of thermodynamic potential that reflects the innovation propensity of an evolving population. Deep analogies have been suggested to also exist between the properties of biological entities and processes, and those of frustrated states in physics, such as glasses. Such systems are characterized by frustration, whereby local states with minimal free energy conflict with the global minimum, resulting in ‘emergent phenomena’. We extend such analogies by examining frustration-type phenomena, such as conflicts between different levels of selection, in biological evolution. These frustration effects appear to drive the evolution of biological complexity. We further address evolution in multidimensional fitness landscapes from the point of view of percolation theory and suggest that percolation at a level above the critical threshold dictates the tree-like evolution of complex organisms. Taken together, these multiple connections between fundamental processes in physics and biology imply that construction of a meaningful physical theory of biological evolution might not be a futile effort. However, it is unrealistic to expect that such a theory can be created in one scoop; if it ever comes into being, this can only happen through integration of multiple physical models of evolutionary processes. Furthermore, the existing framework of theoretical physics is unlikely to suffice for adequate modeling of the biological level of complexity, and new developments within physics itself are likely to be required.
Habitat Complexity in Aquatic Microcosms Affects Processes Driven by Detritivores
Flores, Lorea; Bailey, R. A.; Elosegi, Arturo; Larrañaga, Aitor; Reiss, Julia
2016-01-01
Habitat complexity can influence predation rates (e.g. by providing refuge) but other ecosystem processes and species interactions might also be modulated by the properties of habitat structure. Here, we focussed on how complexity of artificial habitat (plastic plants), in microcosms, influenced short-term processes driven by three aquatic detritivores. The effects of habitat complexity on leaf decomposition, production of fine organic matter and pH levels were explored by measuring complexity in three ways: 1. as the presence vs. absence of habitat structure; 2. as the amount of structure (3 or 4.5 g of plastic plants); and 3. as the spatial configuration of structures (measured as fractal dimension). The experiment also addressed potential interactions among the consumers by running all possible species combinations. In the experimental microcosms, habitat complexity influenced how species performed, especially when comparing structure present vs. structure absent. Treatments with structure showed higher fine particulate matter production and lower pH compared to treatments without structures and this was probably due to higher digestion and respiration when structures were present. When we explored the effects of the different complexity levels, we found that the amount of structure added explained more than the fractal dimension of the structures. We give a detailed overview of the experimental design, statistical models and R codes, because our statistical analysis can be applied to other study systems (and disciplines such as restoration ecology). We further make suggestions of how to optimise statistical power when artificially assembling, and analysing, ‘habitat complexity’ by not confounding complexity with the amount of structure added. In summary, this study highlights the importance of habitat complexity for energy flow and the maintenance of ecosystem processes in aquatic ecosystems. PMID:27802267
Parameter Estimation and Model Validation of Nonlinear Dynamical Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abarbanel, Henry; Gill, Philip
In the performance period of this work under a DOE contract, the co-PIs, Philip Gill and Henry Abarbanel, developed new methods of statistical data assimilation for problems of DOE interest, including geophysical and biological problems. This included numerical optimization algorithms for variational principles and new parallel-processing Monte Carlo routines for performing the path integrals of statistical data assimilation. These results have been summarized in the monograph “Predicting the Future: Completing Models of Observed Complex Systems” by Henry Abarbanel, published by Springer-Verlag in June 2013. Additional results and details have appeared in the peer-reviewed literature.
A Guide to the Literature on Learning Graphical Models
NASA Technical Reports Server (NTRS)
Buntine, Wray L.; Friedland, Peter (Technical Monitor)
1994-01-01
This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and more generally, learning probabilistic graphical models. Because many problems in artificial intelligence, statistics and neural networks can be represented as a probabilistic graphical model, this area provides a unifying perspective on learning. This paper organizes the research in this area along methodological lines of increasing complexity.
ERIC Educational Resources Information Center
Hsieh, Chueh-An; Maier, Kimberly S.
2009-01-01
The capacity of Bayesian methods in estimating complex statistical models is undeniable. Bayesian data analysis is seen as having a range of advantages, such as an intuitive probabilistic interpretation of the parameters of interest, the efficient incorporation of prior information to empirical data analysis, model averaging and model selection.…
NASA Astrophysics Data System (ADS)
Donges, Jonathan; Heitzig, Jobst; Beronov, Boyan; Wiedermann, Marc; Runge, Jakob; Feng, Qing Yi; Tupikina, Liubov; Stolbova, Veronika; Donner, Reik; Marwan, Norbert; Dijkstra, Henk; Kurths, Jürgen
2016-04-01
We introduce the pyunicorn (Pythonic unified complex network and recurrence analysis toolbox) open source software package for applying and combining modern methods of data analysis and modeling from complex network theory and nonlinear time series analysis. pyunicorn is a fully object-oriented and easily parallelizable package written in the language Python. It allows for the construction of functional networks such as climate networks in climatology or functional brain networks in neuroscience representing the structure of statistical interrelationships in large data sets of time series and, subsequently, investigating this structure using advanced methods of complex network theory such as measures and models for spatial networks, networks of interacting networks, node-weighted statistics, or network surrogates. Additionally, pyunicorn provides insights into the nonlinear dynamics of complex systems as recorded in uni- and multivariate time series from a non-traditional perspective by means of recurrence quantification analysis, recurrence networks, visibility graphs, and construction of surrogate time series. The range of possible applications of the library is outlined, drawing on several examples mainly from the field of climatology. pyunicorn is available online at https://github.com/pik-copan/pyunicorn. Reference: J.F. Donges, J. Heitzig, B. Beronov, M. Wiedermann, J. Runge, Q.-Y. Feng, L. Tupikina, V. Stolbova, R.V. Donner, N. Marwan, H.A. Dijkstra, and J. Kurths, Unified functional network and nonlinear time series analysis for complex systems science: The pyunicorn package, Chaos 25, 113101 (2015), DOI: 10.1063/1.4934554, Preprint: arxiv.org:1507.01571 [physics.data-an].
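As a flavor of one of the constructions mentioned above, here is a from-scratch sketch of the natural visibility graph; this is illustrative pure Python, not pyunicorn's own API (see the package documentation for that).

```python
# From-scratch natural visibility graph (illustrative; not pyunicorn's API).
import numpy as np

def visibility_graph(y):
    """Edges (i, j): data points i and j 'see' each other if every point
    between them lies strictly below the straight line joining them."""
    n = len(y)
    edges = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            line = y[i] + (y[j] - y[i]) * (np.arange(i + 1, j) - i) / (j - i)
            if np.all(y[i + 1:j] < line):
                edges.append((i, j))
    return edges

rng = np.random.default_rng(4)
series = rng.normal(size=200)
edges = visibility_graph(series)
degree = np.bincount(np.ravel(edges), minlength=len(series))
print(f"{len(edges)} edges; mean degree = {degree.mean():.2f}")
```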
Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.
Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques make it possible i) to determine natural groups or clusters of control strategies with a similar behaviour, ii) to find and interpret hidden, complex and causal relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of complex multicriteria data sets and allows an improved use of information for the effective evaluation of control strategies.
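A minimal sketch of this kind of workflow, with scikit-learn standing in for the authors' tools (the evaluation matrix here is random stand-in data, not BSM2 output):

```python
# Cluster control strategies on an evaluation matrix and inspect the
# principal components (stand-in data; illustrative only).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
# Rows: candidate control strategies; columns: evaluation criteria
# (e.g. effluent quality, aeration energy, pumping energy, violations).
E = rng.normal(size=(30, 8))

Z = StandardScaler().fit_transform(E)       # put criteria on a common scale
pca = PCA(n_components=2).fit(Z)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)

print("variance explained by PC1, PC2:", np.round(pca.explained_variance_ratio_, 2))
for k in range(3):
    print(f"cluster {k}: strategies {np.where(labels == k)[0].tolist()}")
```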
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rundle, John B.; Klein, William
We have carried out research to determine the dynamics of failure in complex geomaterials, specifically focusing on the role of defects, damage and asperities in the catastrophic failure processes (now popularly termed “Black Swan events”). We have examined fracture branching and flow processes using models for invasion percolation, focusing particularly on the dynamics of bursts in the branching process. We have achieved a fundamental understanding of the dynamics of nucleation in complex geomaterials, specifically in the presence of inhomogeneous structures.
The power to detect linkage in complex disease by means of simple LOD-score analyses.
Greenberg, D A; Abreu, P; Hodge, S E
1998-01-01
Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage. PMID:9718328
Fast Quantum Algorithm for Predicting Descriptive Statistics of Stochastic Processes
NASA Technical Reports Server (NTRS)
Williams, Colin P.
1999-01-01
Stochastic processes are used as a modeling tool in several sub-fields of physics, biology, and finance. Analytic understanding of the long term behavior of such processes is only tractable for very simple types of stochastic processes such as Markovian processes. However, in real world applications more complex stochastic processes often arise. In physics, the complicating factor might be nonlinearities; in biology it might be memory effects; and in finance it might be the non-random intentional behavior of participants in a market. In the absence of analytic insight, one is forced to understand these more complex stochastic processes via numerical simulation techniques. In this paper we present a quantum algorithm for performing such simulations. In particular, we show how a quantum algorithm can predict arbitrary descriptive statistics (moments) of N-step stochastic processes in just O(√N) time. That is, the quantum complexity is the square root of the classical complexity for performing such simulations. This is a significant speedup in comparison to the current state of the art.
The statistical mechanics of complex signaling networks: nerve growth factor signaling
NASA Astrophysics Data System (ADS)
Brown, K. S.; Hill, C. C.; Calero, G. A.; Myers, C. R.; Lee, K. H.; Sethna, J. P.; Cerione, R. A.
2004-10-01
The inherent complexity of cellular signaling networks and their importance to a wide range of cellular functions necessitates the development of modeling methods that can be applied toward making predictions and highlighting the appropriate experiments to test our understanding of how these systems are designed and function. We use methods of statistical mechanics to extract useful predictions for complex cellular signaling networks. A key difficulty with signaling models is that, while significant effort is being made to experimentally measure the rate constants for individual steps in these networks, many of the parameters required to describe their behavior remain unknown or at best represent estimates. To establish the usefulness of our approach, we have applied our methods toward modeling the nerve growth factor (NGF)-induced differentiation of neuronal cells. In particular, we study the actions of NGF and mitogenic epidermal growth factor (EGF) in rat pheochromocytoma (PC12) cells. Through a network of intermediate signaling proteins, each of these growth factors stimulates extracellular regulated kinase (Erk) phosphorylation with distinct dynamical profiles. Using our modeling approach, we are able to predict the influence of specific signaling modules in determining the integrated cellular response to the two growth factors. Our methods also raise some interesting insights into the design and possible evolution of cellular systems, highlighting an inherent property of these systems that we call 'sloppiness.'
Information properties of morphologically complex words modulate brain activity during word reading.
Hakala, Tero; Hultén, Annika; Lehtonen, Minna; Lagus, Krista; Salmelin, Riitta
2018-06-01
Neuroimaging studies of the reading process point to functionally distinct stages in word recognition. Yet, current understanding of the operations linked to those various stages is mainly descriptive in nature. Approaches developed in the field of computational linguistics may offer a more quantitative approach for understanding brain dynamics. Our aim was to evaluate whether a statistical model of morphology, with well-defined computational principles, can capture the neural dynamics of reading, using the concept of surprisal from information theory as the common measure. The Morfessor model, created for unsupervised discovery of morphemes, is based on the minimum description length principle and attempts to find optimal units of representation for complex words. In a word recognition task, we correlated brain responses to word surprisal values derived from Morfessor and from other psycholinguistic variables that have been linked with various levels of linguistic abstraction. The magnetoencephalography data analysis focused on spatially, temporally and functionally distinct components of cortical activation observed in reading tasks. The early occipital and occipito-temporal responses were correlated with parameters relating to visual complexity and orthographic properties, whereas the later bilateral superior temporal activation was correlated with whole-word based and morphological models. The results show that the word processing costs estimated by the statistical Morfessor model are relevant for brain dynamics of reading during late processing stages. © 2018 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.
Impact of gastrectomy procedural complexity on surgical outcomes and hospital comparisons.
Mohanty, Sanjay; Paruch, Jennifer; Bilimoria, Karl Y; Cohen, Mark; Strong, Vivian E; Weber, Sharon M
2015-08-01
Most risk adjustment approaches adjust for patient comorbidities and the primary procedure. However, procedures done at the same time as the index case may increase operative risk and merit inclusion in adjustment models for fair hospital comparisons. Our objectives were to evaluate the impact of surgical complexity on postoperative outcomes and hospital comparisons in gastric cancer surgery. Patients who underwent gastric resection for cancer were identified from a large clinical dataset. Procedure complexity was characterized using secondary procedure CPT codes and work relative value units (RVUs). Regression models were developed to evaluate the association between complexity variables and outcomes. The impact of complexity adjustment on model performance and hospital comparisons was examined. Among 3,467 patients who underwent gastrectomy for adenocarcinoma, 2,171 operations were distal and 1,296 total. A secondary procedure was reported for 33% of distal gastrectomies and 59% of total gastrectomies. Six of 10 secondary procedures were associated with adverse outcomes. For example, patients who underwent a synchronous bowel resection had a higher risk of mortality (odds ratio [OR], 2.14; 95% CI, 1.07-4.29) and reoperation (OR, 2.09; 95% CI, 1.26-3.47). Model performance was slightly better for nearly all outcomes with complexity adjustment (mortality c-statistics: standard model, 0.853; secondary procedure model, 0.858; RVU model, 0.855). Hospital ranking did not change substantially after complexity adjustment. Surgical complexity variables are associated with adverse outcomes in gastrectomy, but complexity adjustment does not affect hospital rankings appreciably. Copyright © 2015 Elsevier Inc. All rights reserved.
Novel pseudo-random number generator based on quantum random walks.
Yang, Yu-Guang; Zhao, Qian-Qian
2016-02-04
In this paper, we investigate the potential application of quantum computation for constructing pseudo-random number generators (PRNGs) and construct a novel PRNG based on quantum random walks (QRWs), a famous quantum computation model. The PRNG merely relies on the equations used in the QRWs, and thus the generation algorithm is simple and the computation speed is fast. The proposed PRNG was subjected to statistical test suites such as NIST and passed them successfully. Compared with a representative PRNG based on quantum chaotic maps (QCM), the present QRWs-based PRNG has some advantages, such as better statistical complexity and recurrence. For example, the normalized Shannon entropy and the statistical complexity of the QRWs-based PRNG are 0.999699456771172 and 1.799961178212329e-04, respectively, for 8-bit words of a 16 Mbit sequence. By contrast, the corresponding values of the QCM-based PRNG are 0.999448131481064 and 3.701210794388818e-04, respectively. Thus the statistical complexity and the normalized entropy of the QRWs-based PRNG are closer to 0 and 1, respectively, than those of the QCM-based PRNG as the number of words in the analyzed sequence increases. This provides a new clue to constructing PRNGs and also extends the applications of quantum computation.
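The quoted figures can be understood from the definitions. Below is a minimal sketch of the normalized Shannon entropy over 8-bit words, together with an LMC-style statistical complexity (entropy times disequilibrium); the exact complexity measure used by the authors may differ.

```python
# Normalized Shannon entropy and LMC-style complexity of a bit stream,
# computed over 8-bit words (illustrative; measure choice assumed).
import numpy as np

rng = np.random.default_rng(6)
bits = rng.integers(0, 2, 16 * 2**20)              # stand-in for PRNG output

words = bits.reshape(-1, 8) @ (1 << np.arange(8))  # pack bits into 8-bit words
p = np.bincount(words, minlength=256) / words.size

nz = p[p > 0]
H = -np.sum(nz * np.log2(nz)) / 8                  # normalized to [0, 1]
D = np.sum((p - 1.0 / 256) ** 2)                   # disequilibrium from uniform
print(f"normalized entropy = {H:.6f}, LMC-style complexity = {H * D:.3e}")
```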
NASA Astrophysics Data System (ADS)
Lagerwall, Gareth; Kiker, Gregory; Muñoz-Carpena, Rafael; Wang, Naiming
2017-01-01
The coupled regional simulation model and the transport and reaction simulation engine were recently adapted to simulate ecology, specifically Typha domingensis (Cattail) dynamics in the Everglades. While Cattail is a native Everglades species, it has become invasive in response to habitat alteration over the last few decades, taking over historically Cladium jamaicense (Sawgrass) areas. Two models of different levels of algorithmic complexity were developed in previous studies, and are used here to determine the impact of various management decisions on the average Cattail density within Water Conservation Area 2A in the Everglades. A Global Uncertainty and Sensitivity Analysis was conducted to test the importance of these management scenarios, as well as the effectiveness of using zonal statistics. Management scenarios included high, medium and low initial water depths, soil phosphorus concentrations, and initial Cattail and Sawgrass densities, as well as annually alternating water depths and soil phosphorus concentrations, and a steadily decreasing soil phosphorus concentration. The analysis suggests that zonal statistics are good indicators of regional trends, and that a high soil phosphorus concentration is a prerequisite for expansive Cattail growth. Managing Cattail expansion in this region is a complex task, requiring close management and monitoring of water depth and soil phosphorus concentration, and possibly of other factors not considered at the model complexities used here. However, this modeling framework, with user-definable complexities and management scenarios, can be considered a useful tool for analyzing many more alternatives, which could be used to aid management decisions in the future.
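Zonal statistics of the kind relied on above reduce a gridded field to per-zone summaries. A minimal sketch with hypothetical arrays (the zone map and density field are random stand-ins):

```python
# Zonal mean of a gridded field with numpy (hypothetical arrays).
import numpy as np

rng = np.random.default_rng(7)
zones = rng.integers(0, 4, size=(200, 200))      # zone label per grid cell
density = rng.random((200, 200))                 # simulated Cattail density

sums = np.bincount(zones.ravel(), weights=density.ravel(), minlength=4)
counts = np.bincount(zones.ravel(), minlength=4)
print("zonal mean density:", np.round(sums / counts, 3))
```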
Classical Statistics and Statistical Learning in Imaging Neuroscience
Bzdok, Danilo
2017-01-01
Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using the t-test and ANOVA. In recent years, statistical learning methods have enjoyed increasing popularity, especially for applications to rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It retraces how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896
Raymond L. Czaplewski
1989-01-01
It is difficult to design systems for national and global resource inventory and analysis that efficiently satisfy changing, and increasingly complex objectives. It is proposed that individual inventory, monitoring, modeling, and remote sensing systems be specialized to achieve portions of the objectives. These separate systems can be statistically linked to accomplish...
Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization
ERIC Educational Resources Information Center
Gelman, Andrew; Lee, Daniel; Guo, Jiqiang
2015-01-01
Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…
LATTE Linking Acoustic Tests and Tagging Using Statistical Estimation
2015-09-30
the complexity of the model (from simplest to most complex): Kalman filter, Markov chain Monte-Carlo (MCMC) and ABC. Many of these methods have been ... using SMMs fitted using Kalman filters. Therefore, using the DTAG data, we can estimate the distributions associated with 2D horizontal displacement ... speed (a key problem in the previous Kalman filter implementation). This new approach also allows the animal's horizontal movement direction to differ
Bojan, Mirela; Gerelli, Sébastien; Gioanni, Simone; Pouard, Philippe; Vouhé, Pascal
2011-09-01
The Aristotle Comprehensive Complexity (ACC) and the Risk Adjustment in Congenital Heart Surgery (RACHS-1) scores have been proposed for complexity adjustment in the analysis of outcome after congenital heart surgery. Previous studies found RACHS-1 to be a better predictor of outcome than the Aristotle Basic Complexity score. We compared the ability to predict operative mortality and morbidity between ACC, the latest update of the Aristotle method and RACHS-1. Morbidity was assessed by length of intensive care unit stay. We retrospectively enrolled patients undergoing congenital heart surgery. We modeled each score as a continuous variable, mortality as a binary variable, and length of stay as a censored variable. We compared performance between mortality and morbidity models using likelihood ratio tests for nested models and paired concordance statistics. Among all 1,384 patients enrolled, 30-day mortality rate was 3.5% and median length of intensive care unit stay was 3 days. Both scores strongly related to mortality, but ACC made better prediction than RACHS-1; c-indexes 0.87 (0.84, 0.91) vs 0.75 (0.65, 0.82). Both scores related to overall length of stay only during the first postoperative week, but ACC made better predictions than RACHS-1; U statistic=0.22, p<0.001. No significant difference was noted after adjusting RACHS-1 models on age, prematurity, and major extracardiac abnormalities. The ACC was a better predictor of operative mortality and length of intensive care unit stay than RACHS-1. In order to achieve similar performance, regression models including RACHS-1 need to be further adjusted on age, prematurity, and major extracardiac abnormalities. Copyright © 2011 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.
Puch-Solis, Roberto; Clayton, Tim
2014-07-01
The high sensitivity of the technology for producing profiles means that it has become routine to produce profiles from relatively small quantities of DNA. The profiles obtained from low template DNA (LTDNA) are affected by several phenomena which must be taken into consideration when interpreting and evaluating this evidence. Furthermore, many of the same phenomena affect profiles from higher amounts of DNA (e.g. where complex mixtures have been revealed). In this article we present a statistical model, which forms the basis of the software DNA LiRa and is able to calculate likelihood ratios where one to four donors are postulated and for any number of replicates. The model can take into account drop-in and allelic drop-out for different contributors, template degradation and uncertain allele designations. In this statistical model, unknown parameters are treated following the empirical Bayesian paradigm. The performance of LiRa is tested using examples and the outputs are compared with those generated using two other statistical software packages, likeLTD and LRmix. The concept of ban efficiency is introduced as a measure for assessing model sensitivity. Copyright © 2014. Published by Elsevier Ireland Ltd.
Pérez-Garrido, Alfonso; Morales Helguera, Aliuska; Abellán Guillén, Adela; Cordeiro, M Natália D S; Garrido Escudero, Amalio
2009-01-15
This paper reports a QSAR study for predicting the complexation of a large and heterogeneous variety of substances (233 organic compounds) with beta-cyclodextrins (beta-CDs). Several different theoretical molecular descriptors, calculated solely from the molecular structure of the compounds under investigation, and an efficient variable selection procedure, the Genetic Algorithm, led to models with satisfactory global accuracy and predictivity. The best final QSAR model, however, is based on topological descriptors while offering a reasonable interpretation. This QSAR model was able to explain ca. 84% of the variance in the experimental activity, and displayed very good internal cross-validation statistics and predictivity on external data. It shows that the driving forces for CD complexation are mainly hydrophobic and steric (van der Waals) interactions. Thus, the results of our study provide a valuable tool for future screening and priority testing of beta-CD guest molecules.
Animal models of cerebral ischemia
NASA Astrophysics Data System (ADS)
Khodanovich, M. Yu.; Kisel, A. A.
2015-11-01
Cerebral ischemia remains one of the most frequent causes of death and disability worldwide. Animal models are necessary to understand the complex molecular mechanisms of brain damage as well as for the development of new therapies for stroke. This review considers a certain range of animal models of cerebral ischemia, including several types of focal and global ischemia. Since animal models vary in their specificity for the human disease they reproduce, in the complexity of surgery, in infarct size, and in the reliability of reproduction for statistical analysis, an adequate model needs to be chosen according to the aim of the study. The reproduction of a particular animal model needs to be evaluated using appropriate tools, including behavioral assessment of injury and non-invasive and post-mortem control of brain damage. These problems are also summarized in the review.
A combinatorial framework to quantify peak/pit asymmetries in complex dynamics.
Hasson, Uri; Iacovacci, Jacopo; Davis, Ben; Flanagan, Ryan; Tagliazucchi, Enzo; Laufs, Helmut; Lacasa, Lucas
2018-02-23
We explore a combinatorial framework which efficiently quantifies the asymmetries between minima and maxima in local fluctuations of time series. We first showcase its performance by applying it to a battery of synthetic cases. We find rigorous results on some canonical dynamical models (stochastic processes with and without correlations, chaotic processes), complemented by extensive numerical simulations for a range of processes, which indicate that the methodology correctly distinguishes different complex dynamics and outperforms state-of-the-art metrics in several cases. Subsequently, we apply this methodology to real-world problems emerging across several disciplines, including cases in neurobiology, finance and climate science. We conclude that differences between the statistics of local maxima and local minima in time series are highly informative of the complex underlying dynamics, and a graph-theoretic extraction procedure allows these features to be used for statistical learning purposes.
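A much-simplified stand-in for the idea, not the authors' combinatorial method: compare the distribution of local maxima with the reflected distribution of local minima; for a time-reversible symmetric process the two should agree.

```python
# Peak/pit asymmetry via a two-sample KS test (simplified stand-in).
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
# A skewed AR(1)-type series as a toy asymmetric process.
x = np.zeros(5000)
for t in range(1, x.size):
    x[t] = 0.7 * x[t - 1] + rng.exponential(1.0) - 1.0

interior = x[1:-1]
peaks = interior[(interior > x[:-2]) & (interior > x[2:])]   # local maxima
pits = interior[(interior < x[:-2]) & (interior < x[2:])]    # local minima

# Compare peak heights with pit depths reflected around the mean.
print(stats.ks_2samp(peaks - x.mean(), -(pits - x.mean())))
```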
NASA Astrophysics Data System (ADS)
Rodríguez, Nancy
2015-03-01
The use of mathematical tools has long proved to be useful in gaining understanding of complex systems in physics [1]. Recently, many researchers have realized that there is an analogy between emerging phenomena in complex social systems and complex physical or biological systems [4,5,12]. This realization has particularly benefited the modeling and understanding of crime, a ubiquitous phenomena that is far from being understood. In fact, when one is interested in the bulk behavior of patterns that emerge from small and seemingly unrelated interactions as well as decisions that occur at the individual level, the mathematical tools that have been developed in statistical physics, game theory, network theory, dynamical systems, and partial differential equations can be useful in shedding light into the dynamics of these patterns [2-4,6,12].
Experimental econophysics: Complexity, self-organization, and emergent properties
NASA Astrophysics Data System (ADS)
Huang, J. P.
2015-03-01
Experimental econophysics is concerned with statistical physics of humans in the laboratory, and it is based on controlled human experiments developed by physicists to study some problems related to economics or finance. It relies on controlled human experiments in the laboratory together with agent-based modeling (for computer simulations and/or analytical theory), with an attempt to reveal the general cause-effect relationship between specific conditions and emergent properties of real economic/financial markets (a kind of complex adaptive systems). Here I review the latest progress in the field, namely, stylized facts, herd behavior, contrarian behavior, spontaneous cooperation, partial information, and risk management. Also, I highlight the connections between such progress and other topics of traditional statistical physics. The main theme of the review is to show diverse emergent properties of the laboratory markets, originating from self-organization due to the nonlinear interactions among heterogeneous humans or agents (complexity).
Reduced-Order Modeling for Optimization and Control of Complex Flows
2010-11-30
NASA Astrophysics Data System (ADS)
Tariq, Imran; Humbert-Vidan, Laia; Chen, Tao; South, Christopher P.; Ezhil, Veni; Kirkby, Norman F.; Jena, Rajesh; Nisbet, Andrew
2015-05-01
This paper reports a modelling study of tumour volume dynamics in response to stereotactic ablative radiotherapy (SABR). The main objective was to develop a model that is adequate to describe the tumour volume change measured during SABR while not being excessively complex, given the limited clinical data available for its support. To this end, various modelling options were explored, and a rigorous statistical method, the Akaike information criterion, was used to help determine a trade-off between model accuracy and complexity. The models were calibrated to data from 11 non-small cell lung cancer patients treated with SABR. The results showed that it is feasible to model the tumour volume dynamics during SABR, opening up the potential for using such models in a clinical environment in the future.
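The least-squares form of the Akaike information criterion, AIC = n ln(RSS/n) + 2k, makes the accuracy-complexity trade-off concrete. A minimal sketch with assumed model forms and synthetic data (not the paper's models or patient data):

```python
# Compare tumour-volume models of increasing complexity by AIC.
# Model forms and data are assumptions, for illustration only.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(9)
t = np.linspace(0, 40, 12)                        # days during treatment
v_obs = 10 * np.exp(-0.05 * t) + rng.normal(0, 0.3, t.size)

models = {
    "1-exp": (lambda t, a, k: a * np.exp(-k * t), (10, 0.1)),
    "2-exp": (lambda t, a, k1, b, k2: a * np.exp(-k1 * t) + b * np.exp(-k2 * t),
              (5, 0.1, 5, 0.01)),
}
for name, (f, p0) in models.items():
    p, _ = curve_fit(f, t, v_obs, p0=p0, maxfev=10000)
    rss = np.sum((v_obs - f(t, *p)) ** 2)
    aic = t.size * np.log(rss / t.size) + 2 * len(p)
    print(f"{name}: AIC = {aic:.1f}")             # lower AIC is preferred
```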
Validating an Air Traffic Management Concept of Operation Using Statistical Modeling
NASA Technical Reports Server (NTRS)
He, Yuning; Davies, Misty Dawn
2013-01-01
Validating a concept of operation for a complex, safety-critical system (like the National Airspace System) is challenging because of the high dimensionality of the controllable parameters and the infinite number of states of the system. In this paper, we use statistical modeling techniques to explore the behavior of a conflict detection and resolution algorithm designed for the terminal airspace. These techniques predict the robustness of the system simulation to both nominal and off-nominal behaviors within the overall airspace. They can also be used to evaluate the output of the simulation against recorded airspace data. Additionally, the techniques carry with them a mathematical value of the worth of each prediction: a statistical uncertainty for any robustness estimate. Uncertainty Quantification (UQ) is the process of quantitative characterization and ultimately reduction of uncertainties in complex systems. UQ is important for understanding the influence of uncertainties on the behavior of a system and is therefore valuable for design, analysis, and verification and validation. In this paper, we apply advanced statistical modeling methodologies and techniques to an advanced air traffic management system, namely the Terminal Tactical Separation Assured Flight Environment (T-TSAFE). We show initial results for a parameter analysis and safety boundary (envelope) detection in the high-dimensional parameter space. For our boundary analysis, we developed a new sequential approach based upon the design of computer experiments, allowing us to incorporate knowledge from domain experts into our modeling and to determine the most likely boundary shapes and their parameters. We carried out the analysis on system parameters and describe an initial approach that will allow us to include time-series inputs, such as radar track data, into the analysis.
DnaSAM: Software to perform neutrality testing for large datasets with complex null models.
Eckert, Andrew J; Liechty, John D; Tearse, Brandon R; Pande, Barnaly; Neale, David B
2010-05-01
Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. High-throughput generation of DNA sequence data has historically been the bottleneck with respect to data processing and experimental inference. Advances in marker technologies have largely solved this problem. Currently, the limiting step is computational, with most molecular population genetic software allowing only a gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that allows both high-throughput processing of multiple sequence alignments and the flexibility to simulate data under complex demographic scenarios has been lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data, along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which is able to incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics with associated probability values under a user-specified null model, stored in an easy-to-manipulate text file. © 2009 Blackwell Publishing Ltd.
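The Monte Carlo testing logic described above is simple to state. A minimal sketch, not DnaSAM itself; the simulated null statistics would in practice be computed from ms output rather than drawn from a normal distribution as here:

```python
# Monte Carlo p-value of an observed statistic against a simulated null.
import numpy as np

def monte_carlo_p(observed, simulated):
    """Two-sided Monte Carlo p-value with the standard +1 correction."""
    simulated = np.asarray(simulated)
    p_low = (1 + np.sum(simulated <= observed)) / (1 + simulated.size)
    p_high = (1 + np.sum(simulated >= observed)) / (1 + simulated.size)
    return min(1.0, 2 * min(p_low, p_high))

# Hypothetical values: an observed Tajima's D vs. 10,000 neutral simulations.
rng = np.random.default_rng(10)
null_D = rng.normal(0, 1, 10000)                 # stand-in for simulated stats
print(f"p = {monte_carlo_p(-2.1, null_D):.4f}")
```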
Walking through the statistical black boxes of plant breeding.
Xavier, Alencar; Muir, William M; Craig, Bruce; Rainey, Katy Martin
2016-10-01
The main statistical procedures in plant breeding are based on Gaussian processes and can be computed through mixed linear models. Intelligent decision making relies on our ability to extract useful information from data to help us achieve our goals more efficiently. Many plant breeders and geneticists perform statistical analyses without understanding the underlying assumptions of the methods or their strengths and pitfalls. In other words, they treat these statistical methods (software and programs) like black boxes. Black boxes represent complex pieces of machinery with contents that are not fully understood by the user. The user sees the inputs and outputs without knowing how the outputs are generated. By providing a general background on statistical methodologies, this review aims (1) to introduce basic concepts of machine learning and its applications to plant breeding; (2) to link classical selection theory to current statistical approaches; (3) to show how to solve mixed models and extend their application to pedigree-based and genomic-based prediction; and (4) to clarify how the algorithms of genome-wide association studies work, including their assumptions and limitations.
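As a concrete illustration of the "black box" being opened here, the sketch below solves Henderson's mixed model equations directly for a toy dataset, yielding BLUEs of the fixed effects and BLUPs of the random genetic effects. It assumes independent random effects (identity relationship matrix) and a known variance ratio; a real analysis would use a pedigree- or genomic-relationship matrix and estimated variance components.

```python
import numpy as np

# Toy data: y = fixed effect (intercept) + random genetic effects (Z u) + error.
rng = np.random.default_rng(0)
n_obs, n_geno = 30, 10
X = np.ones((n_obs, 1))                              # intercept only
Z = np.eye(n_geno)[rng.integers(0, n_geno, n_obs)]   # genotype incidence matrix
u_true = rng.normal(0, 1.0, n_geno)
y = 5.0 + Z @ u_true + rng.normal(0, 0.5, n_obs)

lam = 0.25  # variance ratio sigma_e^2 / sigma_u^2 (assumed known here)
# Henderson's mixed model equations (independent random effects, A = I):
#   [X'X   X'Z            ] [b]   [X'y]
#   [Z'X   Z'Z + lam * I  ] [u] = [Z'y]
top = np.hstack([X.T @ X, X.T @ Z])
bot = np.hstack([Z.T @ X, Z.T @ Z + lam * np.eye(n_geno)])
rhs = np.concatenate([X.T @ y, Z.T @ y])
sol = np.linalg.solve(np.vstack([top, bot]), rhs)
b_hat, u_blup = sol[:1], sol[1:]   # BLUE of fixed effect, BLUPs of breeding values
print(b_hat, np.corrcoef(u_true, u_blup)[0, 1])
```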
Do neural nets learn statistical laws behind natural language?
Takahashi, Shuntaro; Tanaka-Ishii, Kumiko
2017-01-01
The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf's law and Heaps' law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks.
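The two statistical laws at issue are easy to measure on any token stream. The sketch below estimates the Zipf exponent from the rank-frequency curve and traces the Heaps vocabulary-growth curve; in the paper's setting one would run such diagnostics on both a natural corpus and text sampled from the trained LSTM.

```python
from collections import Counter
import math

def zipf_heaps_stats(tokens):
    """Rank-frequency (Zipf) slope and vocabulary growth (Heaps) curve.

    A sketch of the diagnostics used to compare generated and natural text;
    a real study would use large held-out corpora and generated samples.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    # Zipf: log f(r) ~ -alpha * log r; crude least-squares slope.
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    alpha = -sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    # Heaps: vocabulary size V(n) after the first n tokens.
    seen, heaps = set(), []
    for i, t in enumerate(tokens, 1):
        seen.add(t)
        heaps.append((i, len(seen)))
    return alpha, heaps

text = ("the cat sat on the mat and the dog sat on the log " * 50).split()
alpha, heaps = zipf_heaps_stats(text)
print(f"Zipf exponent ~ {alpha:.2f}, final vocab {heaps[-1][1]}")
```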
Maxwell's color statistics: from reduction of visible errors to reduction to invisible molecules.
Cat, Jordi
2014-12-01
This paper presents a cross-disciplinary and multi-disciplinary account of Maxwell's introduction of statistical models of molecules for the composition of gases. The account focuses on Maxwell's deployment of statistical models of data in his contemporaneous color researches as established in Cambridge mathematical physics, especially by Maxwell's seniors and mentors. The paper also argues that the cross-disciplinary, or cross-domain, transfer of resources from the natural and social sciences took place in both directions and relied on the complex intra-disciplinary, or intra-domain, dynamics of Maxwell's researches in natural sciences, in color theory, physical astronomy, electromagnetism and dynamical theory of gases, as well as involving a variety of types of communicating and mediating media, from material objects to concepts, techniques and institutions.
Multiscale multifractal DCCA and complexity behaviors of return intervals for Potts price model
NASA Astrophysics Data System (ADS)
Wang, Jie; Wang, Jun; Stanley, H. Eugene
2018-02-01
To investigate the characteristics of extreme events in financial markets and the corresponding return intervals among these events, we use a Potts dynamic system to construct a random financial time series model of the attitudes of market traders. We use multiscale multifractal detrended cross-correlation analysis (MM-DCCA) and Lempel-Ziv complexity (LZC) to perform numerical research on the return intervals for two major Chinese stock market indices and for the proposed model. The new MM-DCCA method is based on the Hurst surface and provides more interpretable cross-correlations of the dynamic mechanism between different return-interval series. We scale the LZC method with different exponents to illustrate the complexity of return intervals at different scales. Empirical studies indicate that the return intervals from the proposed Potts system and the real stock market indices share similar statistical properties.
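As a rough illustration of the LZC ingredient, the sketch below applies the simple phrase-counting (LZ78-style) variant of Lempel-Ziv complexity to a median-symbolized series; the paper's analysis additionally rescales the measure with different exponents and pairs it with MM-DCCA.

```python
import numpy as np

def lz_complexity(sequence):
    """Phrase-counting (LZ78-style) Lempel-Ziv complexity of a string."""
    phrases, start, length = set(), 0, 1
    while start + length <= len(sequence):
        candidate = sequence[start:start + length]
        if candidate in phrases:
            length += 1          # extend until we reach a new phrase
        else:
            phrases.add(candidate)
            start += length      # phrase complete; restart after it
            length = 1
    return len(phrases)

# Symbolize a toy return-interval series about its median, then measure LZC.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
symbols = "".join("1" if v > np.median(x) else "0" for v in x)
print(lz_complexity(symbols))
```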
Rinaldi, Antonio
2011-04-01
Traditional fiber bundle models (FBMs) have been an effective tool to understand brittle heterogeneous systems. However, fiber bundles in modern nano- and bioapplications demand a new generation of FBMs capturing more complex deformation processes in addition to damage. In the context of loose bundle systems, and with reference to time-independent plasticity and soft biomaterials, we formulate a generalized statistical model for ductile fracture and nonlinear elastic problems capable of handling more simultaneous deformation mechanisms by means of two order parameters (as opposed to one). As the first rational FBM for coupled damage problems, it may be the cornerstone for advanced statistical models of heterogeneous systems in nanoscience and materials design, especially to explore hierarchical and bio-inspired concepts in the arena of nanobiotechnology. Finally, illustrative examples are provided, discussing issues in inverse analysis (i.e., nonlinear elastic polymer fibers and ductile Cu submicron bar arrays) and direct design (i.e., strength prediction).
On entropy, financial markets and minority games
NASA Astrophysics Data System (ADS)
Zapart, Christopher A.
2009-04-01
The paper builds upon an earlier statistical analysis of financial time series with Shannon information entropy, published in [L. Molgedey, W. Ebeling, Local order, entropy and predictability of financial time series, European Physical Journal B-Condensed Matter and Complex Systems 15/4 (2000) 733-737]. A novel generic procedure is proposed for making multistep-ahead predictions of time series by building a statistical model of entropy. The approach is first demonstrated on the chaotic Mackey-Glass time series and later applied to Japanese Yen/US dollar intraday currency data. The paper also reinterprets Minority Games [E. Moro, The minority game: An introductory guide, Advances in Condensed Matter and Statistical Physics (2004)] within the context of physical entropy, and uses models derived from minority game theory as a tool for measuring the entropy of a model in response to time series. This entropy conditional upon a model is subsequently used in place of information-theoretic entropy in the proposed multistep prediction algorithm.
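The entropy-of-words statistic underlying this line of work is straightforward to compute once a series is symbolized. The sketch below, using toy Gaussian "returns" symbolized by sign, estimates the Shannon entropy of overlapping words of increasing length, the kind of local-order measure the cited Molgedey-Ebeling analysis builds on.

```python
from collections import Counter
import math
import random

def block_entropy(symbols, block_len):
    """Shannon entropy (bits) of overlapping length-`block_len` words.

    Sketch of the entropy-of-words statistic used to quantify local order
    and predictability of a symbolized time series.
    """
    blocks = [tuple(symbols[i:i + block_len])
              for i in range(len(symbols) - block_len + 1)]
    counts = Counter(blocks)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Symbolize toy "returns" by sign; compare entropy across word lengths.
random.seed(2)
returns = [random.gauss(0, 1) for _ in range(2000)]
symbols = [1 if r > 0 else 0 for r in returns]
for n in (1, 2, 4):
    print(n, round(block_entropy(symbols, n), 3))
```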
An Easy Tool to Predict Survival in Patients Receiving Radiation Therapy for Painful Bone Metastases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Westhoff, Paulien G., E-mail: p.g.westhoff@umcutrecht.nl; Graeff, Alexander de; Monninkhof, Evelyn M.
2014-11-15
Purpose: Patients with bone metastases have a widely varying survival. A reliable estimation of survival is needed for appropriate treatment strategies. Our goal was to assess the value of simple prognostic factors, namely, patient and tumor characteristics, Karnofsky performance status (KPS), and patient-reported scores of pain and quality of life, to predict survival in patients with painful bone metastases. Methods and Materials: In the Dutch Bone Metastasis Study, 1157 patients were treated with radiation therapy for painful bone metastases. At randomization, physicians determined the KPS; patients rated general health on a visual analogue scale (VAS-gh), valuation of life on a verbal rating scale (VRS-vl), and pain intensity. To assess the predictive value of the variables, we used multivariate Cox proportional hazard analyses and C-statistics for discriminative value. Of the final model, calibration was assessed. External validation was performed on a dataset of 934 patients who were treated with radiation therapy for vertebral metastases. Results: Patients had mainly breast (39%), prostate (23%), or lung cancer (25%). After a maximum of 142 weeks' follow-up, 74% of patients had died. The best predictive model included sex, primary tumor, visceral metastases, KPS, VAS-gh, and VRS-vl (C-statistic = 0.72, 95% CI = 0.70-0.74). A reduced model, with only KPS and primary tumor, showed comparable discriminative capacity (C-statistic = 0.71, 95% CI = 0.69-0.72). External validation showed a C-statistic of 0.72 (95% CI = 0.70-0.73). Calibration of the derivation and the validation dataset showed underestimation of survival. Conclusion: In predicting survival in patients with painful bone metastases, KPS combined with primary tumor was comparable to a more complex model. Considering the amount of variables in complex models and the additional burden on patients, the simple model is preferred for daily use. In addition, a risk table for survival is provided.
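For readers who want to reproduce this kind of analysis, the sketch below fits a Cox proportional hazards model and reads off the concordance (C-statistic) using the lifelines Python package; the columns are hypothetical stand-ins for the study's predictors, not the Dutch Bone Metastasis Study data.

```python
import pandas as pd
from lifelines import CoxPHFitter  # assumes the lifelines package is installed

# Hypothetical columns standing in for the study's predictors (KPS, primary
# tumor dummy); survival time in weeks and an event indicator (1 = died).
df = pd.DataFrame({
    "weeks":  [12, 30, 45, 8, 60, 22, 90, 15, 40, 70],
    "died":   [1, 1, 0, 1, 1, 1, 0, 1, 1, 0],
    "kps":    [60, 80, 90, 50, 80, 70, 90, 60, 70, 90],
    "breast": [0, 1, 1, 0, 1, 0, 1, 0, 1, 1],
})

# A small ridge penalty stabilizes the fit on this tiny illustrative sample.
cph = CoxPHFitter(penalizer=0.1)
cph.fit(df, duration_col="weeks", event_col="died")
cph.print_summary()
# Discriminative ability, analogous to the paper's C-statistic (~0.71-0.72):
print("C-statistic:", cph.concordance_index_)
```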
On the Origins of Suboptimality in Human Probabilistic Inference
Acerbi, Luigi; Vijayakumar, Sethu; Wolpert, Daniel M.
2014-01-01
Humans have been shown to combine noisy sensory information with previous experience (priors), in qualitative and sometimes quantitative agreement with the statistically-optimal predictions of Bayesian integration. However, when the prior distribution becomes more complex than a simple Gaussian, such as skewed or bimodal, training takes much longer and performance appears suboptimal. It is unclear whether such suboptimality arises from an imprecise internal representation of the complex prior, or from additional constraints in performing probabilistic computations on complex distributions, even when accurately represented. Here we probe the sources of suboptimality in probabilistic inference using a novel estimation task in which subjects are exposed to an explicitly provided distribution, thereby removing the need to remember the prior. Subjects had to estimate the location of a target given a noisy cue and a visual representation of the prior probability density over locations, which changed on each trial. Different classes of priors were examined (Gaussian, unimodal, bimodal). Subjects' performance was in qualitative agreement with the predictions of Bayesian Decision Theory although generally suboptimal. The degree of suboptimality was modulated by statistical features of the priors but was largely independent of the class of the prior and level of noise in the cue, suggesting that suboptimality in dealing with complex statistical features, such as bimodality, may be due to a problem of acquiring the priors rather than computing with them. We performed a factorial model comparison across a large set of Bayesian observer models to identify additional sources of noise and suboptimality. Our analysis rejects several models of stochastic behavior, including probability matching and sample-averaging strategies. Instead we show that subjects' response variability was mainly driven by a combination of a noisy estimation of the parameters of the priors, and by variability in the decision process, which we represent as a noisy or stochastic posterior. PMID:24945142
Experiment Design for Complex VTOL Aircraft with Distributed Propulsion and Tilt Wing
NASA Technical Reports Server (NTRS)
Murphy, Patrick C.; Landman, Drew
2015-01-01
Selected experimental results from a wind tunnel study of a subscale VTOL concept with distributed propulsion and tilt lifting surfaces are presented. The vehicle complexity and automated test facility were ideal for use with a randomized, designed experiment. Design of Experiments and Response Surface Methods were invoked to produce run-efficient, statistically rigorous regression models with minimized prediction error. Static tests were conducted at the NASA Langley 12-Foot Low-Speed Tunnel to model all six aerodynamic coefficients over a large flight envelope. This work supports investigations at NASA Langley in developing advanced configurations, simulations, and advanced control systems.
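A minimal version of the RSM step looks like the sketch below: a full second-order polynomial response surface fitted by least squares to coded factor levels. The two factors and the quadratic response are synthetic stand-ins, not the Langley test data.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in: two coded factors (e.g., angle of attack, tilt angle) and a
# noisy quadratic response standing in for one aerodynamic coefficient.
rng = np.random.default_rng(14)
X = rng.uniform(-1, 1, size=(60, 2))
y = (0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1] + 0.6 * X[:, 0] * X[:, 1]
     - 0.4 * X[:, 0] ** 2 + rng.normal(0, 0.05, 60))

# Full second-order response surface model, the standard RSM form.
rsm = make_pipeline(PolynomialFeatures(degree=2, include_bias=True),
                    LinearRegression())
rsm.fit(X, y)
print("R^2 =", rsm.score(X, y))
print("coefficients:", rsm.named_steps["linearregression"].coef_)
```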
Narayanan, Roshni; Nugent, Rebecca; Nugent, Kenneth
2015-10-01
Accreditation Council for Graduate Medical Education guidelines require internal medicine residents to develop skills in the interpretation of medical literature and to understand the principles of research. A necessary component is the ability to understand the statistical methods used and their results, material that is not an in-depth focus of most medical school curricula and residency programs. Given the breadth and depth of the current medical literature and an increasing emphasis on complex, sophisticated statistical analyses, the statistical foundation and education necessary for residents are uncertain. We reviewed the statistical methods and terms used in 49 articles discussed at the journal club in the Department of Internal Medicine residency program at Texas Tech University between January 1, 2013 and June 30, 2013. We collected information on the study type and on the statistical methods used for summarizing and comparing samples, determining the relations between independent variables and dependent variables, and estimating models. We then identified the typical statistics education level at which each term or method is learned. A total of 14 articles came from the Journal of the American Medical Association Internal Medicine, 11 from the New England Journal of Medicine, 6 from the Annals of Internal Medicine, 5 from the Journal of the American Medical Association, and 13 from other journals. Twenty reported randomized controlled trials. Summary statistics included mean values (39 articles), category counts (38), and medians (28). Group comparisons were based on t tests (14 articles), χ2 tests (21), and nonparametric ranking tests (10). The relations between dependent and independent variables were analyzed with simple regression (6 articles), multivariate regression (11), and logistic regression (8). Nine studies reported odds ratios with 95% confidence intervals, and seven analyzed test performance using sensitivity and specificity calculations. These papers used 128 statistical terms and context-defined concepts, including some from data analysis (56), epidemiology-biostatistics (31), modeling (24), data collection (12), and meta-analysis (5). Ten different software programs were used in these articles. Based on usual undergraduate and graduate statistics curricula, 64.3% of the concepts and methods used in these papers required at least a master's degree-level statistics education. The interpretation of the current medical literature can require an extensive background in statistical methods at an education level exceeding the material and resources provided to most medical students and residents. Given the complexity and time pressure of medical education, these deficiencies will be hard to correct, but this project can serve as a basis for developing a curriculum in study design and statistical methods needed by physicians-in-training.
NASA Astrophysics Data System (ADS)
Xu, Kaixuan; Wang, Jun
2017-02-01
In this paper, the recently introduced permutation entropy and sample entropy are extended to the fractional case, yielding weighted fractional permutation entropy (WFPE) and fractional sample entropy (FSE). The fractional-order generalization of information entropy is utilized in these two complexity approaches to detect the statistical characteristics of fractional-order information in complex systems. The effectiveness analysis of the proposed methods on synthetic data and real-world data reveals that tuning the fractional order allows a higher sensitivity and more accurate characterization of the signal evolution, which is useful in describing the dynamics of complex systems. Moreover, the numerical research on nonlinear complexity behaviors compares the return series of the Potts financial model with actual stock markets, and the empirical results confirm the feasibility of the proposed model.
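For orientation, the sketch below implements the standard integer-order permutation entropy that WFPE generalizes; the pattern weighting and fractional-order generalization of the paper are omitted. Irregular noise scores near 1, while a smooth deterministic signal scores much lower.

```python
from collections import Counter
import math
import numpy as np

def permutation_entropy(x, order=3):
    """Ordinal (permutation) entropy of a 1-D series, normalized to [0, 1].

    The standard integer-order form; WFPE additionally weights ordinal
    patterns and generalizes the entropy to fractional order.
    """
    patterns = [tuple(np.argsort(x[i:i + order]))
                for i in range(len(x) - order + 1)]
    counts = Counter(patterns)
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(math.factorial(order))

rng = np.random.default_rng(3)
noise = rng.normal(size=1000)               # irregular: entropy near 1
smooth = np.sin(np.linspace(0, 20, 1000))   # regular: much lower entropy
print(permutation_entropy(noise), permutation_entropy(smooth))
```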
How Good Are Statistical Models at Approximating Complex Fitness Landscapes?
du Plessis, Louis; Leventhal, Gabriel E.; Bonhoeffer, Sebastian
2016-01-01
Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far only offered limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore revert to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape, and on the other hand systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations. PMID:27189564
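The regression models in question are ordinary linear models on genotype indicators. The sketch below builds a synthetic stand-in landscape with additive and pairwise-epistatic effects (not the paper's RNA landscape), then compares the fit of an additive-only model with one that also includes all pairwise interaction terms.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
L = 8                                   # sequence length (binary alleles)
X = rng.integers(0, 2, size=(300, L))   # sparse sample of genotypes

# Stand-in fitness landscape: additive effects plus pairwise epistasis.
a = rng.normal(size=L)
pairs = list(combinations(range(L), 2))
e = rng.normal(scale=0.3, size=len(pairs))
def fitness(g):
    return g @ a + sum(w * g[i] * g[j] for w, (i, j) in zip(e, pairs))
y = np.array([fitness(g) for g in X]) + rng.normal(scale=0.1, size=len(X))

# Regression with single-mutation terms only, then with pairwise terms added.
X1 = X
X2 = np.hstack([X, np.column_stack([X[:, i] * X[:, j] for i, j in pairs])])
for name, Xd in [("additive", X1), ("additive+pairwise", X2)]:
    r2 = LinearRegression().fit(Xd, y).score(Xd, y)
    print(f"{name}: in-sample R^2 = {r2:.3f}")
```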
Nariai, N; Kim, S; Imoto, S; Miyano, S
2004-01-01
We propose a statistical method to estimate gene networks from DNA microarray data and protein-protein interactions. Because physical interactions between proteins or multiprotein complexes are likely to regulate biological processes, using only mRNA expression data is not sufficient for estimating a gene network accurately. Our method adds knowledge about protein-protein interactions to the estimation method of gene networks under a Bayesian statistical framework. In the estimated gene network, a protein complex is modeled as a virtual node based on principal component analysis. We show the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae cell cycle data. The proposed method improves the accuracy of the estimated gene networks, and successfully identifies some biological facts.
Study of pre-seismic kHz EM emissions by means of complex systems
NASA Astrophysics Data System (ADS)
Balasis, Georgios; Papadimitriou, Constantinos; Eftaxias, Konstantinos
2010-05-01
The field of study of complex systems holds that the dynamics of complex systems are founded on universal principles that may be used to describe disparate problems ranging from particle physics to the economies of societies. A corollary is that transferring ideas and results between investigators in hitherto disparate areas will cross-fertilize and lead to important new results. It is well known that Boltzmann-Gibbs statistical mechanics works best in dealing with systems composed of subsystems that are either independent or interacting via short-range forces, and whose subsystems can access all the available phase space. For systems exhibiting long-range correlations, memory, or fractal properties, non-extensive Tsallis statistical mechanics becomes the most appropriate mathematical framework. A central property of the magnetic storm, solar flare, and earthquake preparation process is the possible occurrence of coherent large-scale collective behavior with a very rich structure, resulting from the repeated nonlinear interactions among the constituents of the system. Consequently, non-extensive statistical mechanics is an appropriate framework to investigate universality, if any, in magnetic storm, solar flare, earthquake, and pre-failure EM emission occurrence. A model for earthquake dynamics derived from a non-extensive Tsallis formulation, starting from first principles, has recently been introduced. This approach leads to a Gutenberg-Richter-type law for the magnitude distribution of earthquakes which provides an excellent fit to seismicities generated in various large geographic areas usually identified as "seismic regions". We examine whether the Gutenberg-Richter law corresponding to non-extensive Tsallis statistics is able to describe the distribution of amplitudes of earthquakes, pre-seismic kHz EM emissions (electromagnetic earthquakes), solar flares, and magnetic storms. The analysis shows that the introduced non-extensive model provides an excellent fit to the experimental data, incorporating the characteristics of universality by means of non-extensive statistics into the extreme events under study.
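The distributional form at the center of this approach is the Tsallis q-exponential. The sketch below fits it to a histogram of heavy-tailed synthetic "magnitudes" with scipy; the data are a stand-in for the earthquake, EM-emission, flare, and storm amplitude distributions analyzed in the study.

```python
import numpy as np
from scipy.optimize import curve_fit

def q_exponential(x, a, beta, q):
    """Tsallis q-exponential: a * [1 + (q - 1) * beta * x]**(1 / (1 - q))."""
    return a * np.power(1.0 + (q - 1.0) * beta * x, 1.0 / (1.0 - q))

# Heavy-tailed synthetic "magnitudes" standing in for the amplitude data.
rng = np.random.default_rng(5)
mags = rng.pareto(2.5, 20000)
hist, edges = np.histogram(mags, bins=np.linspace(0, 5, 41), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Bounds keep q > 1 so the fitted density decays for positive amplitudes.
popt, _ = curve_fit(q_exponential, centers, hist, p0=[1.0, 1.0, 1.3],
                    bounds=([0, 1e-6, 1.0001], [np.inf, np.inf, 3.0]))
print("fitted q =", round(popt[2], 3))
```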
TREAT (TREe-based Association Test)
TREAT is an R package for detecting complex joint effects in case-control studies. The test statistic is derived from a tree-structure model by recursively partitioning the data. An ultra-fast algorithm is designed to evaluate the significance of association between a candidate gene and disease outcome.
Dorazio, R.M.; Johnson, F.A.
2003-01-01
Bayesian inference and decision theory may be used in the solution of relatively complex problems of natural resource management, owing to recent advances in statistical theory and computing. In particular, Markov chain Monte Carlo algorithms provide a computational framework for fitting models of adequate complexity and for evaluating the expected consequences of alternative management actions. We illustrate these features using an example based on management of waterfowl habitat.
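A minimal example of the MCMC machinery invoked here: a random-walk Metropolis sampler for a survival probability from hypothetical mark-recapture counts, with a uniform prior. Real waterfowl-habitat applications would involve richer state-space models, but the computational pattern is the same.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical data: birds marked and observed surviving to the next season.
n_marked, n_survived = 120, 78

def log_posterior(p):
    if not 0.0 < p < 1.0:
        return -np.inf
    # Uniform Beta(1, 1) prior + binomial likelihood (up to a constant)
    return n_survived * np.log(p) + (n_marked - n_survived) * np.log(1 - p)

# Random-walk Metropolis sampler
samples, p = [], 0.5
for _ in range(20000):
    prop = p + rng.normal(0, 0.05)
    if np.log(rng.uniform()) < log_posterior(prop) - log_posterior(p):
        p = prop
    samples.append(p)
post = np.array(samples[5000:])          # drop burn-in
print("posterior mean %.3f, 95%% CI (%.3f, %.3f)"
      % (post.mean(), *np.quantile(post, [0.025, 0.975])))
```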
Model Uncertainty Quantification Methods In Data Assimilation
NASA Astrophysics Data System (ADS)
Pathiraja, S. D.; Marshall, L. A.; Sharma, A.; Moradkhani, H.
2017-12-01
Data Assimilation involves utilising observations to improve model predictions in a seamless and statistically optimal fashion. Its applications are wide-ranging, from improving weather forecasts to tracking targets, as in the Apollo 11 mission. The use of Data Assimilation methods in high-dimensional complex geophysical systems is an active area of research, where there exist many opportunities to enhance existing methodologies. One of the central challenges is model uncertainty quantification; the outcome of any Data Assimilation study is strongly dependent on the uncertainties assigned to both observations and models. I focus on developing improved model uncertainty quantification methods that are applicable to challenging real-world scenarios. These include developing methods for cases where the system states are only partially observed, where there is little prior knowledge of the model errors, and where the model error statistics are likely to be highly non-Gaussian.
A Three-Step Approach To Model Tree Mortality in the State of Georgia
Qingmin Meng; Chris J. Cieszewski; Roger C. Lowe; Michal Zasada
2005-01-01
Tree mortality is one of the most complex phenomena of forest growth and yield. Many types of factors affect tree mortality, which is considered difficult to predict. This study presents a new systematic approach to simulate tree mortality based on the integration of statistical models and geographical information systems. This method begins with variable preselection...
Concepts and their dynamics: a quantum-theoretic modeling of human thought.
Aerts, Diederik; Gabora, Liane; Sozzo, Sandro
2013-10-01
We analyze different aspects of our quantum modeling approach to human concepts and, more specifically, focus on the quantum effects of contextuality, interference, entanglement, and emergence, illustrating how each of them makes its appearance in specific situations in the dynamics of human concepts and their combinations. We point out the relation of our approach, which is based on an ontology of a concept as an entity in a state changing under the influence of a context, to the main traditional concept theories, that is, prototype theory, exemplar theory, and theory theory. We ponder the question of why quantum theory performs so well in modeling human concepts, and we shed light on it by analyzing the role of complex amplitudes, showing how they allow interference in the statistics of measurement outcomes to be described, whereas in the traditional theories the statistics of outcomes originate in classical probability weights, without the possibility of interference. The relevance of complex numbers, the appearance of entanglement, and the role of Fock space in explaining contextual emergence, all unique features of the quantum modeling, are explicitly revealed in this article by analyzing human concepts and their dynamics. © 2013 Cognitive Science Society, Inc.
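The key quantitative point, that complex amplitudes add an interference term absent from classical mixtures, fits in a few lines. The sketch below compares the classical average of two outcome probabilities with the quantum combination of the corresponding amplitudes; the probabilities and relative phase are arbitrary illustrative values, not taken from the paper's Fock-space model.

```python
import numpy as np

# Probabilities of one outcome under two concepts, plus an assumed phase.
p_A, p_B, phase = 0.4, 0.6, 2.0
a = np.sqrt(p_A)                              # amplitude under concept A
b = np.sqrt(p_B) * np.exp(1j * phase)         # amplitude under concept B

p_classical = 0.5 * (p_A + p_B)               # mixture of probability weights
p_quantum = abs((a + b) / np.sqrt(2)) ** 2    # includes the interference term
# p_quantum = p_classical + sqrt(p_A * p_B) * cos(phase)
print(p_classical, p_quantum)
```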
Finding the Root Causes of Statistical Inconsistency in Community Earth System Model Output
NASA Astrophysics Data System (ADS)
Milroy, D.; Hammerling, D.; Baker, A. H.
2017-12-01
Baker et al (2015) developed the Community Earth System Model Ensemble Consistency Test (CESM-ECT) to provide a metric for software quality assurance by determining statistical consistency between an ensemble of CESM outputs and new test runs. The test has proved useful for detecting statistical difference caused by compiler bugs and errors in physical modules. However, detection is only the necessary first step in finding the causes of statistical difference. The CESM is a vastly complex model comprised of millions of lines of code which is developed and maintained by a large community of software engineers and scientists. Any root cause analysis is correspondingly challenging. We propose a new capability for CESM-ECT: identifying the sections of code that cause statistical distinguishability. The first step is to discover CESM variables that cause CESM-ECT to classify new runs as statistically distinct, which we achieve via Randomized Logistic Regression. Next we use a tool developed to identify CESM components that define or compute the variables found in the first step. Finally, we employ the application Kernel GENerator (KGEN) created in Kim et al (2016) to detect fine-grained floating point differences. We demonstrate an example of the procedure and advance a plan to automate this process in our future work.
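The first step, finding the variables that drive the "distinct" classification, can be approximated with stability selection: refitting an L1-penalized logistic regression on bootstrap resamples and keeping the variables that are selected consistently. The sketch below does this on synthetic data standing in for CESM output summaries; it is in the spirit of the Randomized Logistic Regression the paper mentions, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Toy stand-in: 40 "output variables" summarized per run; runs labeled
# consistent (0) or statistically distinct (1). Three variables truly matter.
n_runs, n_vars = 200, 40
X = rng.normal(size=(n_runs, n_vars))
logits = 2.0 * X[:, 0] - 1.5 * X[:, 3] + X[:, 7]
y = (logits + rng.normal(size=n_runs) > 0).astype(int)

# Stability selection: count how often each coefficient survives the L1 penalty.
selection_freq = np.zeros(n_vars)
n_boot = 100
for _ in range(n_boot):
    idx = rng.integers(0, n_runs, n_runs)      # bootstrap resample
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.3)
    clf.fit(X[idx], y[idx])
    selection_freq += (np.abs(clf.coef_[0]) > 1e-8)
selection_freq /= n_boot
print("variables selected in >80% of resamples:",
      np.where(selection_freq > 0.8)[0])
```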
A canonical neural mechanism for behavioral variability
NASA Astrophysics Data System (ADS)
Darshan, Ran; Wood, William E.; Peters, Susan; Leblois, Arthur; Hansel, David
2017-05-01
The ability to generate variable movements is essential for learning and adjusting complex behaviours. This variability has been linked to the temporal irregularity of neuronal activity in the central nervous system. However, how neuronal irregularity actually translates into behavioural variability is unclear. Here we combine modelling, electrophysiological and behavioural studies to address this issue. We demonstrate that a model circuit comprising topographically organized and strongly recurrent neural networks can autonomously generate irregular motor behaviours. Simultaneous recordings of neurons in singing finches reveal that neural correlations increase across the circuit driving song variability, in agreement with the model predictions. Analysing behavioural data, we find remarkable similarities in the babbling statistics of 5-6-month-old human infants and juveniles from three songbird species and show that our model naturally accounts for these `universal' statistics.
NASA Astrophysics Data System (ADS)
Skitka, J.; Marston, B.; Fox-Kemper, B.
2016-02-01
Sub-grid turbulence models for planetary boundary layers are typically constructed additively, starting with local flow properties and including non-local (KPP) or higher order (Mellor-Yamada) parameters until a desired level of predictive capacity is achieved or a manageable threshold of complexity is surpassed. Such approaches are necessarily limited in general circumstances, like global circulation models, by their being optimized for particular flow phenomena. By building a model reductively, starting with the infinite hierarchy of turbulence statistics, truncating at a given order, and stripping degrees of freedom from the flow, we offer the prospect a turbulence model and investigative tool that is equally applicable to all flow types and able to take full advantage of the wealth of nonlocal information in any flow. Direct statistical simulation (DSS) that is based upon expansion in equal-time cumulants can be used to compute flow statistics of arbitrary order. We investigate the feasibility of a second-order closure (CE2) by performing simulations of the ocean boundary layer in a quasi-linear approximation for which CE2 is exact. As oceanographic examples, wind-driven Langmuir turbulence and thermal convection are studied by comparison of the quasi-linear and fully nonlinear statistics. We also characterize the computational advantages and physical uncertainties of CE2 defined on a reduced basis determined via proper orthogonal decomposition (POD) of the flow fields.
NASA Astrophysics Data System (ADS)
Perevertailo, T.; Nedolivko, N.; Prisyazhnyuk, O.; Dolgaya, T.
2015-11-01
The complex structure of the Lower-Cretaceous formation by the example of the reservoir BC101 in Western Ust - Balykh Oil Field (Khanty-Mansiysk Autonomous District) has been studied. Reservoir range relationships have been identified. 3D geologic- mathematical modeling technique considering the heterogeneity and variability of a natural reservoir structure has been suggested. To improve the deposit geological structure integrity methods of mathematical statistics were applied, which, in its turn, made it possible to obtain equal probability models with similar input data and to consider the formation conditions of reservoir rocks and cap rocks.
RADSS: an integration of GIS, spatial statistics, and network service for regional data mining
NASA Astrophysics Data System (ADS)
Hu, Haitang; Bao, Shuming; Lin, Hui; Zhu, Qing
2005-10-01
Regional data mining, which aims at the discovery of knowledge about spatial patterns, clusters, or associations between regions, has wide applications in the social sciences, including sociology, economics, epidemiology, and criminology. Many applications in the regional or other social sciences are more concerned with spatial relationships than with precise geographical locations. Based on the spatial continuity rule derived from Tobler's first law of geography (observations at two sites tend to be more similar to each other if the sites are close together than if they are far apart), spatial statistics, as an important means of spatial data mining, allows users to extract interesting and useful information, such as spatial pattern, spatial structure, spatial association, spatial outliers, and spatial interaction, from vast amounts of spatial and non-spatial data. Therefore, by integrating spatial statistical methods, geographical information systems become more powerful in gaining insight into the nature of the spatial structure of regional systems, and help researchers to be more careful when selecting appropriate models. However, the lack of such tools holds back the application of spatial data analysis techniques and the development of new methods and models (e.g., spatio-temporal models). Here, we make an attempt to develop such integrated software and apply it to the complex system analysis of the Poyang Lake Basin. This paper presents a framework for integrating GIS, spatial statistics, and network services in regional data mining, as well as its implementation. After discussing the spatial statistical methods involved in regional complex system analysis, we introduce RADSS (Regional Analysis and Decision Support System), our new regional data mining tool, which integrates GIS, spatial statistics, and network services. RADSS includes functions for spatial data visualization, exploratory spatial data analysis, and spatial statistics. The tool also includes fundamental spatial and non-spatial databases on regional population and environment, which can be updated from external databases via CD or network. Using this data mining and exploratory analytical tool, users can easily and quickly analyse the huge amount of interrelated regional data and better understand the spatial patterns and trends of regional development, so as to make credible and scientific decisions. Moreover, the tool can be used for education in spatial data analysis and environmental studies. We also present a case study on the Poyang Lake Basin as an application of the tool and of spatial data mining in complex environmental studies. Finally, several concluding remarks are discussed.
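Tobler's-law reasoning is usually quantified with spatial autocorrelation statistics of the kind such exploratory tools expose. The sketch below computes Moran's I for a toy set of regions with a binary neighbor-weight matrix; positive values indicate spatial clustering of similar values.

```python
import numpy as np

def morans_i(values, W):
    """Moran's I spatial autocorrelation for region values and weights W.

    W[i, j] > 0 when regions i and j are neighbors; a basic statistic in
    the exploratory spatial data analysis toolbox described above.
    """
    x = values - values.mean()
    n = len(x)
    s0 = W.sum()
    return (n / s0) * (x @ W @ x) / (x @ x)

# Toy example: 5 regions on a line, neighbors share an edge.
vals = np.array([1.0, 2.0, 2.5, 3.5, 4.0])       # smoothly varying attribute
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
print("Moran's I =", morans_i(vals, W))          # positive: clustering
```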
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wharton, S.; Bulaevskaya, V.; Irons, Z.
The goal of our FY15 project was to explore the use of statistical models and high-resolution atmospheric input data to develop more accurate prediction models for turbine power generation. We modeled power for two operational wind farms in two regions of the country. The first site is a 235 MW wind farm in Northern Oklahoma with 140 GE 1.68 turbines. Our second site is a 38 MW wind farm in the Altamont Pass Region of Northern California with 38 Mitsubishi 1 MW turbines. The farms are very different in topography, climatology, and turbine technology; however, both occupy high wind resource areas in the U.S. and are representative of typical wind farms found in their respective areas.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rawnsley, K.; Swaby, P.
1996-08-01
It is increasingly acknowledged that in order to understand and forecast the behavior of fracture-influenced reservoirs we must attempt to reproduce the fracture system geometry and use this as a basis for fluid flow calculation. This article aims to present a recently developed fracture modelling prototype designed specifically for use in hydrocarbon reservoir environments. The prototype "FRAME" (FRActure Modelling Environment) aims to provide a tool which will allow the generation of realistic 3D fracture systems within a reservoir model, constrained to the known geology of the reservoir by both mechanical and statistical considerations, and which can be used as a basis for fluid flow calculation. Two newly developed modelling techniques are used. The first is an interactive tool which allows complex fault surfaces and their associated deformations to be reproduced. The second is a "genetic" model which grows fracture patterns from seeds using conceptual models of fracture development. The user defines the mechanical input and can retrieve all the statistics of the growing fractures to allow comparison to assumed statistical distributions for the reservoir fractures. Input parameters include growth rate, fracture interaction characteristics, orientation maps, and density maps. More traditional statistical stochastic fracture models are also incorporated. FRAME is designed to allow the geologist to input hard or soft data including seismically defined surfaces, well fractures, outcrop models, analogue or numerical mechanical models, or geological "feeling". The geologist is not restricted to "a priori" models of fracture patterns that may not correspond to the data.
Magnetorotational dynamo chimeras. The missing link to turbulent accretion disk dynamo models?
NASA Astrophysics Data System (ADS)
Riols, A.; Rincon, F.; Cossu, C.; Lesur, G.; Ogilvie, G. I.; Longaretti, P.-Y.
2017-02-01
In Keplerian accretion disks, turbulence and magnetic fields may be jointly excited through a subcritical dynamo mechanism involving magnetorotational instability (MRI). This dynamo may notably contribute to explaining the time-variability of various accreting systems, as high-resolution simulations of MRI dynamo turbulence exhibit statistical self-organization into large-scale cyclic dynamics. However, understanding the physics underlying these statistical states and assessing their exact astrophysical relevance is theoretically challenging. The study of simple periodic nonlinear MRI dynamo solutions has recently proven useful in this respect, and has highlighted the role of turbulent magnetic diffusion in the seeming impossibility of a dynamo at low magnetic Prandtl number (Pm), a common regime in disks. Arguably though, these simple laminar structures may not be fully representative of the complex, statistically self-organized states expected in astrophysical regimes. Here, we aim at closing this seeming discrepancy by reporting the numerical discovery of exactly periodic, yet semi-statistical "chimeral MRI dynamo states" which are the organized outcome of a succession of MRI-unstable, non-axisymmetric dynamical stages of different forms and amplitudes. Interestingly, these states, while reminiscent of the statistical complexity of turbulent simulations, involve the same physical principles as simpler laminar cycles, and their analysis further confirms the theory that subcritical turbulent magnetic diffusion impedes the sustainment of an MRI dynamo at low Pm. Overall, chimera dynamo cycles therefore offer an unprecedented dual physical and statistical perspective on dynamos in rotating shear flows, which may prove useful in devising more accurate, yet intuitive mean-field models of time-dependent turbulent disk dynamos. Movies associated with Fig. 1 are available at http://www.aanda.org
Sulis, William H
2017-10-01
Walter Freeman III pioneered the application of nonlinear dynamical systems theories and methodologies in his work on mesoscopic brain dynamics. Sadly, mainstream psychology and psychiatry still cling to linear correlation-based data analysis techniques, which threaten to subvert the process of experimentation and theory building. In order to progress, it is necessary to develop tools capable of managing the stochastic complexity of complex biopsychosocial systems, which includes multilevel feedback relationships, nonlinear interactions, chaotic dynamics, and adaptability. In addition, however, these systems exhibit intrinsic randomness, non-Gaussian probability distributions, non-stationarity, contextuality, and non-Kolmogorov probabilities, as well as the absence of mean and/or variance and conditional probabilities. These properties and their implications for statistical analysis are discussed. An alternative approach, the Process Algebra approach, is described. It is a generative model, capable of generating non-Kolmogorov probabilities. It has proven useful in addressing fundamental problems in quantum mechanics and in the modeling of developing psychosocial systems.
NASA Technical Reports Server (NTRS)
Rino, C. L.; Livingston, R. C.; Whitney, H. E.
1976-01-01
This paper presents an analysis of ionospheric scintillation data which shows that the underlying statistical structure of the signal can be accurately modeled by the additive complex Gaussian perturbation predicted by the Born approximation in conjunction with an application of the central limit theorem. By making use of this fact, it is possible to estimate the in-phase, phase quadrature, and cophased scattered power by curve fitting to measured intensity histograms. By using this procedure, it is found that typically more than 80% of the scattered power is in phase quadrature with the undeviated signal component. Thus, the signal is modeled by a Gaussian, but highly non-Rician process. From simultaneous UHF and VHF data, only a weak dependence of this statistical structure on changes in the Fresnel radius is deduced. The signal variance is found to have a nonquadratic wavelength dependence. It is hypothesized that this latter effect is a subtle manifestation of locally homogeneous irregularity structures, a mathematical model proposed by Kolmogorov (1941) in his early studies of incompressible fluid turbulence.
Statistics and classification of the microwave zebra patterns associated with solar flares
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tan, Baolin; Tan, Chengming; Zhang, Yin
2014-01-10
The microwave zebra pattern (ZP) is the most interesting, intriguing, and complex spectral structure frequently observed in solar flares. A comprehensive statistical study will certainly help us to understand the formation mechanism, which is still not entirely clear. This work presents a comprehensive statistical analysis of a large sample of 202 ZP events collected from observations by the Chinese Solar Broadband Radio Spectrometer at Huairou and the Ondřejov Radiospectrograph in the Czech Republic at frequencies of 1.00-7.60 GHz from 2000 to 2013. After investigating the parameter properties of ZPs, such as the occurrence in flare phase, frequency range, polarization degree, duration, etc., we find that the variation of zebra stripe frequency separation with respect to frequency is the best indicator for a physical classification of ZPs. Microwave ZPs can be classified into three types: equidistant ZPs, variable-distant ZPs, and growing-distant ZPs, possibly corresponding to the Bernstein wave model, the whistler wave model, and the double plasma resonance model, respectively. This statistical classification may help us to clarify the controversies between the various existing theoretical models and understand the physical processes in the source regions.
Statistics of Optical Coherence Tomography Data From Human Retina
de Juan, Joaquín; Ferrone, Claudia; Giannini, Daniela; Huang, David; Koch, Giorgio; Russo, Valentina; Tan, Ou; Bruni, Carlo
2010-01-01
Optical coherence tomography (OCT) has recently become one of the primary methods for noninvasive probing of the human retina. The pseudoimage formed by OCT (the so-called B-scan) varies probabilistically across pixels due to complexities in the measurement technique. Hence, sensitive automatic procedures of diagnosis using OCT may exploit statistical analysis of the spatial distribution of reflectance. In this paper, we perform a statistical study of retinal OCT data. We find that the stretched exponential probability density function can model well the distribution of intensities in OCT pseudoimages. Moreover, we show a small but significant correlation between neighboring pixels when measuring OCT intensities with pixels of about 5 µm. We then develop a simple joint probability model for the OCT data consistent with known retinal features. This model fits well the stretched exponential distribution of intensities and their spatial correlation. In normal retinas, the fit parameters of this model are relatively constant along retinal layers but vary across layers. However, in retinas with diabetic retinopathy, large spikes of parameter modulation interrupt the constancy within layers, exactly where pathologies are visible. We argue that these results give hope for improvement in statistical pathology-detection methods even when the disease is in its early stages. PMID:20304733
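The distribution fit at the heart of this study can be reproduced as in the sketch below: synthetic positive "intensities" are drawn so that their density is exactly stretched-exponential (via a Gamma transform), and the parameters are then recovered by least squares on the histogram. Real OCT B-scan reflectances would replace the synthetic sample.

```python
import numpy as np
from scipy.optimize import curve_fit

def stretched_exp_pdf(x, c, lam, beta):
    """Stretched-exponential density c * exp(-(x/lam)**beta); c absorbs normalization."""
    return c * np.exp(-np.power(x / lam, beta))

# Synthetic stand-in for OCT reflectance intensities in one retinal layer:
# X = lam * U**(1/beta) with U ~ Gamma(1/beta) has exactly this density.
rng = np.random.default_rng(8)
beta_true, lam_true = 0.8, 25.0
u = rng.gamma(1.0 / beta_true, 1.0, 50000)
intensities = lam_true * u ** (1.0 / beta_true)

hist, edges = np.histogram(intensities, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = hist > 0
popt, _ = curve_fit(stretched_exp_pdf, centers[mask], hist[mask],
                    p0=[0.05, 20.0, 1.0],
                    bounds=([0, 1e-3, 0.05], [np.inf, np.inf, 5.0]))
print("fitted (lam, beta) =", popt[1], popt[2])
```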
NASA Astrophysics Data System (ADS)
Muhammad, Ario; Goda, Katsuichiro
2018-03-01
This study investigates the impact of model complexity in source characterization and digital elevation model (DEM) resolution on the accuracy of tsunami hazard assessment and fatality estimation through a case study in Padang, Indonesia. Two types of earthquake source models, i.e. complex and uniform slip models, are adopted by considering three resolutions of DEMs, i.e. 150 m, 50 m, and 10 m. For each of the three grid resolutions, 300 complex source models are generated using new statistical prediction models of earthquake source parameters developed from extensive finite-fault models of past subduction earthquakes, whilst 100 uniform slip models are constructed with variable fault geometry without slip heterogeneity. The results highlight that significant changes to tsunami hazard and fatality estimates are observed with regard to earthquake source complexity and grid resolution. Coarse resolution (i.e. 150 m) leads to inaccurate tsunami hazard prediction and fatality estimation, whilst 50-m and 10-m resolutions produce similar results. However, velocity and momentum flux are sensitive to the grid resolution and hence, at least 10-m grid resolution needs to be implemented when considering flow-based parameters for tsunami hazard and risk assessments. In addition, the results indicate that the tsunami hazard parameters and fatality number are more sensitive to the complexity of earthquake source characterization than the grid resolution. Thus, the uniform models are not recommended for probabilistic tsunami hazard and risk assessments. Finally, the findings confirm that uncertainties of tsunami hazard level and fatality in terms of depth, velocity and momentum flux can be captured and visualized through the complex source modeling approach. From tsunami risk management perspectives, this indeed creates big data, which are useful for making effective and robust decisions.
Extracting and Converting Quantitative Data into Human Error Probabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tuan Q. Tran; Ronald L. Boring; Jeffrey C. Joe
2007-08-01
This paper discusses a proposed method using a combination of advanced statistical approaches (e.g., meta-analysis, regression, structural equation modeling) that will not only convert different empirical results into a common metric for scaling individual PSFs effects, but will also examine the complex interrelationships among PSFs. Furthermore, the paper discusses how the derived statistical estimates (i.e., effect sizes) can be mapped onto a HRA method (e.g. SPAR-H) to generate HEPs that can then be use in probabilistic risk assessment (PRA). The paper concludes with a discussion of the benefits of using academic literature in assisting HRA analysts in generating sound HEPsmore » and HRA developers in validating current HRA models and formulating new HRA models.« less
Improvement of short-term numerical wind predictions
NASA Astrophysics Data System (ADS)
Bedard, Joel
Geophysical Model Output Statistics (GMOS) are developed to optimize the use of NWP for complex sites. GMOS differs from other MOS widely used by meteorological centers in the following aspects: it takes into account surrounding geophysical parameters, such as surface roughness and terrain height, along with wind direction, and it can be applied directly without any training, although training further improves the results. GMOS was applied to improve the Environment Canada GEM-LAM 2.5 km forecasts at North Cape (PEI, Canada): it improves the prediction RMSE by 25-30% for all time horizons and almost all meteorological conditions; the topographic signature of the forecast error due to insufficient grid refinement is eliminated; and the NWP combined with GMOS outperforms persistence from a 2 h horizon onward, instead of 4 h without GMOS. Finally, GMOS was applied at another site (Bouctouche, NB, Canada): similar improvements were observed, showing its general applicability. Keywords: wind energy, wind power forecast, numerical weather prediction, complex sites, model output statistics
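In spirit, a MOS correction is a regression of observations on the raw forecast plus site covariates. The sketch below shows a toy linear version with hypothetical roughness and terrain-height covariates; GMOS itself encodes the geophysical dependence more carefully (including wind direction) and can be applied without training.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy MOS-style correction: observed wind speed regressed on the raw NWP
# forecast plus hypothetical geophysical covariates.
rng = np.random.default_rng(15)
n = 1000
nwp = rng.gamma(2.0, 4.0, n)                   # raw forecast (m/s)
rough = rng.uniform(0.01, 1.0, n)              # surface roughness stand-in
height = rng.uniform(0, 100, n)                # terrain height stand-in
obs = 0.8 * nwp - 2.0 * rough + 0.01 * height + rng.normal(0, 1.0, n)

X = np.column_stack([nwp, rough, height])
mos = LinearRegression().fit(X, obs)
rmse_raw = np.sqrt(np.mean((obs - nwp) ** 2))
rmse_mos = np.sqrt(np.mean((obs - mos.predict(X)) ** 2))
print(f"RMSE raw {rmse_raw:.2f} -> MOS-corrected {rmse_mos:.2f}")
```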
Mawocha, Samkeliso C; Fetters, Michael D; Legocki, Laurie J; Guetterman, Timothy C; Frederiksen, Shirley; Barsan, William G; Lewis, Roger J; Berry, Donald A; Meurer, William J
2017-06-01
Adaptive clinical trials use accumulating data from enrolled subjects to alter trial conduct in pre-specified ways based on quantitative decision rules. In this research, we sought to characterize the perspectives of key stakeholders during the development process of confirmatory-phase adaptive clinical trials within an emergency clinical trials network and to build a model to guide future development of adaptive clinical trials. We used an ethnographic, qualitative approach to evaluate key stakeholders' views about the adaptive clinical trial development process. Stakeholders participated in a series of multidisciplinary meetings during the development of five adaptive clinical trials and completed a Strengths-Weaknesses-Opportunities-Threats questionnaire. In the analysis, we elucidated overarching themes across the stakeholders' responses to develop a conceptual model. Four major overarching themes emerged during the analysis of stakeholders' responses to questioning: the perceived statistical complexity of adaptive clinical trials and the roles of collaboration, communication, and time during the development process. Frequent and open communication and collaboration were viewed by stakeholders as critical during the development process, as were the careful management of time and logistical issues related to the complexity of planning adaptive clinical trials. The Adaptive Design Development Model illustrates how statistical complexity, time, communication, and collaboration are moderating factors in the adaptive design development process. The intensity and iterative nature of this process underscores the need for funding mechanisms for the development of novel trial proposals in academic settings.
Autonomous Modeling, Statistical Complexity and Semi-annealed Treatment of Boolean Networks
NASA Astrophysics Data System (ADS)
Gong, Xinwei
This dissertation presents three studies on Boolean networks. Boolean networks are a class of mathematical systems consisting of interacting elements with binary state variables. Each element is a node with a Boolean logic gate, and the presence of interactions between any two nodes is represented by directed links. Boolean networks that implement the logic structures of real systems are studied as coarse-grained models of the real systems. Large random Boolean networks are studied with mean field approximations and used to provide a baseline of possible behaviors of large real systems. This dissertation presents one study of the former type, concerning the stable oscillation of a yeast cell-cycle oscillator, and two studies of the latter type, respectively concerning the statistical complexity of large random Boolean networks and an extension of traditional mean field techniques that accounts for the presence of short loops. In the cell-cycle oscillator study, a novel autonomous update scheme is introduced to study the stability of oscillations in small networks. A motif that corrects pulse-growing perturbations and a motif that grows pulses are identified. A combination of the two motifs is capable of sustaining stable oscillations. Examining a Boolean model of the yeast cell-cycle oscillator using an autonomous update scheme yields evidence that it is endowed with such a combination. Random Boolean networks are classified as ordered, critical or disordered based on their response to small perturbations. In the second study, random Boolean networks are taken as prototypical cases for the evaluation of two measures of complexity based on a criterion for optimal statistical prediction. One measure, defined for homogeneous systems, does not distinguish between the static spatial inhomogeneity in the ordered phase and the dynamical inhomogeneity in the disordered phase. A modification in which complexities of individual nodes are calculated yields vanishing complexity values for networks in the ordered and critical phases and for highly disordered networks, peaking somewhere in the disordered phase. Individual nodes with high complexity have, on average, a larger influence on the system dynamics. Lastly, a semi-annealed approximation that preserves the correlation between states at neighboring nodes is introduced to study a social game-inspired network model in which all links are bidirectional and all nodes have a self-input. The technique developed here is shown to yield accurate predictions of distribution of players' states, and accounts for some nontrivial collective behavior of game theoretic interest.
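A minimal random Boolean network with synchronous updates, as sketched below, suffices to illustrate the ordered/critical/disordered distinction the dissertation builds on: one tracks whether a single-bit perturbation dies out or spreads. The dissertation's autonomous update scheme and statistical complexity measures go well beyond this baseline.

```python
import random

random.seed(9)
N, K = 100, 2                       # nodes, inputs per node (K=2 is critical)

# Random Boolean network: each node gets K random inputs and a random truth table.
inputs = [random.sample(range(N), K) for _ in range(N)]
tables = [{(a, b): random.randint(0, 1) for a in (0, 1) for b in (0, 1)}
          for _ in range(N)]

def step(state):
    """One synchronous update of all nodes."""
    return [tables[i][tuple(state[j] for j in inputs[i])] for i in range(N)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# Perturbation probe: flip one bit and watch whether the difference spreads.
state = [random.randint(0, 1) for _ in range(N)]
perturbed = state[:]
perturbed[0] ^= 1
for _ in range(50):
    state, perturbed = step(state), step(perturbed)
print("final Hamming distance:", hamming(state, perturbed))
```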
Modeling epidemics on adaptively evolving networks: A data-mining perspective.
Kattis, Assimakis A; Holiday, Alexander; Stoica, Ana-Andreea; Kevrekidis, Ioannis G
2016-01-01
The exploration of epidemic dynamics on dynamically evolving ("adaptive") networks poses nontrivial challenges to the modeler, such as the determination of a small number of informative statistics of the detailed network state (that is, a few "good observables") that usefully summarize the overall (macroscopic, systems-level) behavior. Obtaining reduced, small, accurate models in terms of these few statistical observables (that is, coarse-graining the full network epidemic model to a small but useful macroscopic one) is even more daunting. Here we describe a data-based approach to solving the first challenge: the detection of a few informative collective observables of the detailed epidemic dynamics. This is accomplished through Diffusion Maps (DMAPS), a recently developed data-mining technique. We illustrate the approach through simulations of a simple mathematical model of epidemics on a network, a model known to exhibit complex temporal dynamics. We discuss potential extensions of the approach, as well as possible shortcomings.
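The DMAPS step can be sketched compactly: build a Gaussian-kernel Markov matrix on the simulation snapshots and take the leading nontrivial eigenvectors as collective coordinates. The toy example below recovers the angular coordinate of noisy points on a circle; kernel-scale selection and density normalization, glossed over here, matter in real applications.

```python
import numpy as np

def diffusion_maps(X, eps, n_coords=2):
    """Basic diffusion-map embedding of the rows of X.

    Minimal sketch: Gaussian kernel, row-normalized Markov matrix, leading
    nontrivial eigenvectors scaled by their eigenvalues.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. dists
    K = np.exp(-d2 / eps)
    P = K / K.sum(axis=1, keepdims=True)                  # Markov transition matrix
    vals, vecs = np.linalg.eig(P)
    idx = np.argsort(-vals.real)
    # Skip the trivial constant eigenvector; keep the next n_coords
    return vecs.real[:, idx[1:n_coords + 1]] * vals.real[idx[1:n_coords + 1]]

# Toy data: noisy points on a circle; the leading coordinates track the angle.
rng = np.random.default_rng(10)
theta = rng.uniform(0, 2 * np.pi, 300)
X = np.column_stack([np.cos(theta), np.sin(theta)]) + rng.normal(0, 0.05, (300, 2))
coords = diffusion_maps(X, eps=0.5)
print(coords.shape)
```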
NASA Astrophysics Data System (ADS)
Vallianatos, Filippos; Kouli, Maria
2013-08-01
The Digital Elevation Model (DEM) of the island of Crete, with a resolution of approximately 20 meters, was used to delineate watersheds by computing the flow direction and using it in the Watershed function. The Watershed function uses a raster of flow direction to determine contributing area. A routine Geographic Information Systems procedure was applied, and the watersheds as well as the stream network (using a threshold of 2000 cells, i.e., the minimum number of cells that constitute a stream) were extracted from the hydrologically corrected (free of sinks) DEM. A few thousand watersheds were delineated and their areal extent was calculated. From these watersheds, 300 were finally selected for further analysis, as watersheds of extremely small area were excluded in order to avoid possible artifacts. Our analysis approach is based on the basic principles of complexity theory and the Tsallis entropy introduced in the framework of non-extensive statistical physics. This concept has been successfully used for the analysis of a variety of complex dynamic systems, including natural hazards, where fractality and long-range interactions are important. The analysis indicates that the statistical distribution of watersheds can be successfully described with the theoretical estimations of non-extensive statistical physics, implying the complexity that characterizes their occurrence.
A statistical shape model of the human second cervical vertebra.
Clogenson, Marine; Duff, John M; Luethi, Marcel; Levivier, Marc; Meuli, Reto; Baur, Charles; Henein, Simon
2015-07-01
Statistical shape and appearance models play an important role in reducing the segmentation processing time of a vertebra and in improving results for 3D model development. Here, we describe the different steps in generating a statistical shape model (SSM) of the second cervical vertebra (C2) and provide the shape model for general use by the scientific community. The main difficulties in its construction are the morphological complexity of the C2 and its variability in the population. The input dataset is composed of manually segmented anonymized patient computerized tomography (CT) scans. The alignment of the different datasets is done with the procrustes alignment on surface models, and then, the registration is cast as a model-fitting problem using a Gaussian process. A principal component analysis (PCA)-based model is generated which includes the variability of the C2. The SSM was generated using 92 CT scans. The resulting SSM was evaluated for specificity, compactness and generalization ability. The SSM of the C2 is freely available to the scientific community in Slicer (an open source software for image analysis and scientific visualization) with a module created to visualize the SSM using Statismo, a framework for statistical shape modeling. The SSM of the vertebra allows the shape variability of the C2 to be represented. Moreover, the SSM will enable semi-automatic segmentation and 3D model generation of the vertebra, which would greatly benefit surgery planning.
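Once shapes are aligned, the PCA stage of an SSM pipeline reduces to an SVD of the centered landmark matrix, as in the sketch below. The "landmarks" are synthetic stand-ins for Procrustes-aligned C2 surface points; new plausible shapes are synthesized as the mean plus scaled principal modes.

```python
import numpy as np

# Toy "aligned landmarks": n_shapes x (n_points * 3) matrix standing in
# for Procrustes-aligned vertebra surface points from segmented CT scans.
rng = np.random.default_rng(11)
n_shapes, n_points = 30, 200
mean_shape = rng.normal(size=n_points * 3)
modes_true = rng.normal(size=(2, n_points * 3))          # two true modes
weights = rng.normal(size=(n_shapes, 2))
shapes = (mean_shape + weights @ modes_true
          + rng.normal(0, 0.01, (n_shapes, n_points * 3)))

# PCA-based SSM: mean shape + principal modes of variation via SVD.
mu = shapes.mean(axis=0)
U, s, Vt = np.linalg.svd(shapes - mu, full_matrices=False)
var = s**2 / (n_shapes - 1)
print("variance explained by first 3 modes:", (var / var.sum())[:3])

# Synthesize a new plausible shape: mu + sum_k b_k * sqrt(var_k) * mode_k
b = np.array([1.5, -0.5])                    # mode coefficients in std units
new_shape = mu + (b * np.sqrt(var[:2])) @ Vt[:2]
print(new_shape.shape)
```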
Updating the debate on model complexity
Simmons, Craig T.; Hunt, Randall J.
2012-01-01
As scientists who are trying to understand a complex natural world that cannot be fully characterized in the field, how can we best inform the society in which we live? This founding context was addressed in a special session, “Complexity in Modeling: How Much is Too Much?” convened at the 2011 Geological Society of America Annual Meeting. The session had a variety of thought-provoking presentations—ranging from philosophy to cost-benefit analyses—and provided some areas of broad agreement that were not evident in discussions of the topic in 1998 (Hunt and Zheng, 1999). The session began with a short introduction during which model complexity was framed by borrowing from an economic concept, the Law of Diminishing Returns, and an example of the enjoyment derived from eating ice cream. Initially, there is increasing satisfaction gained from eating more ice cream, up to a point where the gain in satisfaction starts to decrease, ending at a point when the eater sees no value in eating more ice cream. A traditional view of model complexity is similar—understanding gained from modeling can actually decrease if models become unnecessarily complex. However, oversimplified models—those that omit important aspects of the problem needed to make a good prediction—can also limit and confound our understanding. Thus, the goal of all modeling is to find the “sweet spot” of model sophistication—regardless of whether complexity was added sequentially to an overly simple model or collapsed from an initial highly parameterized framework that uses mathematics and statistics to attain an optimum (e.g., Hunt et al., 2007). In this way, holistic parsimony is attained, incorporating “as simple as possible,” as well as the equally important corollary “but no simpler.”
NASA Astrophysics Data System (ADS)
Menzel, Andreas M.
2015-11-01
Diffusion of colloidal particles in a complex environment such as polymer networks or biological cells is a topic of high complexity with significant biological and medical relevance. In such situations, the interaction between the surroundings and the particle motion has to be taken into account. We analyze a simplified diffusion model that includes some aspects of a complex environment in the framework of a nonlinear friction process: at low particle speeds, friction grows linearly with the particle velocity as for regular viscous friction; it grows more than linearly at higher particle speeds; finally, at a maximum of the possible particle speed, the friction diverges. In addition to bare diffusion, we study the influence of a constant drift force acting on the diffusing particle. While the corresponding stationary velocity distributions can be derived analytically, the displacement statistics generally must be determined numerically. However, as a benefit of our model, analytical progress can be made in one case of a special maximum particle speed. The effect of a drift force in this case is analytically determined by perturbation theory. It will be interesting in the future to compare our results to real experimental systems. One realization could be magnetic colloidal particles diffusing through a shear-thickening environment such as starch suspensions, possibly exposed to an external magnetic field gradient.
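The qualitative setup can be reproduced with a stochastic (Euler-Maruyama) integration of the velocity with a friction coefficient that diverges at a maximum speed. The specific form gamma(v) = gamma0 / (1 - (v/vmax)^2) below is an illustrative assumption consistent with the description (linear at low speed, divergent at vmax), not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma0, vmax, D, F = 1.0, 1.0, 0.1, 0.2   # base friction, speed limit, noise, drift force
dt, nsteps = 1e-3, 200_000

v, x = 0.0, 0.0
xs = np.empty(nsteps)
for i in range(nsteps):
    # Friction force: ~ gamma0*v at small speeds, diverging as |v| -> vmax
    friction = gamma0 * v / (1.0 - (v / vmax) ** 2)
    v += (-friction + F) * dt + np.sqrt(2 * D * dt) * rng.standard_normal()
    v = np.clip(v, -0.999 * vmax, 0.999 * vmax)  # numerical guard at the divergence
    x += v * dt
    xs[i] = x

# Displacement statistics at a chosen lag, to be compared across parameter values
lag = 10_000
print("displacement std at lag:", np.std(xs[lag:] - xs[:-lag]))
```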
NASA Astrophysics Data System (ADS)
Wang, W. L.; Tsui, K. L.; Lo, S. M.; Liu, S. B.
2018-01-01
Crowded transportation hubs such as metro stations are thought to be ideal places for the development and spread of epidemics. However, owing to their complex spatial layouts and confined environments with large numbers of highly mobile individuals, it is difficult to quantify human contacts in such settings, and disease spreading dynamics there were less explored in previous studies. Given the heterogeneity and dynamic nature of human interactions, a growing number of studies have demonstrated the importance of contact distance and contact duration for transmission probabilities. In this study, we show how detailed information on contact and exposure patterns can be obtained by statistical analyses of microscopic crowd simulation data. Specifically, a pedestrian simulation model, CityFlow, was employed to reproduce individuals' movements in a metro station based on site survey data, and values and distributions of individual contact rate and exposure in different simulation cases were obtained and analyzed. Interestingly, a Weibull distribution fitted the histogram values of individual-based exposure in each case very well. Moreover, we found that both individual contact rate and exposure had a linear relationship with the average crowd densities of the environments. The results obtained in this paper can serve as a reference for epidemic studies in complex and confined transportation hubs and help refine existing disease spreading models.
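Verifying such a Weibull fit on one's own simulation output takes a few lines of SciPy; the file name below is a hypothetical stand-in for per-agent exposure values exported from the crowd simulation.

```python
import numpy as np
from scipy import stats

exposure = np.loadtxt("exposure_per_agent.txt")        # hypothetical simulation export
# floc=0 pins the location parameter so shape and scale are fitted to positive data
shape, loc, scale = stats.weibull_min.fit(exposure, floc=0)
ks = stats.kstest(exposure, "weibull_min", args=(shape, loc, scale))
print(f"shape={shape:.2f}, scale={scale:.2f}, KS p-value={ks.pvalue:.3f}")
```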
Multiple choice questions can be designed or revised to challenge learners' critical thinking.
Tractenberg, Rochelle E; Gushta, Matthew M; Mulroney, Susan E; Weissinger, Peggy A
2013-12-01
Multiple choice (MC) questions from a graduate physiology course were evaluated by cognitive-psychology (but not physiology) experts, and analyzed statistically, in order to test the independence of content expertise and cognitive complexity ratings of MC items. Integration of higher order thinking into MC exams is important, but widely known to be challenging, perhaps especially when content experts must think like novices. Expertise in the domain (content) may actually impede the creation of higher-complexity items. Three cognitive psychology experts independently rated cognitive complexity for 252 multiple-choice physiology items using a six-level cognitive complexity matrix that was synthesized from the literature. Rasch modeling estimated item difficulties. The complexity ratings and difficulty estimates were then analyzed together to determine the relative contributions (and independence) of complexity and difficulty to the likelihood of correct answers on each item. Cognitive complexity was found to be statistically independent of difficulty estimates for 88% of items. Using the complexity matrix, modifications were identified to increase some item complexities by one level, without affecting the item's difficulty. Cognitive complexity can effectively be rated by non-content experts. The six-level complexity matrix, if applied by faculty peer groups trained in cognitive complexity and without domain-specific expertise, could lead to improvements in the complexity targeted with item writing and revision. Targeting higher order thinking with MC questions can be achieved without changing item difficulties or other test characteristics, but this may be less likely if content experts are left to assess items within their own domain of expertise.
NASA Astrophysics Data System (ADS)
Roşca, S.; Bilaşco, Ş.; Petrea, D.; Fodorean, I.; Vescan, I.; Filip, S.; Măguţ, F.-L.
2015-11-01
The existence of a large number of GIS models for the identification of landslide occurrence probability makes the selection of a specific one difficult. The present study focuses on the application of two quantitative models, the logistic and the BSA models, and a comparative analysis of their results aimed at identifying the more suitable one. The territory of the Niraj Mic Basin (87 km2) is characterised by a wide variety of landforms with diverse morphometric, morphographical and geological characteristics, as well as by a high complexity of land use types where active landslides exist. For this reason it was chosen as the test area for applying the two models and comparing their results. The complexity of the input data is illustrated by 16 factors, represented as 72 dummy variables and analysed on the basis of their importance within the model structures. Testing the statistical significance of each variable reduced the number of dummy variables to 12, which were considered significant for the test area within the logistic model, whereas all variables were employed in the BSA model. The predictive power of the models was tested by computing the area under the ROC curve, which indicated good accuracy (AUROC = 0.86 for the testing area) and moderate predictability of the logistic model (AUROC = 0.63 for the validation area).
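For readers reproducing this kind of validation, the logistic-model half of the workflow, fitting on dummy-coded factors and scoring with the area under the ROC curve, can be sketched as follows (file names are hypothetical; the BSA model is not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# X_*: dummy-coded predisposing factors; y_*: 1 where active landslides exist
X_test, y_test = np.load("testing_X.npy"), np.load("testing_y.npy")      # hypothetical
X_val, y_val = np.load("validation_X.npy"), np.load("validation_y.npy")  # hypothetical

model = LogisticRegression(max_iter=1000).fit(X_test, y_test)
for name, X, y in [("testing", X_test, y_test), ("validation", X_val, y_val)]:
    auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
    print(f"AUROC ({name} area): {auc:.2f}")
```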
Constructing Noise-Invariant Representations of Sound in the Auditory Pathway
Rabinowitz, Neil C.; Willmore, Ben D. B.; King, Andrew J.; Schnupp, Jan W. H.
2013-01-01
Identifying behaviorally relevant sounds in the presence of background noise is one of the most important and poorly understood challenges faced by the auditory system. An elegant solution to this problem would be for the auditory system to represent sounds in a noise-invariant fashion. Since a major effect of background noise is to alter the statistics of the sounds reaching the ear, noise-invariant representations could be promoted by neurons adapting to stimulus statistics. Here we investigated the extent of neuronal adaptation to the mean and contrast of auditory stimulation as one ascends the auditory pathway. We measured these forms of adaptation by presenting complex synthetic and natural sounds, recording neuronal responses in the inferior colliculus and primary fields of the auditory cortex of anaesthetized ferrets, and comparing these responses with a sophisticated model of the auditory nerve. We find that the strength of both forms of adaptation increases as one ascends the auditory pathway. To investigate whether this adaptation to stimulus statistics contributes to the construction of noise-invariant sound representations, we also presented complex, natural sounds embedded in stationary noise, and used a decoding approach to assess the noise tolerance of the neuronal population code. We find that the code for complex sounds in the periphery is affected more by the addition of noise than the cortical code. We also find that noise tolerance is correlated with adaptation to stimulus statistics, so that populations that show the strongest adaptation to stimulus statistics are also the most noise-tolerant. This suggests that the increase in adaptation to sound statistics from auditory nerve to midbrain to cortex is an important stage in the construction of noise-invariant sound representations in the higher auditory brain. PMID:24265596
NASA Astrophysics Data System (ADS)
McCauley, Joseph L.
2009-09-01
Preface; 1. Econophysics: why and what; 2. Neo-classical economic theory; 3. Probability and stochastic processes; 4. Introduction to financial economics; 5. Introduction to portfolio selection theory; 6. Scaling, pair correlations, and conditional densities; 7. Statistical ensembles: deducing dynamics from time series; 8. Martingale option pricing; 9. FX market globalization: evolution of the dollar to worldwide reserve currency; 10. Macroeconomics and econometrics: regression models vs. empirically based modeling; 11. Complexity; Index.
Quantifying uncertainty in high-resolution coupled hydrodynamic-ecosystem models
NASA Astrophysics Data System (ADS)
Allen, J. I.; Somerfield, P. J.; Gilbert, F. J.
2007-01-01
Marine ecosystem models are becoming increasingly complex and sophisticated, and are being used to estimate the effects of future changes in the earth system with a view to informing important policy decisions. Despite their potential importance, far too little attention is generally paid to model errors and the extent to which model outputs actually relate to real-world processes. With the increasing complexity of the models themselves comes an increasing complexity among model results. If we are to develop useful modelling tools for the marine environment we need to be able to understand and quantify the uncertainties inherent in the simulations. Analysing errors within highly multivariate model outputs, and relating them to even more complex and multivariate observational data, are not trivial tasks. Here we describe the application of a series of techniques, including a 2-stage self-organising map (SOM), non-parametric multivariate analysis, and error statistics, to a complex spatio-temporal model run for the period 1988-1989 in the Southern North Sea, coinciding with the North Sea Project which collected a wealth of observational data. We use model output, large spatio-temporally resolved data sets and a combination of methodologies (SOM, MDS, uncertainty metrics) to simplify the problem and to provide tractable information on model performance. The use of a SOM as a clustering tool allows us to simplify the dimensions of the problem, while the use of MDS on independent data grouped according to the SOM classification allows us to validate the SOM. The combination of classification and uncertainty metrics allows us to pinpoint the variables and associated processes which require attention in each region. We recommend this combination of techniques for simplifying complex comparisons of model outputs with real data and for the analysis of error distributions.
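The clustering stage can be illustrated with a small hand-rolled self-organising map; the sketch below is generic (grid size and learning schedule are assumptions), not the authors' 2-stage configuration.

```python
import numpy as np

def train_som(data, grid=(6, 6), n_iter=5000, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal SOM for data of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    gx, gy = grid
    coords = np.array([(i, j) for i in range(gx) for j in range(gy)], float)
    W = data[rng.integers(len(data), size=gx * gy)].astype(float)  # init from samples
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        frac = t / n_iter
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 0.5
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))        # best-matching unit
        # Gaussian neighbourhood pulls units near the BMU toward the sample
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)
    return W

def assign_clusters(data, W):
    return np.argmin(((data[:, None, :] - W[None]) ** 2).sum(-1), axis=1)
```

Each model-error vector is then assigned to its best-matching unit, and the resulting clustering can be cross-checked with MDS on independent data, as done above.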
Ioannidis, J P; McQueen, P G; Goedert, J J; Kaslow, R A
1998-03-01
Complex immunogenetic associations of disease involving a large number of gene products are difficult to evaluate with traditional statistical methods and may require complex modeling. The authors evaluated the performance of feed-forward backpropagation neural networks in predicting rapid progression to acquired immunodeficiency syndrome (AIDS) for patients with human immunodeficiency virus (HIV) infection on the basis of major histocompatibility complex variables. Networks were trained on data from patients from the Multicenter AIDS Cohort Study (n = 139) and then validated on patients from the DC Gay cohort (n = 102). The outcome of interest was rapid disease progression, defined as progression to AIDS in <6 years from seroconversion. Human leukocyte antigen (HLA) variables were selected as network inputs with multivariate regression and a previously described algorithm selecting markers with extreme point estimates for progression risk. Network performance was compared with that of logistic regression. Networks with 15 HLA inputs and a single hidden layer of five nodes achieved a sensitivity of 87.5% and specificity of 95.6% in the training set, vs. 77.0% and 76.9%, respectively, achieved by logistic regression. When validated on the DC Gay cohort, networks averaged a sensitivity of 59.1% and specificity of 74.3%, vs. 53.1% and 61.4%, respectively, for logistic regression. Neural networks offer further support to the notion that HIV disease progression may be dependent on complex interactions between different class I and class II alleles and transporters associated with antigen processing variants. The effect in the current models is of moderate magnitude, and more data as well as other host and pathogen variables may need to be considered to improve the performance of the models. Artificial intelligence methods may complement linear statistical methods for evaluating immunogenetic associations of disease.
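The reported architecture (15 HLA indicator inputs feeding a single hidden layer of five nodes) is straightforward to restate with modern tooling; the sketch below uses scikit-learn with hypothetical file names and is not the original software.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

# X: 15 HLA indicator inputs; y: 1 = progression to AIDS within 6 years of seroconversion
X_train, y_train = np.load("macs_X.npy"), np.load("macs_y.npy")      # hypothetical files
X_valid, y_valid = np.load("dcgay_X.npy"), np.load("dcgay_y.npy")    # hypothetical files

net = MLPClassifier(hidden_layer_sizes=(5,), activation="logistic",
                    max_iter=5000, random_state=0).fit(X_train, y_train)
tn, fp, fn, tp = confusion_matrix(y_valid, net.predict(X_valid)).ravel()
print(f"sensitivity={tp / (tp + fn):.1%}, specificity={tn / (tn + fp):.1%}")
```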
The Hanford Thyroid Disease Study: an alternative view of the findings.
Hoffman, F Owen; Ruttenber, A James; Apostoaei, A Iulian; Carroll, Raymond J; Greenland, Sander
2007-02-01
The Hanford Thyroid Disease Study (HTDS) is one of the largest and most complex epidemiologic studies of the relation between environmental exposures to iodine-131 and thyroid disease. The study detected no dose-response relation using a 0.05 level for statistical significance. The results for thyroid cancer appear inconsistent with those from other studies of populations with similar exposures, and reflect either inadequate statistical power, bias, or unique relations between exposure and disease risk. In this paper, we explore these possibilities and present evidence that the HTDS statistical power was inadequate because of complex uncertainties associated with the mathematical models and assumptions used to reconstruct individual doses. We conclude that, at the very least, the confidence intervals reported by the HTDS for thyroid cancer and other thyroid diseases are too narrow because they fail to reflect key uncertainties in the measurement-error structure. We recommend that the HTDS results be interpreted as inconclusive rather than as evidence for little or no disease risk from Hanford exposures.
A STATISTICAL THERMODYNAMIC MODEL OF THE ORGANIZATIONAL ORDER OF VEGETATION. (R827676)
The complex pattern of vegetation is the macroscopic manifestation of biological diversity and the ecological order in space and time. How is this overwhelmingly diverse, yet wonderfully ordered spatial pattern formed, and how does it evolve? To answer these questions, most tr...
Unification of the complex Langevin method and the Lefschetz thimble method
NASA Astrophysics Data System (ADS)
Nishimura, Jun; Shimasaki, Shinji
2018-03-01
Recently there has been remarkable progress in solving the sign problem, which occurs in investigating statistical systems with a complex weight. The two promising methods, the complex Langevin method and the Lefschetz thimble method, share the idea of complexifying the dynamical variables, but their relationship has not been clear. Here we propose a unified formulation, in which the sign problem is taken care of by both the Langevin dynamics and the holomorphic gradient flow. We apply our formulation to a simple model in three different ways and show that one of them interpolates the two methods by changing the flow time.
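The core idea of the complex Langevin method is easiest to see on a zero-dimensional toy integral with complex weight exp(-sigma x^2 / 2): the variable is complexified and evolved under the holomorphic drift -dS/dz while the noise stays real. The following sketch is purely pedagogical and is not the interpolating model of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0 + 1.0j         # complex "mass": the weight exp(-sigma*x^2/2) has a sign problem
dt, ntherm, nsamp = 1e-3, 10_000, 400_000

z = 0.0 + 0.0j
acc = 0.0 + 0.0j
for i in range(ntherm + nsamp):
    # drift -dS/dz = -sigma*z on the complexified variable; real Gaussian noise
    z += -sigma * z * dt + np.sqrt(2 * dt) * rng.standard_normal()
    if i >= ntherm:
        acc += z * z

print("CL estimate of <x^2>:", acc / nsamp)   # exact answer: 1/sigma = 0.5 - 0.5j
```

The estimate converges to the exact complex value, something a naive Monte Carlo with a real, positive weight cannot produce.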
A brief introduction to mixed effects modelling and multi-model inference in ecology.
Harrison, Xavier A; Donaldson, Lynda; Correa-Cano, Maria Eugenia; Evans, Julian; Fisher, David N; Goodwin, Cecily E D; Robinson, Beth S; Hodgson, David J; Inger, Richard
2018-01-01
The use of linear mixed effects models (LMMs) is increasingly common in the analysis of biological data. Whilst LMMs offer a flexible approach to modelling a broad range of data types, ecological data are often complex and require complex model structures, and the fitting and interpretation of such models is not always straightforward. The ability to achieve robust biological inference requires that practitioners know how and when to apply these tools. Here, we provide a general overview of current methods for the application of LMMs to biological data, and highlight the typical pitfalls that can be encountered in the statistical modelling process. We tackle several issues regarding methods of model selection, with particular reference to the use of information theory and multi-model inference in ecology. We offer practical solutions and direct the reader to key references that provide further technical detail for those seeking a deeper understanding. This overview should serve as a widely accessible code of best practice for applying LMMs to complex biological problems and model structures, and in doing so improve the robustness of conclusions drawn from studies investigating ecological and evolutionary questions.
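As a practical pointer, a random-intercept LMM of the kind discussed takes only a few lines in R's lme4 or, as sketched here, in Python's statsmodels; the data file, formula, and grouping variable are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per observation; 'site' is the grouping factor
df = pd.read_csv("counts.csv")
# Fixed effects for habitat and year; a random intercept for each site
model = smf.mixedlm("abundance ~ habitat + year", data=df, groups=df["site"]).fit()
print(model.summary())
```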
The highly intelligent virtual agents for modeling financial markets
NASA Astrophysics Data System (ADS)
Yang, G.; Chen, Y.; Huang, J. P.
2016-02-01
Researchers have borrowed many theories from statistical physics, such as ensembles and the Ising model, to study complex adaptive systems through agent-based modeling. However, one fundamental difference between entities in physics (such as spins) and micro-units in complex adaptive systems is that the latter usually possess high intelligence, such as investors in financial markets. Although highly intelligent virtual agents are essential for agent-based modeling to play a full role in the study of complex adaptive systems, how to create such agents is still an open question. Hence, we propose three principles for designing high artificial intelligence in financial markets and then build a specific class of agents called iAgents based on these three principles. Finally, we evaluate the intelligence of iAgents through virtual index trading in two different stock markets. For comparison, we also include three other types of agents in this contest, namely, random traders, agents from the wealth game (modified from the famous minority game), and agents from an upgraded wealth game. iAgents perform the best, which lends strong support to the three principles. This work offers a general framework for the further development of agent-based modeling for various kinds of complex adaptive systems.
Statistical Downscaling of WRF-Chem Model: An Air Quality Analysis over Bogota, Colombia
NASA Astrophysics Data System (ADS)
Kumar, Anikender; Rojas, Nestor
2015-04-01
Statistical downscaling is a technique used to extract high-resolution information from regional-scale variables produced by coarse-resolution models such as Chemical Transport Models (CTMs). The fully coupled WRF-Chem (Weather Research and Forecasting with Chemistry) model is used to simulate air quality over Bogota, a tropical Andean megacity located on a high-altitude plateau in the middle of very complex terrain. The WRF-Chem model was adopted for simulating hourly ozone concentrations. The computational domains consisted of 120x120x32, 121x121x32 and 121x121x32 grid points with horizontal resolutions of 27, 9 and 3 km, respectively. The model was initialized with real boundary conditions using NCAR-NCEP's Final Analysis (FNL) at 1°x1° (~111 km x 111 km) resolution. Boundary conditions were updated every 6 hours using reanalysis data. The emission rates were obtained from global inventories, namely the REanalysis of the TROpospheric (RETRO) chemical composition and the Emission Database for Global Atmospheric Research (EDGAR). Multiple linear regression and artificial neural network techniques were used to downscale the model output at each monitoring station. The results confirm that the statistically downscaled outputs reduce simulation errors by up to 25%. This study provides a general overview of statistical downscaling of chemical transport models and can serve as a reference for future air quality modeling exercises over Bogota and other Colombian cities.
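The multiple-linear-regression flavor of the downscaling step amounts to regressing station observations on collocated model output; the synthetic stand-in data below replace the WRF-Chem fields and measurements, so only the error-reduction comparison is meaningful.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 8760                                   # one year of hourly values (synthetic stand-in)
hour = np.arange(n) * 2 * np.pi / 24
o3_obs = 30 + 15 * np.sin(hour) + rng.normal(0, 5, n)   # station ozone (ppb)
o3_ctm = 0.8 * o3_obs + 10 + rng.normal(0, 8, n)        # biased, noisy CTM output
temp = 15 + 8 * np.sin(hour) + rng.normal(0, 2, n)      # collocated meteorology

X = np.column_stack([o3_ctm, temp])        # predictors at the station's grid cell
o3_down = LinearRegression().fit(X, o3_obs).predict(X)
for name, pred in [("raw CTM", o3_ctm), ("downscaled", o3_down)]:
    rmse = np.sqrt(np.mean((pred - o3_obs) ** 2))
    print(f"{name:>10} RMSE: {rmse:.1f} ppb")
```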
Arenas, Miguel
2015-04-01
NGS technologies enable fast and cheap generation of genomic data. Nevertheless, ancestral genome inference is not straightforward, owing to complex evolutionary processes acting on this material, such as inversions, translocations, and other genome rearrangements that, in addition to their implicit complexity, can co-occur and confound ancestral inferences. Models of genome evolution that accommodate such complex genomic events are now emerging. This letter explores these novel evolutionary models and proposes their incorporation into robust statistical approaches based on computer simulations, such as approximate Bayesian computation, that may produce a more realistic evolutionary analysis of genomic data. Advantages and pitfalls of these analytical methods are discussed. Potential applications of these ancestral genomic inferences are also pointed out.
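The ABC rejection scheme referred to here fits in a few lines; the toy Poisson "simulator" and tolerance below are illustrative stand-ins for a full genome-rearrangement simulator and its summary statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
s_obs = 12                                  # observed summary, e.g. a rearrangement count

# ABC rejection: draw parameters from the prior, simulate, and keep draws whose
# simulated summary lands close to the observed one
prior_draws = rng.uniform(0, 50, size=200_000)     # uniform prior on the event rate
sims = rng.poisson(prior_draws)                    # toy simulator stand-in
accepted = prior_draws[np.abs(sims - s_obs) <= 1]  # tolerance of +/- 1
print(f"posterior mean rate ~ {accepted.mean():.1f} "
      f"({accepted.size} of {prior_draws.size} draws accepted)")
```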
Statistical physics of the symmetric group.
Williams, Mobolaji
2017-04-01
Ordered chains (such as chains of amino acids) are ubiquitous in biological cells, and these chains perform specific functions contingent on the sequence of their components. Using the existence and general properties of such sequences as a theoretical motivation, we study the statistical physics of systems whose state space is defined by the possible permutations of an ordered list, i.e., the symmetric group, and whose energy is a function of how certain permutations deviate from some chosen correct ordering. Such a nonfactorizable state space is quite different from the state spaces typically considered in statistical physics systems and consequently has novel behavior in systems with interacting and even noninteracting Hamiltonians. Various parameter choices of a mean-field model reveal the system to contain five different physical regimes defined by two transition temperatures, a triple point, and a quadruple point. Finally, we conclude by discussing how the general analysis can be extended to state spaces with more complex combinatorial properties and to other standard questions of statistical mechanics models.
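A minimal Monte Carlo rendering of such a system samples permutations with a Metropolis rule. The energy chosen below, the count of elements away from the correct ordering, is one simple choice consistent with the description, not necessarily the paper's mean-field Hamiltonian.

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta, nsteps = 20, 1.0, 50_000

def energy(perm):
    # Counts elements displaced from their "correct" position in the ordering
    return int(np.sum(perm != np.arange(len(perm))))

perm = rng.permutation(N)
E = energy(perm)
trace = []
for _ in range(nsteps):
    i, j = rng.integers(N, size=2)
    perm[i], perm[j] = perm[j], perm[i]           # propose a transposition
    dE = energy(perm) - E
    if dE <= 0 or rng.random() < np.exp(-beta * dE):
        E += dE                                    # accept the move
    else:
        perm[i], perm[j] = perm[j], perm[i]        # reject: undo the swap
    trace.append(E)

print("mean energy at beta = 1:", np.mean(trace[nsteps // 2:]))
```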
Walzthoeni, Thomas; Joachimiak, Lukasz A; Rosenberger, George; Röst, Hannes L; Malmström, Lars; Leitner, Alexander; Frydman, Judith; Aebersold, Ruedi
2015-12-01
Chemical cross-linking in combination with mass spectrometry generates distance restraints of amino acid pairs in close proximity on the surface of native proteins and protein complexes. In this study we used quantitative mass spectrometry and chemical cross-linking to quantify differences in cross-linked peptides obtained from complexes in spatially discrete states. We describe a generic computational pipeline for quantitative cross-linking mass spectrometry consisting of modules for quantitative data extraction and statistical assessment of the obtained results. We used the method to detect conformational changes in two model systems: firefly luciferase and the bovine TRiC complex. Our method discovers and explains the structural heterogeneity of protein complexes using only sparse structural information.
Light, John M; Jason, Leonard A; Stevens, Edward B; Callahan, Sarah; Stone, Ariel
2016-03-01
The complex system conception of group social dynamics often involves not only changing individual characteristics, but also changing within-group relationships. Recent advances in stochastic dynamic network modeling allow these interdependencies to be modeled from data. This methodology is discussed within a context of other mathematical and statistical approaches that have been or could be applied to study the temporal evolution of relationships and behaviors within small- to medium-sized groups. An example model is presented, based on a pilot study of five Oxford House recovery homes, sober living environments for individuals following release from acute substance abuse treatment. This model demonstrates how dynamic network modeling can be applied to such systems, examines and discusses several options for pooling, and shows how results are interpreted in line with complex system concepts. Results suggest that this approach (a) is a credible modeling framework for studying group dynamics even with limited data, (b) improves upon the most common alternatives, and (c) is especially well-suited to complex system conceptions. Continuing improvements in stochastic models and associated software may finally lead to mainstream use of these techniques for the study of group dynamics, a shift already occurring in related fields of behavioral science.
Curvature and temperature of complex networks.
Krioukov, Dmitri; Papadopoulos, Fragkiskos; Vahdat, Amin; Boguñá, Marián
2009-09-01
We show that heterogeneous degree distributions in observed scale-free topologies of complex networks can emerge as a consequence of the exponential expansion of hidden hyperbolic space. Fermi-Dirac statistics provides a physical interpretation of hyperbolic distances as energies of links. The hidden space curvature affects the heterogeneity of the degree distribution, while clustering is a function of temperature. We embed the internet into the hyperbolic plane and find a remarkable congruency between the embedding and our hyperbolic model. Besides demonstrating that our model is realistic, this embedding may be used for routing with only local information, which holds significant promise for improving the performance of internet routing.
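The generative recipe is compact: place nodes quasi-uniformly in a hyperbolic disk and connect each pair with a Fermi-Dirac function of their hyperbolic distance. The sketch below uses the standard approximate distance formula; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, R, T, alpha = 500, 14.0, 0.5, 0.75   # nodes, disk radius, temperature, curvature exponent

# Radial density ~ sinh(alpha*r): invert the CDF for quasi-uniform placement
r = np.arccosh(1 + (np.cosh(alpha * R) - 1) * rng.random(N)) / alpha
theta = rng.uniform(0, 2 * np.pi, N)

A = np.zeros((N, N), dtype=int)
for i in range(N):
    for j in range(i + 1, N):
        dtheta = np.pi - abs(np.pi - abs(theta[i] - theta[j]))
        # Approximate hyperbolic distance for sufficiently separated points
        d = r[i] + r[j] + 2 * np.log(max(dtheta, 1e-9) / 2)
        p = 1.0 / (1.0 + np.exp((d - R) / (2 * T)))   # Fermi-Dirac connection probability
        if rng.random() < p:
            A[i, j] = A[j, i] = 1

degrees = A.sum(axis=1)
print("mean degree:", degrees.mean(), " max degree:", degrees.max())
```

Lowering T sharpens the step function and raises clustering, matching the role of temperature described above.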
Effects of additional data on Bayesian clustering.
Yamazaki, Keisuke
2017-10-01
Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity.
Statistical Modeling of Single Target Cell Encapsulation
Moon, SangJun; Ceyhan, Elvan; Gurkan, Umut Atakan; Demirci, Utkan
2011-01-01
High throughput drop-on-demand systems for separation and encapsulation of individual target cells from heterogeneous mixtures of multiple cell types is an emerging method in biotechnology that has broad applications in tissue engineering and regenerative medicine, genomics, and cryobiology. However, cell encapsulation in droplets is a random process that is hard to control. Statistical models can provide an understanding of the underlying processes and estimation of the relevant parameters, and enable reliable and repeatable control over the encapsulation of cells in droplets during the isolation process with high confidence level. We have modeled and experimentally verified a microdroplet-based cell encapsulation process for various combinations of cell loading and target cell concentrations. Here, we explain theoretically and validate experimentally a model to isolate and pattern single target cells from heterogeneous mixtures without using complex peripheral systems. PMID:21814548
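Encapsulation statistics of this kind are commonly modeled with Poisson counts in the dilute limit; the sketch below shows the key quantities with illustrative parameter values (the paper should be consulted for its exact formulation).

```python
from scipy import stats

lam, target_frac = 0.3, 0.1     # mean cells per droplet; fraction of target cells
# If total cells per droplet are Poisson(lam) and each cell is independently a
# target with probability target_frac, target counts are Poisson(lam * target_frac)
p_one_target = stats.poisson.pmf(1, lam * target_frac)
p_empty = stats.poisson.pmf(0, lam)
print(f"P(exactly one target cell) = {p_one_target:.4f}, P(empty droplet) = {p_empty:.3f}")
```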
A canonical neural mechanism for behavioral variability
Darshan, Ran; Wood, William E.; Peters, Susan; Leblois, Arthur; Hansel, David
2017-01-01
The ability to generate variable movements is essential for learning and adjusting complex behaviours. This variability has been linked to the temporal irregularity of neuronal activity in the central nervous system. However, how neuronal irregularity actually translates into behavioural variability is unclear. Here we combine modelling, electrophysiological and behavioural studies to address this issue. We demonstrate that a model circuit comprising topographically organized and strongly recurrent neural networks can autonomously generate irregular motor behaviours. Simultaneous recordings of neurons in singing finches reveal that neural correlations increase across the circuit driving song variability, in agreement with the model predictions. Analysing behavioural data, we find remarkable similarities in the babbling statistics of 5–6-month-old human infants and juveniles from three songbird species and show that our model naturally accounts for these 'universal' statistics. PMID:28530225
Complex architecture of primes and natural numbers.
García-Pérez, Guillermo; Serrano, M Ángeles; Boguñá, Marián
2014-08-01
Natural numbers can be divided into two nonoverlapping infinite sets, primes and composites, with composites factorizing into primes. Despite their apparent simplicity, the elucidation of the architecture of natural numbers with primes as building blocks remains elusive. Here, we propose a new approach to decoding the architecture of natural numbers based on complex networks and stochastic processes theory. We introduce a parameter-free non-Markovian dynamical model that naturally generates random primes and their relation with composite numbers with remarkable accuracy. Our model satisfies the prime number theorem as an emerging property, along with a refined version of Cramér's conjecture about the statistics of gaps between consecutive primes that seems closer to reality than the original version. Regarding composites, the model helps us to derive the prime factors counting function, giving the probability of distinct prime factors for any integer. Probabilistic models like ours can help to get deeper insights about primes and the complex architecture of natural numbers.
Multiagent model and mean field theory of complex auction dynamics
NASA Astrophysics Data System (ADS)
Chen, Qinghua; Huang, Zi-Gang; Wang, Yougui; Lai, Ying-Cheng
2015-09-01
Recent years have witnessed a growing interest in analyzing a variety of socio-economic phenomena using methods from statistical and nonlinear physics. We study a class of complex systems arising from economics, lowest unique bid auction (LUBA) systems, a recently emerged class of online auction games. Through analyzing large empirical data sets of LUBA, we identify a general feature of the bid price distribution: an inverted J-shaped function with exponential decay in the large bid price region. To account for the distribution, we propose a multi-agent model in which each agent bids stochastically in the field of the winner's attractiveness, and we develop a theoretical framework to obtain analytic solutions of the model based on mean field analysis. The theory produces bid-price distributions that are in excellent agreement with those from the real data. Our model and theory capture the essential features of human behaviors in the competitive environment exemplified by LUBA, and may provide significant quantitative insights into complex socio-economic phenomena.
Mifsud, Borbala; Martincorena, Inigo; Darbo, Elodie; Sugar, Robert; Schoenfelder, Stefan; Fraser, Peter; Luscombe, Nicholas M
2017-01-01
Hi-C is one of the main methods for investigating spatial co-localisation of DNA in the nucleus. However, the raw sequencing data obtained from Hi-C experiments suffer from large biases and spurious contacts, making it difficult to identify true interactions. Existing methods use complex models to account for biases and do not provide a significance threshold for detecting interactions. Here we introduce a simple binomial probabilistic model that resolves complex biases and distinguishes between true and false interactions. The model corrects biases of known and unknown origin and yields a p-value for each interaction, providing a reliable threshold based on significance. We demonstrate this experimentally by testing the method against a random ligation dataset. Our method outperforms previous methods and provides a statistical framework for further data analysis, such as comparisons of Hi-C interactions between different conditions. GOTHiC is available as a BioConductor package (http://www.bioconductor.org/packages/release/bioc/html/GOTHiC.html).
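The binomial test at the heart of this approach can be paraphrased as an upper-tail p-value per fragment pair; the helper below is a schematic restatement in which the expected read fraction p_ij comes from the user's coverage-based noise model rather than being computed here.

```python
from scipy import stats

def interaction_pvalue(n_ij, N, p_ij):
    """P-value for observing at least n_ij read pairs for fragment pair (i, j).

    N    -- total read pairs in the experiment
    p_ij -- expected fraction of reads for this pair under the noise model,
            e.g. proportional to the product of the two fragments' coverages
    """
    return stats.binom.sf(n_ij - 1, N, p_ij)   # upper-tail binomial test

# 50 observed vs. 10 expected read pairs: strongly significant enrichment
print(interaction_pvalue(50, 10_000_000, 1e-6))
```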
Models and observations of Arctic melt ponds
NASA Astrophysics Data System (ADS)
Golden, K. M.
2016-12-01
During the Arctic melt season, the sea ice surface undergoes a striking transformation from vast expanses of snow-covered ice to complex mosaics of ice and melt ponds. Sea ice albedo, a key parameter in climate modeling, is largely determined by the complex evolution of melt pond configurations. In fact, ice-albedo feedback has played a significant role in the recent declines of the summer Arctic sea ice pack. However, understanding melt pond evolution remains a challenge to improving climate projections. It has been found that as the ponds grow and coalesce, the fractal dimension of their boundaries undergoes a transition from 1 to about 2, around a critical pond size of about 100 square meters in area. As the ponds evolve they take complex, self-similar shapes with boundaries resembling space-filling curves. I will outline how mathematical models of composite materials and statistical physics, such as percolation and Ising models, are being used to describe this evolution and predict key geometrical parameters that agree very closely with observations.
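The boundary fractal dimension behind that transition is estimated by box counting on binarized pond imagery; a compact NumPy version, assuming a boolean pond mask with a well-resolved boundary (box sizes are illustrative):

```python
import numpy as np

def boundary_dimension(mask, sizes=(2, 4, 8, 16, 32, 64)):
    """Box-counting dimension of the boundary pixels of a boolean pond mask."""
    interior = (mask & np.roll(mask, 1, 0) & np.roll(mask, -1, 0)
                     & np.roll(mask, 1, 1) & np.roll(mask, -1, 1))
    b = mask & ~interior                       # pond pixels with a non-pond neighbour
    n, m = b.shape
    counts = []
    for s in sizes:
        blocks = b[:n // s * s, :m // s * s].reshape(n // s, s, m // s, s)
        counts.append(blocks.any(axis=(1, 3)).sum())   # boxes touching the boundary
    # N(s) ~ s^(-D), so D is minus the slope of log N against log s
    return -np.polyfit(np.log(sizes), np.log(counts), 1)[0]
```

Applied to ponds below and above the critical size, the estimate shifts from near 1 (smooth boundaries) toward 2 (space-filling boundaries).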
NASA Astrophysics Data System (ADS)
Anagnostopoulos, Konstantinos N.; Azuma, Takehiro; Ito, Yuta; Nishimura, Jun; Papadoudis, Stratos Kovalkov
2018-02-01
In recent years the complex Langevin method (CLM) has proven a powerful method in studying statistical systems which suffer from the sign problem. Here we show that it can also be applied to an important problem concerning why we live in four-dimensional spacetime. Our target system is the type IIB matrix model, which is conjectured to be a nonperturbative definition of type IIB superstring theory in ten dimensions. The fermion determinant of the model becomes complex upon Euclideanization, which causes a severe sign problem in its Monte Carlo studies. It is speculated that the phase of the fermion determinant actually induces the spontaneous breaking of the SO(10) rotational symmetry, which has direct consequences on the aforementioned question. In this paper, we apply the CLM to the 6D version of the type IIB matrix model and show clear evidence that the SO(6) symmetry is broken down to SO(3). Our results are consistent with those obtained previously by the Gaussian expansion method.
Ranking streamflow model performance based on Information theory metrics
NASA Astrophysics Data System (ADS)
Martinez, Gonzalo; Pachepsky, Yakov; Pan, Feng; Wagener, Thorsten; Nicholson, Thomas
2016-04-01
Accuracy-based model performance metrics do not necessarily reflect the qualitative correspondence between simulated and measured streamflow time series. The objective of this work was to determine whether information theory-based metrics can serve as a complementary tool for hydrologic model evaluation and selection. We simulated 10-year streamflow time series in five watersheds located in Texas, North Carolina, Mississippi, and West Virginia. Eight models of different complexity were applied. The information theory-based metrics were obtained after representing the time series as strings drawn from a finite symbol alphabet, where different symbols corresponded to different quantiles of the probability distribution of streamflow. Three metrics were computed for those strings: mean information gain, which measures the randomness of the signal; effective measure complexity, which characterizes predictability; and fluctuation complexity, which characterizes the presence of a pattern in the signal. The observed streamflow time series had smaller information content and larger complexity metrics than the precipitation time series: streamflow was less random and more complex than precipitation, reflecting the fact that the watershed acts as an information filter in the hydrologic conversion from precipitation to streamflow. The Nash-Sutcliffe efficiency increased with model complexity, but in many cases several models had efficiency values that were not statistically distinguishable from each other. In such cases, ranking models by the closeness of the information theory-based parameters of simulated and measured streamflow time series can provide an additional criterion for the evaluation of hydrologic model performance.
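For concreteness, the symbolization step and the first of the three metrics, mean information gain, here computed as the conditional entropy of the next symbol given the preceding block, might look as follows; the alphabet size, block length, and input file are assumptions.

```python
import numpy as np
from collections import Counter

def symbolize(x, n_symbols=4):
    """Map a series to quantile-based symbols 0..n_symbols-1."""
    edges = np.quantile(x, np.linspace(0, 1, n_symbols + 1)[1:-1])
    return np.digitize(x, edges)

def block_entropy(s, L):
    counts = Counter(tuple(s[i:i + L]) for i in range(len(s) - L + 1))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return -(p * np.log2(p)).sum()

def mean_information_gain(s, L=3):
    # Entropy of the next symbol conditioned on the preceding block of length L
    return block_entropy(s, L + 1) - block_entropy(s, L)

flow = np.loadtxt("daily_streamflow.txt")      # hypothetical gauge record
print("MIG:", mean_information_gain(symbolize(flow)))
```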
Fogarty, Laurel; Wakano, Joe Yuichiro; Feldman, Marcus W; Aoki, Kenichi
2017-03-01
The forces driving cultural accumulation in human populations, both modern and ancient, are hotly debated. Did genetic, demographic, or cognitive features of behaviorally modern humans (as opposed to, say, early modern humans or Neanderthals) allow culture to accumulate to its current, unprecedented levels of complexity? Theoretical explanations for patterns of accumulation often invoke demographic factors such as population size or density, whereas statistical analyses of variation in cultural complexity often point to the importance of environmental factors such as food stability, in determining cultural complexity. Here we use both an analytical model and an agent-based simulation model to show that a full understanding of the emergence of behavioral modernity, and the cultural evolution that has followed, depends on understanding and untangling the complex relationships among culture, genetically determined cognitive ability, and demographic history. For example, we show that a small but growing population could have a different number of cultural traits from a shrinking population with the same absolute number of individuals in some circumstances.
Predicting perceptual quality of images in realistic scenario using deep filter banks
NASA Astrophysics Data System (ADS)
Zhang, Weixia; Yan, Jia; Hu, Shiyong; Ma, Yang; Deng, Dexiang
2018-03-01
Classical image perceptual quality assessment models usually resort to natural scene statistics methods, which are based on the assumption that certain reliable statistical regularities hold for undistorted images and are corrupted by introduced distortions. However, these models usually fail to accurately predict the degradation severity of images in realistic scenarios, since complex, multiple, and interacting authentic distortions usually appear in them. We propose a quality prediction model based on a convolutional neural network. Quality-aware features extracted from filter banks of multiple convolutional layers are aggregated into the image representation. Furthermore, an easy-to-implement and effective feature selection strategy is used to further refine the image representation, and finally a linear support vector regression model is trained to map the image representation to subjective perceptual quality scores. The experimental results on benchmark databases demonstrate the effectiveness and generalizability of the proposed model.
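The final regression stage, mapping pooled deep features to subjective scores, is schematically as follows in scikit-learn; the feature extraction itself is omitted and the data files are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

feats = np.load("filterbank_features.npy")   # hypothetical per-image feature matrix
mos = np.load("subjective_scores.npy")       # hypothetical mean opinion scores

model = make_pipeline(StandardScaler(),
                      LinearSVR(C=1.0, epsilon=0.1, max_iter=10_000))
model.fit(feats[:800], mos[:800])            # simple holdout split for illustration
pred = model.predict(feats[800:])
# Spearman rank correlation is the customary benchmark statistic for IQA models
print("SROCC:", spearmanr(pred, mos[800:]).correlation)
```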
Langevin modelling of high-frequency Hang-Seng index data
NASA Astrophysics Data System (ADS)
Tang, Lei-Han
2003-06-01
Accurate statistical characterization of financial time series, such as compound stock indices, foreign currency exchange rates, etc., is fundamental to investment risk management, pricing of derivative products and financial decision making. Traditionally, such data were analyzed and modeled from a purely statistical point of view, with little concern for the specifics of financial markets. Increasingly, however, attention has been paid to the underlying economic forces and the collective behavior of investors. Here we summarize a novel approach to the statistical modeling of a major stock index (the Hang Seng index). Based on mathematical results previously derived in the fluid turbulence literature, we show that a Langevin equation with a variable noise amplitude correctly reproduces the ubiquitous fat tails in the probability distribution of intra-day price moves. The form of the Langevin equation suggests that, despite the extremely complex nature of financial concerns and investment strategies at the individual level, there exist simple universal rules governing high-frequency price moves in a stock market.
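The mechanism is easy to demonstrate with a one-dimensional Langevin equation whose noise amplitude grows with the state variable; the form sigma^2(x) = a + b*x^2 below is a standard illustrative choice that yields power-law (fat) tails, not the calibrated Hang Seng model.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, a, b = 1.0, 0.01, 0.2     # relaxation rate; noise floor; multiplicative strength
dt, n = 1e-3, 500_000

x = 0.0
xs = np.empty(n)
for i in range(n):
    # Multiplicative noise: sigma^2(x) = a + b*x^2; the stationary density has
    # power-law tails ~ (a + b*x^2)^(-(1 + gamma/b))
    x += -gamma * x * dt + np.sqrt((a + b * x * x) * dt) * rng.standard_normal()
    xs[i] = x

excess_kurtosis = ((xs - xs.mean()) ** 4).mean() / xs.var() ** 2 - 3
print("excess kurtosis (0 for a Gaussian):", round(excess_kurtosis, 2))
```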
O'Neill, Andrea; Erikson, Li; Barnard, Patrick
2017-01-01
While global climate models (GCMs) provide useful projections of near-surface wind vectors into the 21st century, their resolution is not sufficient for use in regional wave modeling. Statistically downscaled GCM projections from Multivariate Adaptive Constructed Analogues provide daily averaged near-surface winds at an appropriate spatial resolution for wave modeling within the orographically complex region of San Francisco Bay, but greater temporal resolution is needed to capture the peak of storm events. Short-duration high wind speeds, on the order of hours, are usually excluded from statistically downscaled climate models and are of key importance for wave and subsequent coastal flood modeling. Here we present a temporal downscaling approach, similar to constructed analogues, for near-surface winds suitable for use in local wave models, and evaluate changes in wind and wave conditions for the 21st century. Reconstructed hindcast winds (1975–2004) recreate important extreme wind values within San Francisco Bay. A computationally efficient method for simulating wave heights over long time periods was used to screen for extreme events. Wave hindcasts show resultant maximum wave heights of 2.2 m possible within the Bay. Changes in extreme over-water wind speeds suggest contrasting trends within the different regions of San Francisco Bay, but 21st-century projections show little change in the overall magnitude of extreme winds and locally generated waves.
The epistemological status of general circulation models
NASA Astrophysics Data System (ADS)
Loehle, Craig
2018-03-01
Forecasts of both likely anthropogenic effects on climate and consequent effects on nature and society are based on large, complex software tools called general circulation models (GCMs). Forecasts generated by GCMs have been used extensively in policy decisions related to climate change. However, the relation between underlying physical theories and results produced by GCMs is unclear. In the case of GCMs, many discretizations and approximations are made, and simulating Earth system processes is far from simple and currently leads to some results with unknown energy balance implications. Statistical testing of GCM forecasts for degree of agreement with data would facilitate assessment of fitness for use. If model results need to be put on an anomaly basis due to model bias, then both visual and quantitative measures of model fit depend strongly on the reference period used for normalization, making testing problematic. Epistemology is here applied to problems of statistical inference during testing, the relationship between the underlying physics and the models, the epistemic meaning of ensemble statistics, problems of spatial and temporal scale, the existence or not of an unforced null for climate fluctuations, the meaning of existing uncertainty estimates, and other issues. Rigorous reasoning entails carefully quantifying levels of uncertainty.
NASA Astrophysics Data System (ADS)
Terando, A. J.; Grade, S.; Bowden, J.; Henareh Khalyani, A.; Wootten, A.; Misra, V.; Collazo, J.; Gould, W. A.; Boyles, R.
2016-12-01
Sub-tropical island nations may be particularly vulnerable to anthropogenic climate change because of predicted changes in the hydrologic cycle that would lead to significant drying in the future. However, decision makers in these regions have seen their adaptation planning efforts frustrated by the lack of island-resolving climate model information. Recently, two investigations have used statistical and dynamical downscaling techniques to develop climate change projections for the U.S. Caribbean region (Puerto Rico and U.S. Virgin Islands). We compare the results from these two studies with respect to three commonly downscaled CMIP5 global climate models (GCMs). The GCMs were dynamically downscaled at a convective-permitting scale using two different regional climate models. The statistical downscaling approach was conducted at locations with long-term climate observations and then further post-processed using climatologically aided interpolation (yielding two sets of projections). Overall, both approaches face unique challenges. The statistical approach suffers from a lack of observations necessary to constrain the model, particularly at the land-ocean boundary and in complex terrain. The dynamically downscaled model output has a systematic dry bias over the island despite ample availability of moisture in the atmospheric column. Notwithstanding these differences, both approaches are consistent in projecting a drier climate that is driven by the strong global-scale anthropogenic forcing.
Mapping and discrimination of networks in the complexity-entropy plane
NASA Astrophysics Data System (ADS)
Wiedermann, Marc; Donges, Jonathan F.; Kurths, Jürgen; Donner, Reik V.
2017-10-01
Complex networks are usually characterized in terms of their topological, spatial, or information-theoretic properties, and combinations of the associated metrics are used to discriminate networks into different classes or categories. However, even with the present variety of characteristics at hand, it remains a subject of current research to appropriately quantify a network's complexity and correspondingly discriminate between different types of complex networks, like infrastructure or social networks, on such a basis. Here we explore the possibility of classifying complex networks by means of a statistical complexity measure that has formerly been successfully applied to distinguish different types of chaotic and stochastic time series. It is composed of a network's averaged per-node entropic measure, characterizing the network's information content, and the associated Jensen-Shannon divergence as a measure of disequilibrium. We study 29 real-world networks and show that networks of the same category tend to cluster in distinct areas of the resulting complexity-entropy plane. We demonstrate that within our framework, connectome networks exhibit among the highest complexity while, e.g., transportation and infrastructure networks display significantly lower values. Furthermore, we demonstrate the utility of our framework by applying it to families of random scale-free and Watts-Strogatz model networks. We then show in a second application that the proposed framework is useful to objectively construct threshold-based networks, such as functional climate networks or recurrence networks, by choosing the threshold such that the statistical network complexity is maximized.
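One plausible rendering of the two ingredients, network-averaged per-node entropy and a Jensen-Shannon disequilibrium, for an unweighted adjacency matrix is given below; the normalization choices are assumptions of this sketch, and isolated nodes are assumed absent.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def complexity_entropy(A):
    """Return (H, C): normalised mean node entropy and statistical complexity C = H*Q."""
    N = len(A)
    P = A / A.sum(axis=1, keepdims=True)     # each node's link distribution (no isolates)
    H = np.mean([entropy(P[i]) for i in range(N)]) / np.log(N - 1)
    p_mean, u = P.mean(axis=0), np.full(N, 1.0 / N)
    m = 0.5 * (p_mean + u)
    js = entropy(m) - 0.5 * entropy(p_mean) - 0.5 * entropy(u)
    Q = js / np.log(2)                       # Jensen-Shannon disequilibrium, in [0, 1]
    return H, H * Q
```

Each network then occupies a point in the (H, C) plane, and categories cluster as described above.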
Cano-Sancho, German; Labrune, Léa; Ploteau, Stéphane; Marchand, Philippe; Le Bizec, Bruno; Antignac, Jean-Philippe
2018-06-01
The gold-standard matrix for measuring internal levels of persistent organic pollutants (POPs) is adipose tissue; however, in epidemiological studies the use of serum is preferred because of its lower cost and higher accessibility. The interpretation of serum biomarkers is tightly linked to understanding the underlying causal structure relating POPs, serum lipids and the disease. Considering the extended benefits of using serum biomarkers, we aimed to examine whether statistical modelling could improve the use and interpretation of serum biomarkers in the study of endometriosis. Hence, we conducted a systematic comparison of statistical approaches commonly used to lipid-adjust circulating biomarkers of POPs, based on existing methods, using data from a pilot case-control study focused on severe deep infiltrating endometriosis. The odds ratios (ORs) obtained from unconditional regression for models with serum biomarkers were further compared to those obtained from adipose tissue. The results of this exploratory study did not support the use of blood biomarkers as proxy estimates of POPs in adipose tissue in risk models for endometriosis with the available statistical approaches to correct for lipids. The statistical approaches commonly used to lipid-adjust circulating POPs do not fully represent the underlying biological complexity among POPs, lipids and disease (especially those directly or indirectly affecting, or affected by, lipid metabolism). Hence, further investigations are warranted to improve the use and interpretation of blood biomarkers under complex scenarios of lipid dynamics.
Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul
2016-01-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms–Supervised Principal Components, Regularization, and Boosting—can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach—or perhaps because of them–SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
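Of the three algorithms, regularization is the most compact to illustrate: an L1-penalized logistic regression with cross-validated penalty selection performs item selection and weighting in one pass. File names are hypothetical, and this sketches the EPE-minimizing workflow rather than the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# items: (n_people, n_items) responses; died: binary all-cause mortality indicator
items, died = np.load("items.npy"), np.load("mortality.npy")   # hypothetical files

# The L1 penalty shrinks most item weights exactly to zero; 10-fold CV chooses
# the penalty that minimizes expected prediction error, not within-sample fit
scale = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=20, cv=10,
                             scoring="roc_auc").fit(items, died)
kept = np.flatnonzero(scale.coef_[0])
print(f"{kept.size} items retained for the criterion-keyed scale:", kept)
```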
NASA Astrophysics Data System (ADS)
Mayes, R.; Lyford, M. E.; Myers, J. D.
2009-12-01
The Quantitative Reasoning in STEM (QR STEM) project is a state-level Mathematics and Science Partnership (MSP) project focused on the mathematics and statistics that underlie the understanding of complex global scientific issues. This session is a companion to the QR STEM: The Science presentation; its focus is the quantitative reasoning aspects of the project. As students move from local to global perspectives on issues of energy and environment, the need for mathematical and statistical conceptual understanding increases significantly. These understandings must be accessible to students within the scientific context, requiring the special understandings that are endemic to quantitative reasoning. The QR STEM project brings together interdisciplinary teams of higher education faculty and middle/high school teachers to explore complex problems in energy and environment. The disciplines include the life sciences, physics, chemistry, earth science, statistics, and mathematics. These interdisciplinary teams develop open-ended performance tasks to implement in the classroom, based on scientific concepts that underpin energy and environment. Quantitative reasoning is broken down into three components: quantitative literacy, quantitative interpretation, and quantitative modeling. Quantitative literacy is composed of arithmetic concepts such as proportional reasoning, numeracy, and descriptive statistics. Quantitative interpretation includes the algebraic and geometric concepts that underlie the ability to interpret a model of natural phenomena which is provided for the student; this model may be a table, graph, or equation from which the student is to make predictions or identify trends, or from which they would use statistics to explore correlations or patterns in data. Quantitative modeling is the ability to develop the model from data, including the ability to test hypotheses using statistical procedures. We use the term model very broadly, so it includes visual models such as box models, as well as best-fit equation models and hypothesis testing. One of the powerful outcomes of the project is the conversation which takes place between science teachers and mathematics teachers. First, they realize that though they are teaching concepts that cross their disciplines, the barrier of scientific language within their subjects restricts students from applying the concepts across subjects. Second, the mathematics teachers discover the context of science as a means of providing real-world situations that engage students in the utility of mathematics as a tool for solving problems. Third, the science teachers discover the barrier to understanding science that is presented by poor quantitative reasoning ability. Finally, the students are engaged in exploring energy and environment in a manner which exposes the importance of seeing a problem from multiple interdisciplinary perspectives. The outcome is a democratic citizen capable of making informed decisions, and perhaps a future scientist.
Finding equilibrium in the spatiotemporal chaos of the complex Ginzburg-Landau equation
NASA Astrophysics Data System (ADS)
Ballard, Christopher C.; Esty, C. Clark; Egolf, David A.
2016-11-01
Equilibrium statistical mechanics allows the prediction of collective behaviors of large numbers of interacting objects from just a few system-wide properties; however, a similar theory does not exist for far-from-equilibrium systems exhibiting complex spatial and temporal behavior. We propose a method for predicting behaviors in a broad class of such systems and apply these ideas to an archetypal example, the spatiotemporal chaotic 1D complex Ginzburg-Landau equation in the defect chaos regime. Building on the ideas of Ruelle and of Cross and Hohenberg that a spatiotemporal chaotic system can be considered a collection of weakly interacting dynamical units of a characteristic size, the chaotic length scale, we identify underlying, mesoscale, chaotic units and effective interaction potentials between them. We find that the resulting equilibrium Takahashi model accurately predicts distributions of particle numbers. These results suggest the intriguing possibility that a class of far-from-equilibrium systems may be well described at coarse-grained scales by the well-established theory of equilibrium statistical mechanics.
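For readers who want to experiment, a minimal integrating-factor Euler scheme for the 1D complex Ginzburg-Landau equation, dA/dt = A + (1 + i c1) d2A/dx2 - (1 + i c3)|A|^2 A, is sketched below; the coefficients, domain size, and step sizes are illustrative choices, not taken from the paper.

# Illustrative 1D CGLE solver (integrating-factor Euler, first order in time):
#   dA/dt = A + (1 + i*c1) d2A/dx2 - (1 + i*c3) |A|^2 A
# c1, c3 are assumed here as plausible defect-chaos values, not the paper's.
import numpy as np

L, N, dt, steps = 256.0, 512, 0.05, 20000
c1, c3 = 2.0, 0.8
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
lin = 1.0 - (1.0 + 1j * c1) * k**2      # linear operator in Fourier space
E = np.exp(dt * lin)

rng = np.random.default_rng(1)
A = 0.01 * (rng.normal(size=N) + 1j * rng.normal(size=N))  # small noise IC

for _ in range(steps):
    nonlin = -(1.0 + 1j * c3) * np.abs(A)**2 * A
    # exact treatment of the linear part, Euler step for the nonlinearity
    A = np.fft.ifft(E * (np.fft.fft(A) + dt * np.fft.fft(nonlin)))

# Defects appear as near-zeros of |A|; count them as a crude diagnostic.
print("points with |A| below 0.1:", int(np.sum(np.abs(A) < 0.1)))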
Galatzer-Levy, Isaac R.; Ruggles, Kelly; Chen, Zhe
2017-01-01
Diverse environmental and biological systems interact to influence individual differences in response to environmental stress. Understanding the nature of these complex relationships can enhance the development of methods to: (1) identify risk, (2) classify individuals as healthy or ill, (3) understand mechanisms of change, and (4) develop effective treatments. The Research Domain Criteria (RDoC) initiative provides a theoretical framework to understand health and illness as the product of multiple inter-related systems but does not provide a framework to characterize or statistically evaluate such complex relationships. Characterizing and statistically evaluating models that integrate multiple levels (e.g. synapses, genes, environmental factors) as they relate to outcomes that are free from prior diagnostic benchmarks represents a challenge requiring new computational tools that are capable of capturing complex relationships and identifying clinically relevant populations. In the current review, we summarize machine learning methods that can achieve these goals. PMID:29527592
The advent of new higher throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets are comprised of tens or hundreds of analytes, multiple repeat measures...
The Role of Probability-Based Inference in an Intelligent Tutoring System.
ERIC Educational Resources Information Center
Mislevy, Robert J.; Gitomer, Drew H.
Probability-based inference in complex networks of interdependent variables is an active topic in statistical research, spurred by such diverse applications as forecasting, pedigree analysis, troubleshooting, and medical diagnosis. This paper concerns the role of Bayesian inference networks for updating student models in intelligent tutoring…
The US EPA, Environmental Sciences Division-Las Vegas is using a variety of geospatial and statistical modeling approaches to locate and assess the complex functions of wetland ecosystems. These assessments involve measuring landscape characteristics and change, at multiple s...
Quasi-Experimental Analysis: A Mixture of Methods and Judgment.
ERIC Educational Resources Information Center
Cordray, David S.
1986-01-01
The role of human judgment in the development and synthesis of evidence has not been adequately developed or acknowledged within quasi-experimental analysis. Corrective solutions need to confront the fact that causal analysis within complex environments will require a more active assessment that entails reasoning and statistical modeling…
Structural Equation Modeling: Possibilities for Language Learning Researchers
ERIC Educational Resources Information Center
Hancock, Gregory R.; Schoonen, Rob
2015-01-01
Although classical statistical techniques have been a valuable tool in second language (L2) research, L2 research questions have started to grow beyond those techniques' capabilities, and indeed are often limited by them. Questions about how complex constructs relate to each other or to constituent subskills, about longitudinal development in…
USDA-ARS?s Scientific Manuscript database
Despite the enormous relevance of zoonotic infections to world-wide public health, and despite much effort in modeling individual zoonoses, a fundamental understanding of the disease dynamics and the nature of outbreaks emanating from such a complex system is still lacking. We introduce a simple sto...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Jing; Ackerman, David M.; Lin, Victor S.-Y.
2013-04-02
Statistical mechanical modeling is performed of a catalytic conversion reaction within a functionalized nanoporous material to assess the effect of varying the reaction product-pore interior interaction from attractive to repulsive. A strong enhancement in reactivity is observed not just due to the shift in reaction equilibrium towards completion but also due to enhanced transport within the pore resulting from reduced loading. The latter effect is strongest for highly restricted transport (single-file diffusion), and applies even for irreversible reactions. The analysis is performed utilizing a generalized hydrodynamic formulation of the reaction-diffusion equations which can reliably capture the complex interplay between reaction and restricted transport.
Increasing market efficiency in the stock markets
NASA Astrophysics Data System (ADS)
Yang, Jae-Suk; Kwak, Wooseop; Kaizoji, Taisei; Kim, In-Mook
2008-01-01
We study the temporal evolution of three stock markets: the Standard and Poor's 500 index, the Nikkei 225 Stock Average, and the Korea Composite Stock Price Index. We observe that the probability density function of the log-return has a fat tail, but that the tail index has been increasing continuously in recent years. We also find that the variance of the autocorrelation function, the scaling exponent of the standard deviation, and the statistical complexity decrease, while the entropy density increases, over time. We introduce a modified microscopic spin model and simulate it to confirm these increasing and decreasing tendencies in the statistical quantities. These findings indicate that the three stock markets are becoming more efficient.
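The abstract does not specify how the tail index was estimated; a common choice is the Hill estimator, sketched here on simulated heavy-tailed returns standing in for market data.

# Sketch: Hill estimator for the tail index of log-returns.
# Simulated Student-t returns (true tail index = df) stand in for an index
# price series such as the S&P 500.
import numpy as np

rng = np.random.default_rng(0)
returns = rng.standard_t(df=3, size=10000)   # heavy-tailed stand-in data

def hill_tail_index(x, k=200):
    """Hill estimate of the tail index from the k largest |values|."""
    order = np.sort(np.abs(x))[::-1]         # descending order statistics
    logs = np.log(order[:k]) - np.log(order[k])
    return 1.0 / logs.mean()

print(f"estimated tail index: {hill_tail_index(returns):.2f}  (true: 3)")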
Virtual sensor models for real-time applications
NASA Astrophysics Data System (ADS)
Hirsenkorn, Nils; Hanke, Timo; Rauch, Andreas; Dehlink, Bernhard; Rasshofer, Ralph; Biebl, Erwin
2016-09-01
Increased complexity and severity of future driver assistance systems demand extensive testing and validation. As a supplement to road tests, driving simulations offer various benefits. For driver assistance functions the perception of the sensors is crucial, so the sensors themselves must also be modeled. In this contribution, a statistical, data-driven sensor model is described. The state-space based method is capable of modeling various types of behavior. Here, the modeling of the position estimation of an automotive radar system, including autocorrelations, is presented. An efficient implementation renders the model real-time capable.
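As a toy stand-in for such a statistical sensor model (the paper's actual state-space formulation is not reproduced here), the sketch below generates position measurements whose errors follow an assumed AR(1) process, the simplest way to endow a virtual sensor with autocorrelated error.

# Sketch: synthetic sensor with autocorrelated (AR(1)) position error,
# a minimal stand-in for a statistical, data-driven radar sensor model.
import numpy as np

rng = np.random.default_rng(0)
dt, n = 0.05, 400
true_pos = np.cumsum(np.full(n, 20.0 * dt))    # target moving at 20 m/s

phi, sigma = 0.9, 0.3                          # assumed AR(1) parameters
err = np.zeros(n)
for t in range(1, n):
    err[t] = phi * err[t - 1] + rng.normal(scale=sigma * np.sqrt(1 - phi**2))

measured = true_pos + err                      # virtual sensor output
lag1 = np.corrcoef(err[:-1], err[1:])[0, 1]
print(f"lag-1 error autocorrelation: {lag1:.2f} (target {phi})")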
Uniting Statistical and Individual-Based Approaches for Animal Movement Modelling
Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel
2014-01-01
The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, due to the practical impossibility of accessing internal states through field work and the inability of current statistical models to produce dynamic outputs. To address these issues, we developed a robust method which combines statistical and individual-based modelling. Using a statistical technique for forward modelling of the individual-based model (IBM) has the advantage of being faster to parameterize than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM was validated using both k-fold cross-validation and emergent-pattern validation, and was tested for two different scenarios with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system and adequately provide projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems. PMID:24979047
POWER ANALYSIS FOR COMPLEX MEDIATIONAL DESIGNS USING MONTE CARLO METHODS
Thoemmes, Felix; MacKinnon, David P.; Reiser, Mark R.
2013-01-01
Applied researchers often include mediation effects in applications of advanced methods such as latent variable models and linear growth curve models. Guidance on how to estimate statistical power to detect mediation for these models has not yet been addressed in the literature. We describe a general framework for power analyses for complex mediational models. The approach is based on the well known technique of generating a large number of samples in a Monte Carlo study, and estimating power as the percentage of cases in which an estimate of interest is significantly different from zero. Examples of power calculation for commonly used mediational models are provided. Power analyses for the single mediator, multiple mediators, three-path mediation, mediation with latent variables, moderated mediation, and mediation in longitudinal designs are described. Annotated sample syntax for Mplus is appended and tabled values of required sample sizes are shown for some models. PMID:23935262
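The appended syntax in the paper is for Mplus; the same Monte Carlo logic for the simplest case, a single-mediator model with the indirect effect tested by a Sobel-type z-test, can be sketched in Python (effect sizes and replication counts below are arbitrary illustrations).

# Monte Carlo power for the single-mediator model (a*b indirect effect).
# The direct X->Y effect is assumed zero, so the b-path regression omits X.
import numpy as np
from scipy import stats

def mediation_power(n, a=0.3, b=0.3, reps=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        m = a * x + rng.normal(size=n)
        y = b * m + rng.normal(size=n)
        ra = stats.linregress(x, m)        # a-path slope and SE
        rb = stats.linregress(m, y)        # b-path slope and SE
        ab = ra.slope * rb.slope
        se = np.sqrt(ra.slope**2 * rb.stderr**2 + rb.slope**2 * ra.stderr**2)
        hits += abs(ab / se) > stats.norm.ppf(1 - alpha / 2)  # Sobel z-test
    return hits / reps

for n in (50, 100, 200):
    print(n, mediation_power(n))   # power = share of significant replications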
A simulator for evaluating methods for the detection of lesion-deficit associations
NASA Technical Reports Server (NTRS)
Megalooikonomou, V.; Davatzikos, C.; Herskovits, E. H.
2000-01-01
Although much has been learned about the functional organization of the human brain through lesion-deficit analysis, the variety of statistical and image-processing methods developed for this purpose precludes a closed-form analysis of the statistical power of these systems. Therefore, we developed a lesion-deficit simulator (LDS), which generates artificial subjects, each of which consists of a set of functional deficits, and a brain image with lesions; the deficits and lesions conform to predefined distributions. We used probability distributions to model the number, sizes, and spatial distribution of lesions, to model the structure-function associations, and to model registration error. We used the LDS to evaluate, as examples, the effects of the complexities and strengths of lesion-deficit associations, and of registration error, on the power of lesion-deficit analysis. We measured the numbers of recovered associations from these simulated data, as a function of the number of subjects analyzed, the strengths and number of associations in the statistical model, the number of structures associated with a particular function, and the prior probabilities of structures being abnormal. The number of subjects required to recover the simulated lesion-deficit associations was found to have an inverse relationship to the strength of associations, and to the smallest probability in the structure-function model. The number of structures associated with a particular function (i.e., the complexity of associations) had a much greater effect on the performance of the analysis method than did the total number of associations. We also found that registration error of 5 mm or less reduces the number of associations discovered by approximately 13% compared to perfect registration. The LDS provides a flexible framework for evaluating many aspects of lesion-deficit analysis.
Zhu, Lin; Dai, Zhenxue; Gong, Huili; ...
2015-06-12
Understanding the heterogeneity arising from the complex architecture of sedimentary sequences in alluvial fans is challenging. This study develops a statistical inverse framework in a multi-zone transition probability approach for characterizing the heterogeneity in alluvial fans. An analytical solution of the transition probability matrix is used to define the statistical relationships among different hydrofacies and their mean lengths, integral scales, and volumetric proportions. A statistical inversion is conducted to identify the multi-zone transition probability models and estimate the optimal statistical parameters using the modified Gauss–Newton–Levenberg–Marquardt method. The Jacobian matrix is computed by the sensitivity equation method, which results in an accurate inverse solution with quantification of parameter uncertainty. We use the Chaobai River alluvial fan in the Beijing Plain, China, as an example for elucidating the methodology of alluvial fan characterization. The alluvial fan is divided into three sediment zones. In each zone, the explicit mathematical formulations of the transition probability models are constructed with different optimized integral scales and volumetric proportions. The hydrofacies distributions in the three zones are simulated sequentially by the multi-zone transition probability-based indicator simulations. Finally, the result of this study provides the heterogeneous structure of the alluvial fan for further study of flow and transport simulations.
Sample Skewness as a Statistical Measurement of Neuronal Tuning Sharpness
Samonds, Jason M.; Potetz, Brian R.; Lee, Tai Sing
2014-01-01
We propose using the statistical measurement of the sample skewness of the distribution of mean firing rates of a tuning curve to quantify sharpness of tuning. For some features, like binocular disparity, tuning curves are best described by relatively complex and sometimes diverse functions, making it difficult to quantify sharpness with a single function and parameter. Skewness provides a robust nonparametric measure of tuning curve sharpness that is invariant with respect to the mean and variance of the tuning curve and is straightforward to apply to a wide range of tuning, including simple orientation tuning curves and complex object tuning curves that often cannot even be described parametrically. Because skewness does not depend on a specific model or function of tuning, it is especially appealing to cases of sharpening where recurrent interactions among neurons produce sharper tuning curves that deviate in a complex manner from the feedforward function of tuning. Since tuning curves for all neurons are not typically well described by a single parametric function, this model independence additionally allows skewness to be applied to all recorded neurons, maximizing the statistical power of a set of data. We also compare skewness with other nonparametric measures of tuning curve sharpness and selectivity. Compared to these other nonparametric measures tested, skewness is best used for capturing the sharpness of multimodal tuning curves defined by narrow peaks (maxima) and broad valleys (minima). Finally, we provide a more formal definition of sharpness using a shape-based information gain measure and derive and show that skewness is correlated with this definition. PMID:24555451
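A minimal numerical illustration of the idea, using synthetic orientation tuning curves rather than the paper's recordings:

# Sketch: sample skewness of mean firing rates as a tuning-sharpness index.
# A sharply tuned curve (narrow peak, broad valley) has high positive skew.
import numpy as np
from scipy.stats import skew

theta = np.linspace(-np.pi, np.pi, 72, endpoint=False)
sharp = np.exp(4.0 * np.cos(theta))    # von Mises-like, narrow peak
broad = np.exp(0.5 * np.cos(theta))    # shallow modulation

print(f"skewness, sharp tuning: {skew(sharp):.2f}")
print(f"skewness, broad tuning: {skew(broad):.2f}")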
Jonsen, Ian D; Myers, Ransom A; James, Michael C
2006-09-01
1. Biological and statistical complexity are features common to most ecological data that hinder our ability to extract meaningful patterns using conventional tools. Recent work on implementing modern statistical methods for analysis of such ecological data has focused primarily on population dynamics but other types of data, such as animal movement pathways obtained from satellite telemetry, can also benefit from the application of modern statistical tools. 2. We develop a robust hierarchical state-space approach for analysis of multiple satellite telemetry pathways obtained via the Argos system. State-space models are time-series methods that allow unobserved states and biological parameters to be estimated from data observed with error. We show that the approach can reveal important patterns in complex, noisy data where conventional methods cannot. 3. Using the largest Atlantic satellite telemetry data set for critically endangered leatherback turtles, we show that the diel pattern in travel rates of these turtles changes over different phases of their migratory cycle. While foraging in northern waters the turtles show similar travel rates during day and night, but on their southward migration to tropical waters travel rates are markedly faster during the day. These patterns are generally consistent with diving data, and may be related to changes in foraging behaviour. Interestingly, individuals that migrate southward to breed generally show higher daytime travel rates than individuals that migrate southward in a non-breeding year. 4. Our approach is extremely flexible and can be applied to many ecological analyses that use complex, sequential data.
Adaptation in Coding by Large Populations of Neurons in the Retina
NASA Astrophysics Data System (ADS)
Ioffe, Mark L.
A comprehensive theory of neural computation requires an understanding of the statistical properties of the neural population code. The focus of this work is the experimental study and theoretical analysis of the statistical properties of neural activity in the tiger salamander retina. This is an accessible yet complex system, for which we control the visual input and record from a substantial portion (greater than half) of the ganglion cell population generating the spiking output. Our experiments probe adaptation of the retina to visual statistics: a central feature of sensory systems, which have to adjust their limited dynamic range to a far larger space of possible inputs. In Chapter 1 we place our work in context with a brief overview of the relevant background. In Chapter 2 we describe the experimental methodology of recording from 100+ ganglion cells in the tiger salamander retina. In Chapter 3 we first present the measurements of adaptation of individual cells to changes in stimulation statistics and then investigate whether pairwise correlations in fluctuations of ganglion cell activity change across different stimulation conditions. We then transition to a study of the population-level probability distribution of the retinal response captured with maximum-entropy models. Convergence of the model inference is presented in Chapter 4. In Chapter 5 we first test the empirical presence of a phase transition in such models fitting the retinal response to different experimental conditions, and then proceed to develop other characterizations which are sensitive to complexity in the interaction matrix. This includes an analysis of the dynamics of sampling at finite temperature, which demonstrates a range of subtle attractor-like properties in the energy landscape. These are largely conserved when ambient illumination is varied 1000-fold, a result not necessarily apparent from the measured low-order statistics of the distribution. Our results form a consistent picture which is discussed at the end of Chapter 5. We conclude with a few future directions related to this thesis.
Causes and correlations in cambium phenology: towards an integrated framework of xylogenesis
Rossi, Sergio; Morin, Hubert; Deslauriers, Annie
2012-01-01
Although habitually considered as a whole, xylogenesis is a complex process of division and maturation of a pool of cells where the relationship between the phenological phases generating such a growth pattern remains essentially unknown. This study investigated the causal relationships in cambium phenology of black spruce [Picea mariana (Mill.) BSP] monitored for 8 years on four sites of the boreal forest of Quebec, Canada. The dependency links connecting the timing of xylem cell differentiation and cell production were defined and the resulting causal model was analysed with d-sep tests and generalized mixed models with repeated measurements, and tested with Fisher’s C statistics to determine whether and how causality propagates through the measured variables. The higher correlations were observed between the dates of emergence of the first developing cells and between the ending of the differentiation phases, while the number of cells was significantly correlated with all phenological phases. The model with eight dependency links was statistically valid for explaining the causes and correlations between the dynamics of cambium phenology. Causal modelling suggested that the phenological phases involved in xylogenesis are closely interconnected by complex relationships of cause and effect, with the onset of cell differentiation being the main factor directly or indirectly triggering all successive phases of xylem maturation. PMID:22174441
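For readers unfamiliar with the test statistic used here: Fisher's C combines the p-values of the k independence claims implied by the causal model as C = -2 * sum(ln p_i), referred to a chi-square distribution with 2k degrees of freedom. A sketch with invented p-values:

# Sketch: Fisher's C test as used in d-sep causal model evaluation.
# C = -2 * sum(ln p_i) ~ chi-square with 2k df if the model is correct.
import numpy as np
from scipy.stats import chi2

pvals = np.array([0.42, 0.31, 0.77, 0.09, 0.55])  # illustrative d-sep p-values
C = -2.0 * np.sum(np.log(pvals))
df = 2 * len(pvals)
p_model = chi2.sf(C, df)
print(f"C = {C:.2f}, df = {df}, p = {p_model:.3f}")  # large p: not rejected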
Dinov, Ivo D; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W; Price, Nathan D; Van Horn, John D; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M; Dauer, William; Toga, Arthur W
2016-01-01
A unique archive of Big Data on Parkinson's Disease is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson's disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data-large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources-all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson's disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson's disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. 
The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer's, Huntington's, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications.
Statistical approaches for the determination of cut points in anti-drug antibody bioassays.
Schaarschmidt, Frank; Hofmann, Matthias; Jaki, Thomas; Grün, Bettina; Hothorn, Ludwig A
2015-03-01
Cut points in immunogenicity assays are used to classify future specimens into anti-drug antibody (ADA) positive or negative. To determine a cut point during pre-study validation, drug-naive specimens are often analyzed on multiple microtiter plates taking sources of future variability into account, such as runs, days, analysts, gender, drug-spiked and the biological variability of un-spiked specimens themselves. Five phenomena may complicate the statistical cut point estimation: i) drug-naive specimens may contain already ADA-positives or lead to signals that erroneously appear to be ADA-positive, ii) mean differences between plates may remain after normalization of observations by negative control means, iii) experimental designs may contain several factors in a crossed or hierarchical structure, iv) low sample sizes in such complex designs lead to low power for pre-tests on distribution, outliers and variance structure, and v) the choice between normal and log-normal distribution has a serious impact on the cut point. We discuss statistical approaches to account for these complex data: i) mixture models, which can be used to analyze sets of specimens containing an unknown, possibly larger proportion of ADA-positive specimens, ii) random effects models, followed by the estimation of prediction intervals, which provide cut points while accounting for several factors, and iii) diagnostic plots, which allow the post hoc assessment of model assumptions. All methods discussed are available in the corresponding R add-on package mixADA. Copyright © 2015 Elsevier B.V. All rights reserved.
Quantum statistics in complex networks
NASA Astrophysics Data System (ADS)
Bianconi, Ginestra
The Barabasi-Albert (BA) model for a complex network shows a characteristic power-law connectivity distribution typical of scale-free systems. The Ising model on the BA network shows that the ferromagnetic phase transition temperature depends logarithmically on its size. We have introduced a fitness parameter for the BA network which describes the different abilities of nodes to compete for links. This model predicts the formation of a scale-free network where each node increases its connectivity in time as a power law with an exponent depending on its fitness. This model includes the fact that the node connectivity and growth rate do not depend on the node age alone, and it reproduces non-trivial correlation properties of the Internet. We have proposed a model of bosonic networks by a generalization of the BA model where the properties of quantum statistics can be applied. We have introduced a fitness η_i = e^(-βε_i), where the temperature T = 1/β is determined by the noise in the system and the energy ε_i accounts for qualitative differences of each node for acquiring links. The results of this work show that a power-law network with exponent γ = 2 can give a Bose condensation where a single node grabs a finite fraction of all the links. In order to address the connection with self-organized processes we have introduced a model for a growing Cayley tree that generalizes the dynamics of invasion percolation. At each node we associate a parameter ε_i (called energy) such that the probability for each node to grow is given by π_i ∝ e^(-βε_i), where T = 1/β is a statistical parameter of the system determined by the noise, called the temperature. This model has been solved analytically with a similar mathematical technique as the bosonic scale-free networks and it shows the self-organization of the low-energy nodes at the interface. In the thermodynamic limit the Fermi distribution describes the probability of the energy distribution at the interface.
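A toy simulation of the fitness model described above, with new nodes attaching in proportion to the product of fitness and degree (the uniform fitness distribution and all sizes are illustrative assumptions, not the author's choices):

# Toy simulation of the fitness (Bianconi-Barabasi) network model:
# attachment probability proportional to fitness * degree.
import numpy as np

rng = np.random.default_rng(0)
n, m = 5000, 2                       # final size; links added per new node
fitness = rng.uniform(0.0, 1.0, n)   # eta_i; uniform is an assumed choice
degree = np.zeros(n)
degree[:m + 1] = m                   # small fully connected seed

for new in range(m + 1, n):
    w = fitness[:new] * degree[:new]
    targets = rng.choice(new, size=m, replace=False, p=w / w.sum())
    degree[targets] += 1
    degree[new] = m

# Fitter nodes should end up with systematically higher degree.
top = fitness[np.argsort(degree)[-10:]]
print("mean fitness of 10 highest-degree nodes:", top.mean().round(2))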
Impact analysis of two kinds of failure strategies in Beijing road transportation network
NASA Astrophysics Data System (ADS)
Zhang, Zundong; Xu, Xiaoyang; Zhang, Zhaoran; Zhou, Huijuan
The Beijing road transportation network (BRTN), as a large-scale technological network, exhibits very complex behavior during daily operation, and much attention has been paid to how statistical characteristics (e.g. average path length and global network efficiency) change as the network evolves. In this paper, using different modeling concepts, three kinds of network models of BRTN are constructed from topological data and real detected flow data: an abstract network model, a static network model with road mileage as weights, and a dynamic network model with travel time as weights. The degree distributions of the three network models are analyzed, which shows that the urban road infrastructure network and the dynamic network behave like scale-free networks. Analyzing and comparing the important statistical characteristics of the three models under random attacks and intentional attacks shows that the urban road infrastructure network and the dynamic network of BRTN are both robust and vulnerable.
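The flow data behind BRTN are not public, but the random-versus-intentional attack comparison can be sketched on a synthetic scale-free stand-in with networkx:

# Sketch: global efficiency of a scale-free network under random failures
# versus intentional (highest-degree-first) attacks.
import networkx as nx
import random

random.seed(0)
G = nx.barabasi_albert_graph(500, 2)      # stand-in for the road network

def efficiency_after_removal(G, nodes_to_remove):
    H = G.copy()
    H.remove_nodes_from(nodes_to_remove)
    return nx.global_efficiency(H)

k = 50
random_nodes = random.sample(list(G.nodes), k)
hubs = sorted(G.nodes, key=lambda v: G.degree[v], reverse=True)[:k]

print("baseline efficiency: ", round(nx.global_efficiency(G), 3))
print("after random failures:", round(efficiency_after_removal(G, random_nodes), 3))
print("after hub attacks:   ", round(efficiency_after_removal(G, hubs), 3))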
Statistics of Shared Components in Complex Component Systems
NASA Astrophysics Data System (ADS)
Mazzolini, Andrea; Gherardi, Marco; Caselle, Michele; Cosentino Lagomarsino, Marco; Osella, Matteo
2018-04-01
Many complex systems are modular. Such systems can be represented as "component systems," i.e., sets of elementary components, such as LEGO bricks in LEGO sets. The bricks found in a LEGO set reflect a target architecture, which can be built following a set-specific list of instructions. In other component systems, instead, the underlying functional design and constraints are not obvious a priori, and their detection is often a challenge of both scientific and practical importance, requiring a clear understanding of component statistics. Importantly, some quantitative invariants appear to be common to many component systems, most notably a common broad distribution of component abundances, which often resembles the well-known Zipf's law. Such "laws" affect in a general and nontrivial way the component statistics, potentially hindering the identification of system-specific functional constraints or generative processes. Here, we specifically focus on the statistics of shared components, i.e., the distribution of the number of components shared by different system realizations, such as the common bricks found in different LEGO sets. To account for the effects of component heterogeneity, we consider a simple null model, which builds system realizations by random draws from a universe of possible components. Under general assumptions on abundance heterogeneity, we provide analytical estimates of component occurrence, which quantify exhaustively the statistics of shared components. Surprisingly, this simple null model can positively explain important features of empirical component-occurrence distributions obtained from large-scale data on bacterial genomes, LEGO sets, and book chapters. Specific architectural features and functional constraints can be detected from occurrence patterns as deviations from these null predictions, as we show for the illustrative case of the "core" genome in bacteria.
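A compact numerical version of the null model described, with an assumed Zipf abundance profile over a made-up component universe:

# Null model sketch: component systems as random draws from a universe with
# Zipf-like component abundances; shared-component statistics emerge.
import numpy as np

rng = np.random.default_rng(0)
n_components, n_systems, draw_size = 2000, 200, 100
ranks = np.arange(1, n_components + 1)
p = 1.0 / ranks
p /= p.sum()                             # Zipf's-law abundance profile

# occurrence[j] = number of realizations containing component j
occurrence = np.zeros(n_components, dtype=int)
for _ in range(n_systems):
    members = rng.choice(n_components, size=draw_size, replace=False, p=p)
    occurrence[members] += 1

core = np.sum(occurrence == n_systems)   # components shared by all systems
print("components present in every realization:", core)
print("mean occurrence of 10 most abundant components:", occurrence[:10].mean())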
Statistical Features of Complex Systems: Toward Establishing Sociological Physics
NASA Astrophysics Data System (ADS)
Kobayashi, Naoki; Kuninaka, Hiroto; Wakita, Jun-ichi; Matsushita, Mitsugu
2011-07-01
Complex systems have recently attracted much attention, both in natural sciences and in sociological sciences. Members constituting a complex system evolve through nonlinear interactions among each other. This means that in a complex system the multiplicative experience or, so to speak, the history of each member produces its present characteristics. If attention is paid to any statistical property in any complex system, the lognormal distribution is the most natural and appropriate among the standard or “normal” statistics to overview the whole system. In fact, the lognormality emerges rather conspicuously when we examine, as familiar and typical examples of statistical aspects in complex systems, the nursing-care period for the aged, populations of prefectures and municipalities, and our body height and weight. Many other examples are found in nature and society. On the basis of these observations, we discuss the possibility of sociological physics.
Hughes, G J; Nickerson, E; Enoch, D A; Ahluwalia, J; Wilkinson, C; Ayers, R; Brown, N M
2013-07-01
Clostridium difficile infection remains a major challenge for hospitals. Although targeted infection control initiatives have been shown to be effective in reducing the incidence of hospital-acquired C. difficile infection, there is little evidence available to assess the effectiveness of specific interventions. To use statistical modelling to detect substantial reductions in the incidence of C. difficile from time series data from two hospitals in England, and relate these time points to infection control interventions. A statistical breakpoints model was fitted to likely hospital-acquired C. difficile infection incidence data from a teaching hospital (2002-2009) and a district general hospital (2005-2009) in England. Models with increasing complexity (i.e. increasing the number of breakpoints) were tested for an improved fit to the data. Partitions estimated from breakpoint models were tested for individual stability using statistical process control charts. Major infection control interventions from both hospitals during this time were grouped according to their primary target (antibiotics, cleaning, isolation, other) and mapped to the model-suggested breakpoints. For both hospitals, breakpoints coincided with enhancements to cleaning protocols. Statistical models enabled formal assessment of the impact of different interventions, and showed that enhancements to deep cleaning programmes are the interventions that have most likely led to substantial reductions in hospital-acquired C. difficile infections at the two hospitals studied. Copyright © 2013 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Challet, Damien; Marsili, M.; Ottino, Gabriele
2004-02-01
We mathematize the El Farol bar problem and transform it into a workable model. First, we find general conditions on the predictor space under which the convergence of the average attendance to the resource level does not require any intelligence on the side of the agents. Second, specializing to a particular ensemble of continuous strategies yields a model similar to the Minority Game. The statistical physics of disordered systems allows us to derive a complete understanding of the complex behavior of this model, on the basis of its phase diagram.
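For context, the standard binary Minority Game that the authors' continuous-strategy ensemble resembles can be simulated in a few lines (parameters below are conventional textbook choices, not the paper's):

# Bare-bones Minority Game: N agents, S strategies each, memory-m histories.
import numpy as np

rng = np.random.default_rng(0)
N, S, m, T = 301, 2, 5, 5000
P = 2**m                                          # number of histories
strategies = rng.choice([-1, 1], size=(N, S, P))  # fixed random strategy tables
scores = np.zeros((N, S))
history = rng.integers(P)
attendance = []

for _ in range(T):
    best = scores.argmax(axis=1)                  # each agent plays best strategy
    actions = strategies[np.arange(N), best, history]
    A = actions.sum()
    attendance.append(A)
    # minority side wins: reward strategies that chose against the crowd
    scores += -strategies[:, :, history] * np.sign(A)
    history = (2 * history + (1 if A > 0 else 0)) % P  # slide binary history

print("std of attendance:", np.round(np.std(attendance), 1))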
Properties of a memory network in psychology
NASA Astrophysics Data System (ADS)
Wedemann, Roseli S.; Donangelo, Raul; de Carvalho, Luís A. V.
2007-12-01
We have previously described neurotic psychopathology and psychoanalytic working-through by an associative memory mechanism, based on a neural network model, where memory was modelled by a Boltzmann machine (BM). Since brain neural topology is selectively structured, we simulated known microscopic mechanisms that control synaptic properties, showing that the network self-organizes to a hierarchical, clustered structure. Here, we show some statistical mechanical properties of the complex networks which result from this self-organization. They indicate that a generalization of the BM may be necessary to model memory.
Observing Consistency in Online Communication Patterns for User Re-Identification.
Adeyemi, Ikuesan Richard; Razak, Shukor Abd; Salleh, Mazleena; Venter, Hein S
2016-01-01
Comprehension of the statistical and structural mechanisms governing human dynamics in online interaction plays a pivotal role in online user identification, online profile development, and recommender systems. However, building a characteristic model of human dynamics on the Internet involves a complete analysis of the variations in human activity patterns, which is a complex process. This complexity is inherent in human dynamics and has not been extensively studied to reveal the structural composition of human behavior. A typical way to dissect such a complex system is to examine each of the independent interconnections that constitute its complexity. This paper presents an examination of the various dimensions of human communication patterns in online interactions. The study employed reliable server-side web data from 31 known users to explore characteristics of human-driven communications. Various machine-learning techniques were explored. The results revealed that each individual exhibited a relatively consistent, unique behavioral signature and that the logistic regression model and model tree can be used to accurately distinguish online users. These results are applicable to one-to-one online user identification processes, insider misuse investigation processes, and online profiling in various areas.
Sitek, Arkadiusz
2016-12-21
The origin ensemble (OE) algorithm is a new method used for image reconstruction from nuclear tomographic data. The main advantage of this algorithm is the ease of implementation for complex tomographic models and the sound statistical theory. In this comment, the author provides the basics of the statistical interpretation of OE and gives suggestions for the improvement of the algorithm in the application to prompt gamma imaging as described in Polf et al (2015 Phys. Med. Biol. 60 7085).
Trends in Mediation Analysis in Nursing Research: Improving Current Practice.
Hertzog, Melody
2018-06-01
The purpose of this study was to describe common approaches used by nursing researchers to test mediation models and evaluate them within the context of current methodological advances. MEDLINE was used to locate studies testing a mediation model and published from 2004 to 2015 in nursing journals. Design (experimental/correlation, cross-sectional/longitudinal, model complexity) and analysis (method, inclusion of test of mediated effect, violations/discussion of assumptions, sample size/power) characteristics were coded for 456 studies. General trends were identified using descriptive statistics. Consistent with findings of reviews in other disciplines, evidence was found that nursing researchers may not be aware of the strong assumptions and serious limitations of their analyses. Suggestions for strengthening the rigor of such studies and an overview of current methods for testing more complex models, including longitudinal mediation processes, are presented.
Using machine learning tools to model complex toxic interactions with limited sampling regimes.
Bertin, Matthew J; Moeller, Peter; Guillette, Louis J; Chapman, Robert W
2013-03-19
A major impediment to understanding the impact of environmental stress, including toxins and other pollutants, on organisms is that organisms are rarely challenged by one or a few stressors in natural systems. Thus, linking laboratory experiments, which are limited by practical considerations to a few stressors and a few levels of those stressors, to real-world conditions is constrained. In addition, while the existence of complex interactions among stressors can be identified by current statistical methods, these methods do not provide a means to construct mathematical models of these interactions. In this paper, we offer a two-step process by which complex interactions of stressors on biological systems can be modeled in an experimental design that is within the limits of practicality. We begin with the notion that environmental conditions circumscribe an n-dimensional hyperspace within which biological processes or end points are embedded. We then randomly sample this hyperspace to establish experimental conditions that span the range of the relevant parameters and conduct the experiment(s) based upon these selected conditions. Models of the complex interactions of the parameters are then extracted using machine learning tools, specifically artificial neural networks. This approach can rapidly generate highly accurate models of biological responses to complex interactions among environmentally relevant toxins, identify critical subspaces where nonlinear responses exist, and provide an expedient means of designing traditional experiments to test the impact of complex mixtures on biological responses. Further, this can be accomplished with an astonishingly small sample size.
SPSS macros to compare any two fitted values from a regression model.
Weaver, Bruce; Dubois, Sacha
2012-12-01
In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests, particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.
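The matrix algebra behind the macros is compact: for two fitted values yhat1 = x1'b and yhat2 = x2'b, Var(yhat1 - yhat2) = (x1 - x2)' Cov(b) (x1 - x2). A Python equivalent using statsmodels rather than SPSS (the quadratic model and the two covariate profiles below are invented for illustration):

# Compare any two fitted values from a regression model:
# Var(yhat1 - yhat2) = d' Cov(b) d, with d the difference of design vectors.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 80, n)
y = 1.0 + 0.05 * age + 0.001 * age**2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([age, age**2]))
fit = sm.OLS(y, X).fit()

x1 = np.array([1.0, 60.0, 60.0**2])      # fitted value at age 60
x2 = np.array([1.0, 40.0, 40.0**2])      # fitted value at age 40
d = x1 - x2
diff = d @ fit.params
se = np.sqrt(d @ fit.cov_params() @ d)
t = diff / se
p = 2 * stats.t.sf(abs(t), df=fit.df_resid)
tcrit = stats.t.ppf(0.975, fit.df_resid)
lo, hi = diff - tcrit * se, diff + tcrit * se
print(f"difference = {diff:.3f}, SE = {se:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], p = {p:.4f}")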
Statistical downscaling of precipitation using long short-term memory recurrent neural networks
NASA Astrophysics Data System (ADS)
Misra, Saptarshi; Sarkar, Sudeshna; Mitra, Pabitra
2017-11-01
Hydrological impacts of global climate change on regional scale are generally assessed by downscaling large-scale climatic variables, simulated by General Circulation Models (GCMs), to regional, small-scale hydrometeorological variables like precipitation, temperature, etc. In this study, we propose a new statistical downscaling model based on Recurrent Neural Network with Long Short-Term Memory which captures the spatio-temporal dependencies in local rainfall. The previous studies have used several other methods such as linear regression, quantile regression, kernel regression, beta regression, and artificial neural networks. Deep neural networks and recurrent neural networks have been shown to be highly promising in modeling complex and highly non-linear relationships between input and output variables in different domains and hence we investigated their performance in the task of statistical downscaling. We have tested this model on two datasets—one on precipitation in Mahanadi basin in India and the second on precipitation in Campbell River basin in Canada. Our autoencoder coupled long short-term memory recurrent neural network model performs the best compared to other existing methods on both the datasets with respect to temporal cross-correlation, mean squared error, and capturing the extremes.
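A minimal PyTorch rendering of the core idea, an LSTM mapping sequences of large-scale predictors to a local precipitation target, is sketched below; the synthetic data, network width, and training schedule are placeholders, and the paper's autoencoder coupling is omitted for brevity.

# Minimal LSTM downscaling sketch (PyTorch): sequences of large-scale
# predictors -> local precipitation. Synthetic data stand in for GCM fields.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_seq, seq_len, n_feat = 512, 30, 8          # e.g., 30-day predictor windows
X = torch.randn(n_seq, seq_len, n_feat)
y = X[:, -10:, 0].clamp(min=0).sum(dim=1, keepdim=True)  # toy rainfall target

class Downscaler(nn.Module):
    def __init__(self, n_feat, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])      # predict from the last time step

model = Downscaler(n_feat)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
for epoch in range(200):                     # full-batch training, toy scale
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print(f"final training MSE: {loss.item():.3f}")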
Synthetic Earthquake Statistics From Physical Fault Models for the Lower Rhine Embayment
NASA Astrophysics Data System (ADS)
Brietzke, G. B.; Hainzl, S.; Zöller, G.
2012-04-01
As of today, seismic risk and hazard estimates mostly use purely empirical, stochastic models of earthquake fault systems, tuned specifically to the vulnerable areas of interest. Although such models allow for reasonable risk estimates, they fail to provide a link between the observed seismicity and the underlying physical processes. Solving a state-of-the-art, fully dynamic set of equations for all relevant physical processes related to earthquake fault systems is likely not useful, since it comes with a large number of degrees of freedom, poor constraints on its model parameters, and a huge computational effort. Here, quasi-static and quasi-dynamic physical fault simulators provide a compromise between physical completeness and computational affordability, and aim at providing a link between basic physical concepts and the statistics of seismicity. Within the framework of quasi-static and quasi-dynamic earthquake simulators we investigate a model of the Lower Rhine Embayment (LRE) that is based upon seismological and geological data. We present and discuss statistics of the spatio-temporal behavior of generated synthetic earthquake catalogs with respect to simplification (e.g. simple two-fault cases) as well as complication (e.g. hidden faults, geometric complexity, heterogeneities of constitutive parameters).
NASA Astrophysics Data System (ADS)
Aldrin, John C.; Annis, Charles; Sabbagh, Harold A.; Lindgren, Eric A.
2016-02-01
A comprehensive approach to NDE and SHM characterization error (CE) evaluation is presented that follows the framework of the 'ahat-versus-a' regression analysis for POD assessment. Characterization capability evaluation is typically more complex than current POD evaluations and thus requires engineering and statistical expertise in the model-building process to ensure all key effects and interactions are addressed. Justifying the statistical model choice with underlying assumptions is key. Several sizing case studies are presented with detailed evaluations of the most appropriate statistical model for each data set. The use of a model-assisted approach is introduced to help assess the reliability of NDE and SHM characterization capability under a wide range of part, environmental and damage conditions. Best practices of using models are presented for both an eddy current NDE sizing and vibration-based SHM case studies. The results of these studies highlight the general protocol feasibility, emphasize the importance of evaluating key application characteristics prior to the study, and demonstrate an approach to quantify the role of varying SHM sensor durability and environmental conditions on characterization performance.
NASA Astrophysics Data System (ADS)
Puscas, Liliana A.; Galatus, Ramona V.; Puscas, Niculae N.
In this article, we report a theoretical study concerning some statistical parameters which characterize the single- and double-pass Er3+-doped Ti:LiNbO3 M-mode straight waveguides. For the derivation and the evaluation of the Fano factor, the statistical fluctuation and the spontaneous emission factor we used a quasi two-level model in the small gain approximation and the unsaturated regime. The simulation results show the evolution of these parameters under various pump regimes and waveguide lengths. The obtained results can be used for the design of complex rare earth-doped integrated circuits.
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked" (searched for the strongest associations) means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
Protein Models Docking Benchmark 2
Anishchenko, Ivan; Kundrotas, Petras J.; Tuzikov, Alexander V.; Vakser, Ilya A.
2015-01-01
Structural characterization of protein-protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template-free or template-based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high-resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have pre-defined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model-to-native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model-like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu. PMID:25712716
Geoscience in the Big Data Era: Are models obsolete?
NASA Astrophysics Data System (ADS)
Yuen, D. A.; Zheng, L.; Stark, P. B.; Morra, G.; Knepley, M.; Wang, X.
2016-12-01
In the last few decades, the velocity, volume, and variety of geophysical data have increased, while the development of the Internet and distributed computing has led to the emergence of "data science." Fitting and running numerical models, especially based on PDEs, is the main consumer of flops in geoscience. Can large amounts of diverse data supplant modeling? Without the ability to conduct randomized, controlled experiments, causal inference requires understanding the physics. It is sometimes possible to predict well without understanding the system—if (1) the system is predictable, (2) data on "important" variables are available, and (3) the system changes slowly enough. And sometimes even a crude model can help the data "speak for themselves" much more clearly. For example, Shearer (1991) used a 1-dimensional velocity model to stack long-period seismograms, revealing upper mantle discontinuities. This was a "big data" approach: the main use of computing was in the data processing, rather than in modeling, yet the "signal" became clear. In contrast, modelers tend to use all available computing power to fit even more complex models, resulting in a cycle where uncertainty quantification (UQ) is never possible: even if realistic UQ required only 1,000 model evaluations, it is never in reach. To make more reliable inferences requires better data analysis and statistics, not more complex models. Geoscientists need to learn new skills and tools: sound software engineering practices; open programming languages suitable for big data; parallel and distributed computing; data visualization; and basic nonparametric, computationally based statistical inference, such as permutation tests. They should work reproducibly, scripting all analyses and avoiding point-and-click tools.
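A two-sample permutation test of the kind recommended above takes only a few lines. The sketch below uses synthetic placeholder data; the statistic and group labels are illustrative assumptions, not drawn from the abstract.

```python
# Minimal two-sample permutation test: no distributional assumptions, just
# recomputing the statistic under random relabelings of the pooled data.
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 40)   # e.g. amplitudes from region A (synthetic)
b = rng.normal(0.5, 1.0, 40)   # e.g. amplitudes from region B (synthetic)

observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])

n_perm = 10000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = pooled[:a.size].mean() - pooled[a.size:].mean()
    if abs(diff) >= abs(observed):
        count += 1

# Add-one correction keeps the p-value valid and strictly positive.
p_value = (count + 1) / (n_perm + 1)
print(f"observed difference = {observed:.3f}, permutation p = {p_value:.4f}")
```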
Quantum vertex model for reversible classical computing.
Chamon, C; Mucciolo, E R; Ruckenstein, A E; Yang, Z-C
2017-05-12
Mappings of classical computation onto statistical mechanics models have led to remarkable successes in addressing some complex computational problems. However, such mappings display thermodynamic phase transitions that may prevent reaching solution even for easy problems known to be solvable in polynomial time. Here we map universal reversible classical computations onto a planar vertex model that exhibits no bulk classical thermodynamic phase transition, independent of the computational circuit. Within our approach the solution of the computation is encoded in the ground state of the vertex model and its complexity is reflected in the dynamics of the relaxation of the system to its ground state. We use thermal annealing with and without 'learning' to explore typical computational problems. We also construct a mapping of the vertex model into the Chimera architecture of the D-Wave machine, initiating an approach to reversible classical computation based on state-of-the-art implementations of quantum annealing.
Supersonic projectile models for asynchronous shooter localization
NASA Astrophysics Data System (ADS)
Kozick, Richard J.; Whipps, Gene T.; Ash, Joshua N.
2011-06-01
In this work we consider the localization of a gunshot using a distributed sensor network measuring time differences of arrival between a firearm's muzzle blast and the shockwave induced by a supersonic bullet. This so-called MB-SW approach is desirable because time synchronization is not required between the sensors; however, it suffers from increased computational complexity and requires knowledge of the bullet's velocity at all points along its trajectory. While the actual velocity profile of a particular gunshot is unknown, one may use a parameterized model for the velocity profile and simultaneously fit the model and localize the shooter. In this paper we study efficient solutions for the localization problem and identify deceleration models that trade off localization accuracy and computational complexity. We also develop a statistical analysis that includes bias due to mismatch between the true and assumed deceleration models and covariance due to additive noise.
Simulating statistics of lightning-induced and man made fires
NASA Astrophysics Data System (ADS)
Krenn, R.; Hergarten, S.
2009-04-01
The frequency-area distributions of forest fires show power-law behavior with scaling exponents α in a quite narrow range, relating wildfire research to the theoretical framework of self-organized criticality. Examples of self-organized critical behavior can be found in computer simulations of simple cellular automata. The established self-organized critical Drossel-Schwabl forest fire model (DS-FFM) is one of the most widespread models in this context. Despite its qualitative agreement with event-size statistics from nature, its applicability is still questioned. Apart from general concerns that the DS-FFM apparently oversimplifies the complex nature of forest dynamics, it significantly overestimates the frequency of large fires. We present a straightforward modification of the model rules that increases the scaling exponent α by approximately 1/3 and brings the simulated event-size statistics close to those observed in nature. In addition, combined simulations of both the original and the modified model predict a dependence of the overall distribution on the ratio of lightning-induced and man-made fires as well as a difference between their respective event-size statistics. The increase of the scaling exponent with decreasing lightning probability as well as the splitting of the partial distributions are confirmed by the analysis of the Canadian Large Fire Database. As a consequence, lightning-induced and man-made forest fires cannot be treated separately in wildfire modeling, hazard assessment and forest management.
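For readers unfamiliar with the DS-FFM, the following is a minimal sketch of the common instantaneous-burn variant; the grid size, growth probability p and lightning probability f are illustrative choices, not the paper's settings, and the paper's modified rules are not reproduced here.

```python
# Sketch of the Drossel-Schwabl forest-fire automaton: trees grow with
# probability p per empty site, lightning strikes each tree with
# probability f, and a struck tree's whole cluster burns in one step.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(2)
L, p, f, steps = 128, 0.05, 0.0002, 2000
forest = np.zeros((L, L), dtype=bool)
fire_sizes = []

for _ in range(steps):
    # Growth: empty sites become trees with probability p.
    forest |= (~forest) & (rng.random((L, L)) < p)

    # Lightning: each tree is struck independently with probability f.
    struck = forest & (rng.random((L, L)) < f)
    if struck.any():
        clusters, _ = ndimage.label(forest)
        for cid in np.unique(clusters[struck]):
            size = int(np.sum(clusters == cid))
            fire_sizes.append(size)
            forest[clusters == cid] = False   # the whole cluster burns

# Event-size statistics: power-law-like tails show up in the histogram.
sizes = np.array(fire_sizes)
print(f"{sizes.size} fires, largest = {sizes.max()}, mean = {sizes.mean():.1f}")
```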
Cortical Surround Interactions and Perceptual Salience via Natural Scene Statistics
Coen-Cagli, Ruben; Dayan, Peter; Schwartz, Odelia
2012-01-01
Spatial context in images induces perceptual phenomena associated with salience and modulates the responses of neurons in primary visual cortex (V1). However, the computational and ecological principles underlying contextual effects are incompletely understood. We introduce a model of natural images that includes grouping and segmentation of neighboring features based on their joint statistics, and we interpret the firing rates of V1 neurons as performing optimal recognition in this model. We show that this leads to a substantial generalization of divisive normalization, a computation that is ubiquitous in many neural areas and systems. A main novelty in our model is that the influence of the context on a target stimulus is determined by their degree of statistical dependence. We optimized the parameters of the model on natural image patches, and then simulated neural and perceptual responses on stimuli used in classical experiments. The model reproduces some rich and complex response patterns observed in V1, such as the contrast dependence, orientation tuning and spatial asymmetry of surround suppression, while also allowing for surround facilitation under conditions of weak stimulation. It also mimics the perceptual salience produced by simple displays, and leads to readily testable predictions. Our results provide a principled account of orientation-based contextual modulation in early vision and its sensitivity to the homogeneity and spatial arrangement of inputs, and lend statistical support to the theory that V1 computes visual salience. PMID:22396635
AST: Activity-Security-Trust driven modeling of time varying networks.
Wang, Jian; Xu, Jiake; Liu, Yanheng; Deng, Weiwen
2016-02-18
Network modeling is a flexible mathematical framework that makes it possible to identify statistical regularities and structural principles hidden in complex systems. Most recent approaches to modeling complex networks are driven by activity, in which an activity potential, a time-invariant function, is introduced to identify agents' interactions and to construct an activity-driven model. However, newly emerging network evolutions are deeply coupled not only with explicit factors (e.g. activity) but also with implicit considerations (e.g. security and trust), so these more intrinsic driving forces should be integrated into the modeling of time-varying networks. Agents seek a time-dependent trade-off among activity, security, and trust when generating a new connection to another agent. Thus, we propose the Activity-Security-Trust (AST) driven model, which jointly considers the explicit and implicit driving forces (activity, security, and trust) underlying the decision process. The AST-driven model captures highly dynamical network behaviors more accurately and clarifies the complex evolution process, allowing a deeper understanding of the effects of security and trust in driving network evolution and reducing the biases induced by involving only activity representations in analyzing the dynamical processes.
Cocco, Simona; Leibler, Stanislas; Monasson, Rémi
2009-01-01
Complexity of neural systems often makes impracticable explicit measurements of all interactions between their constituents. Inverse statistical physics approaches, which infer effective couplings between neurons from their spiking activity, have been so far hindered by their computational complexity. Here, we present 2 complementary, computationally efficient inverse algorithms based on the Ising and “leaky integrate-and-fire” models. We apply those algorithms to reanalyze multielectrode recordings in the salamander retina in darkness and under random visual stimulus. We find strong positive couplings between nearby ganglion cells common to both stimuli, whereas long-range couplings appear under random stimulus only. The uncertainty on the inferred couplings due to limitations in the recordings (duration, small area covered on the retina) is discussed. Our methods will allow real-time evaluation of couplings for large assemblies of neurons. PMID:19666487
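The simplest member of this family of inverse approaches, and a useful baseline in the same spirit as the abstract above (though not the authors' own algorithms), is the mean-field inverse Ising estimate, in which couplings are read off from the inverse covariance of binarized activity. A sketch on synthetic spike data:

```python
# Mean-field inverse Ising sketch: effective couplings are (minus) the
# inverse of the covariance matrix of binarized spike trains. Data here
# are synthetic with a built-in common input to induce correlations.
import numpy as np

rng = np.random.default_rng(3)
n_neurons, n_bins = 20, 50000

latent = rng.standard_normal(n_bins)                      # shared drive
spikes = (rng.standard_normal((n_neurons, n_bins)) + 0.3 * latent) > 1.0
s = 2.0 * spikes - 1.0                                    # spins in {-1, +1}

C = np.cov(s)                                             # connected correlations
J_mf = -np.linalg.inv(C)                                  # mean-field couplings
np.fill_diagonal(J_mf, 0.0)                               # drop self-couplings

print("strongest inferred coupling:", np.abs(J_mf).max())
```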
Hospital graduate social work field work programs: a study in New York City.
Showers, N
1990-02-01
Twenty-seven hospital field work programs in New York City were studied. Questionnaires were administered to program coordinators and 238 graduate social work students participating in study programs. High degrees of program structural complexity and variation were found, indicating a state of the art well beyond that described in the general field work literature. The high rates of student satisfaction with learning, field instructors, programs, and the overall field work experience suggest that the complexity of study programs may be more effective than traditional field work models. Statistically nonsignificant findings indicate areas in which hospital social work departments may develop field work programs consistent with shifting organizational needs, without undue risk to educational effectiveness. Statistically significant findings suggest areas in which inflexibility in program design may be more beneficial in the diagnosis-related groups era.
Cox process representation and inference for stochastic reaction-diffusion processes
NASA Astrophysics Data System (ADS)
Schnoerr, David; Grima, Ramon; Sanguinetti, Guido
2016-05-01
Complex behaviour in many systems arises from the stochastic interactions of spatially distributed particles or agents. Stochastic reaction-diffusion processes are widely used to model such behaviour in disciplines ranging from biology to the social sciences, yet they are notoriously difficult to simulate and calibrate to observational data. Here we use ideas from statistical physics and machine learning to provide a solution to the inverse problem of learning a stochastic reaction-diffusion process from data. Our solution relies on a non-trivial connection between stochastic reaction-diffusion processes and spatio-temporal Cox processes, a well-studied class of models from computational statistics. This connection leads to an efficient and flexible algorithm for parameter inference and model selection. Our approach shows excellent accuracy on numeric and real data examples from systems biology and epidemiology. Our work provides both insights into spatio-temporal stochastic systems, and a practical solution to a long-standing problem in computational modelling.
An Overview of R in Health Decision Sciences.
Jalal, Hawre; Pechlivanoglou, Petros; Krijkamp, Eline; Alarid-Escudero, Fernando; Enns, Eva; Hunink, M G Myriam
2017-10-01
As the complexity of health decision science applications increases, high-level programming languages are increasingly adopted for statistical analyses and numerical computations. These programming languages facilitate sophisticated modeling, model documentation, and analysis reproducibility. Among the high-level programming languages, the statistical programming framework R is gaining increased recognition. R is freely available, cross-platform compatible, and open source, and it is supported by a large community of users who have generated an extensive collection of well-documented packages and functions. These functions facilitate applications of health decision science methodology as well as the visualization and communication of results. Although R's popularity is increasing among health decision scientists, methodological extensions of R in the field of decision analysis remain isolated. The purpose of this article is to provide an overview of existing R functionality that is applicable to the various stages of decision analysis, including model design, input parameter estimation, and analysis of model outputs.
Probabilistic models for reactive behaviour in heterogeneous condensed phase media
NASA Astrophysics Data System (ADS)
Baer, M. R.; Gartling, D. K.; DesJardin, P. E.
2012-02-01
This work presents statistically-based models to describe reactive behaviour in heterogeneous energetic materials. Mesoscale effects are incorporated in continuum-level reactive flow descriptions using probability density functions (pdfs) that are associated with thermodynamic and mechanical states. A generalised approach is presented that includes multimaterial behaviour by treating the volume fraction as a random kinematic variable. Model simplifications are then sought to reduce the complexity of the description without compromising the statistical approach. Reactive behaviour is first considered for non-deformable media having a random temperature field as an initial state. A pdf transport relationship is derived and an approximate moment approach is incorporated in finite element analysis to model an example application whereby a heated fragment impacts a reactive heterogeneous material which leads to a delayed cook-off event. Modelling is then extended to include deformation effects associated with shock loading of a heterogeneous medium whereby random variables of strain, strain-rate and temperature are considered. A demonstrative mesoscale simulation of a non-ideal explosive is discussed that illustrates the joint statistical nature of the strain and temperature fields during shock loading to motivate the probabilistic approach. This modelling is derived in a Lagrangian framework that can be incorporated in continuum-level shock physics analysis. Future work will consider particle-based methods for a numerical implementation of this modelling approach.
Correcting for population structure and kinship using the linear mixed model: theory and extensions.
Hoffman, Gabriel E
2013-01-01
Population structure and kinship are widespread confounding factors in genome-wide association studies (GWAS). It has been standard practice to include principal components of the genotypes in a regression model in order to account for population structure. More recently, the linear mixed model (LMM) has emerged as a powerful method for simultaneously accounting for population structure and kinship. The statistical theory underlying the differences in empirical performance between modeling principal components as fixed versus random effects has not been thoroughly examined. We undertake an analysis to formalize the relationship between these widely used methods and elucidate the statistical properties of each. Moreover, we introduce a new statistic, effective degrees of freedom, that serves as a metric of model complexity and a novel low rank linear mixed model (LRLMM) to learn the dimensionality of the correction for population structure and kinship, and we assess its performance through simulations. A comparison of the results of LRLMM and a standard LMM analysis applied to GWAS data from the Multi-Ethnic Study of Atherosclerosis (MESA) illustrates how our theoretical results translate into empirical properties of the mixed model. Finally, the analysis demonstrates the ability of the LRLMM to substantially boost the strength of an association for HDL cholesterol in Europeans.
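The effective-degrees-of-freedom statistic has a compact form for ridge-type shrinkage (random effects shrink like ridge regression): it is the trace of the hat matrix. The sketch below illustrates that formula under illustrative assumptions; the design matrix is a random stand-in for genotype principal components, not MESA data.

```python
# Effective degrees of freedom for a ridge-type correction:
# edf = sum_i d_i^2 / (d_i^2 + lambda) over singular values d_i of the design.
import numpy as np

rng = np.random.default_rng(4)
n, k = 500, 20
pcs = rng.standard_normal((n, k))           # stand-in for genotype PCs

d = np.linalg.svd(pcs, compute_uv=False)    # singular values of the design

for lam in [0.0, 10.0, 100.0, 1000.0]:
    edf = np.sum(d**2 / (d**2 + lam))
    print(f"lambda = {lam:7.1f}  ->  effective df = {edf:.2f}")
# lambda = 0 recovers the fixed-effects case (edf = k); larger lambda
# shrinks toward 0, interpolating model complexity continuously.
```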
NASA Astrophysics Data System (ADS)
Langousis, Andreas; Mamalakis, Antonis; Deidda, Roberto; Marrocu, Marino
2015-04-01
To improve the skill of Global Climate Models (GCMs) and Regional Climate Models (RCMs) in reproducing the statistics of rainfall at a basin level and at hydrologically relevant temporal scales (e.g. daily), two types of statistical approaches have been suggested. One is the statistical correction of climate model rainfall outputs using historical series of precipitation. The other is the use of stochastic models of rainfall to conditionally simulate precipitation series, based on large-scale atmospheric predictors produced by climate models (e.g. geopotential height, relative vorticity, divergence, mean sea level pressure). The latter approach, usually referred to as statistical rainfall downscaling, aims at reproducing the statistical character of rainfall, while accounting for the effects of large-scale atmospheric circulation (and, therefore, climate forcing) on rainfall statistics. While promising, statistical rainfall downscaling has not attracted much attention in recent years, since the suggested approaches involved complex (i.e. subjective or computationally intense) identification procedures of the local weather, in addition to demonstrating limited success in reproducing several statistical features of rainfall, such as seasonal variations, the distributions of dry and wet spell lengths, the distribution of the mean rainfall intensity inside wet periods, and the distribution of rainfall extremes. In an effort to remedy those shortcomings, Langousis and Kaleris (2014) developed a statistical framework for simulation of daily rainfall intensities conditional on upper air variables, which accurately reproduces the statistical character of rainfall at multiple time-scales. Here, we study the relative performance of: a) quantile-quantile (Q-Q) correction of climate model rainfall products, and b) the statistical downscaling scheme of Langousis and Kaleris (2014), in reproducing the statistical structure of rainfall, as well as rainfall extremes, at a regional level. This is done for an intermediate-sized catchment in Italy, i.e. the Flumendosa catchment, using climate model rainfall and atmospheric data from the ENSEMBLES project (http://ensembleseu.metoffice.com). In doing so, we split the historical rainfall record of mean areal precipitation (MAP) in 15-year calibration and 45-year validation periods, and compare the historical rainfall statistics to those obtained from: a) Q-Q corrected climate model rainfall products, and b) synthetic rainfall series generated by the suggested downscaling scheme. To our knowledge, this is the first time that climate model rainfall and statistically downscaled precipitation are compared to catchment-averaged MAP at a daily resolution. The obtained results are promising, since the proposed downscaling scheme is more accurate and robust in reproducing a number of historical rainfall statistics, independent of the climate model used and the length of the calibration period. This is particularly the case for the yearly rainfall maxima, where direct statistical correction of climate model rainfall outputs shows increased sensitivity to the length of the calibration period and the climate model used. The robustness of the suggested downscaling scheme in modeling rainfall extremes at a daily resolution is a notable feature that can effectively be used to assess hydrologic risk at a regional level under changing climatic conditions.
Acknowledgments The research project is implemented within the framework of the Action «Supporting Postdoctoral Researchers» of the Operational Program "Education and Lifelong Learning" (Action's Beneficiary: General Secretariat for Research and Technology), and is co-financed by the European Social Fund (ESF) and the Greek State. CRS4 highly acknowledges the contribution of the Sardinian regional authorities.
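Of the two approaches compared above, the Q-Q correction is the simpler to state: simulated values are mapped through the observed distribution at matching quantiles. A minimal empirical version, with synthetic gamma series standing in for model output and observations (shapes, scales and the quantile grid are illustrative choices):

```python
# Minimal empirical quantile-quantile (Q-Q) correction: map each simulated
# value through the observed distribution at the matching quantile.
import numpy as np

rng = np.random.default_rng(5)
obs = rng.gamma(shape=0.6, scale=8.0, size=5000)   # "observed" daily rainfall
sim = rng.gamma(shape=0.9, scale=5.0, size=5000)   # biased model rainfall

def qq_correct(x, sim_ref, obs_ref, n_q=99):
    """Empirical quantile mapping: correct x so that the quantiles of the
    simulated reference series match those of the observations."""
    qs = np.linspace(0.01, 0.99, n_q)
    sim_q = np.quantile(sim_ref, qs)
    obs_q = np.quantile(obs_ref, qs)
    return np.interp(x, sim_q, obs_q)

corrected = qq_correct(sim, sim, obs)
print("obs mean %.2f | raw sim mean %.2f | corrected mean %.2f"
      % (obs.mean(), sim.mean(), corrected.mean()))
```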
Technical Note: Approximate Bayesian parameterization of a complex tropical forest model
NASA Astrophysics Data System (ADS)
Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.
2013-08-01
Inverse parameter estimation of process-based models is a long-standing problem in ecology and evolution. A key problem of inverse parameter estimation is to define a metric that quantifies how well model predictions fit to the data. Such a metric can be expressed by general cost or objective functions, but statistical inversion approaches are based on a particular metric, the probability of observing the data given the model, known as the likelihood. Deriving likelihoods for dynamic models requires making assumptions about the probability for observations to deviate from mean model predictions. For technical reasons, these assumptions are usually derived without explicit consideration of the processes in the simulation. Only in recent years have new methods become available that allow generating likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional MCMC, performs well in retrieving known parameter values from virtual field data generated by the forest model. We analyze the results of the parameter estimation, examine the sensitivity towards the choice and aggregation of model outputs and observed data (summary statistics), and show results from using this method to fit the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss differences of this approach to Approximate Bayesian Computation (ABC), another commonly used method to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can successfully be applied to process-based models of high complexity. The methodology is particularly suited to heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models in ecology and evolution.
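The core of a parametric likelihood approximation can be sketched compactly: simulate summary statistics at a candidate parameter, fit a Gaussian to them, and score the observed summaries under that Gaussian. The toy simulator below is a stand-in for a model like FORMIND; all names and settings are illustrative, not the paper's.

```python
# Parametric ("synthetic") likelihood sketch for a toy stochastic model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

def simulator(theta, n=200):
    """Toy stochastic model: returns summary statistics (mean, std)."""
    x = rng.gamma(shape=theta, scale=1.0, size=n)
    return np.array([x.mean(), x.std()])

observed = simulator(2.5)                   # pretend these are field data

def log_synthetic_likelihood(theta, n_sims=200):
    sims = np.array([simulator(theta) for _ in range(n_sims)])
    mu, cov = sims.mean(axis=0), np.cov(sims.T)
    return stats.multivariate_normal.logpdf(observed, mu, cov)

for theta in [1.5, 2.0, 2.5, 3.0, 3.5]:
    print(f"theta = {theta:.1f}  logL ~ {log_synthetic_likelihood(theta):.2f}")
# In the paper's setting this quantity would sit inside a conventional
# MCMC sampler rather than a grid scan.
```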
Li, Li; Paulo, Maria-João; van Eeuwijk, Fred
2010-01-01
Association mapping using DNA-based markers is a novel tool in plant genetics for the analysis of complex traits. Potato tuber yield, starch content, starch yield and chip color are complex traits of agronomic relevance, for which carbohydrate metabolism plays an important role. At the functional level, the genes and biochemical pathways involved in carbohydrate metabolism are among the best studied in plants. Quantitative traits such as tuber starch and sugar content are therefore models for association genetics in potato based on candidate genes. In an association mapping experiment conducted with a population of 243 tetraploid potato varieties and breeding clones, we previously identified associations between individual candidate gene alleles and tuber starch content, starch yield and chip quality. In the present paper, we tested 190 DNA markers at 36 loci scored in the same association mapping population for pairwise statistical epistatic interactions. Fifty marker pairs were associated mainly with tuber starch content and/or starch yield, at a cut-off value of q ≤ 0.20 for the experiment-wide false discovery rate (FDR). Thirteen marker pairs had an FDR of q ≤ 0.10. Alleles at loci encoding ribulose-bisphosphate carboxylase/oxygenase activase (Rca), sucrose phosphate synthase (Sps) and vacuolar invertase (Pain1) were most frequently involved in statistical epistatic interactions. The largest effect on tuber starch content and starch yield was observed for the paired alleles Pain1-8c and Rca-1a, explaining 9 and 10% of the total variance, respectively. The combination of these two alleles increased the means of tuber starch content and starch yield. Biological models to explain the observed statistical epistatic interactions are discussed. Electronic supplementary material The online version of this article (doi:10.1007/s00122-010-1389-3) contains supplementary material, which is available to authorized users. PMID:20603706
An outline of graphical Markov models in dentistry.
Helfenstein, U; Steiner, M; Menghini, G
1999-12-01
In the usual multiple regression model there is one response variable and one block of several explanatory variables. In contrast, in reality there may be a block of several possibly interacting response variables one would like to explain. In addition, the explanatory variables may split into a sequence of several blocks, each block containing several interacting variables. The variables in the second block are explained by those in the first block; the variables in the third block by those in the first and the second block etc. During recent years methods have been developed allowing analysis of problems where the data set has the above complex structure. The models involved are called graphical models or graphical Markov models. The main result of an analysis is a picture, a conditional independence graph with precise statistical meaning, consisting of circles representing variables and lines or arrows representing significant conditional associations. The absence of a line between two circles signifies that the corresponding two variables are independent conditional on the presence of other variables in the model. An example from epidemiology is presented in order to demonstrate application and use of the models. The data set in the example has a complex structure consisting of successive blocks: the variable in the first block is year of investigation; the variables in the second block are age and gender; the variables in the third block are indices of calculus, gingivitis and mutans streptococci and the final response variables in the fourth block are different indices of caries. Since the statistical methods may not be easily accessible to dentists, this article presents them in an introductory form. Graphical models may be of great value to dentists in allowing analysis and visualisation of complex structured multivariate data sets consisting of a sequence of blocks of interacting variables and, in particular, several possibly interacting responses in the final block.
Garbade, Sven F; Greenberg, Cheryl R; Demirkol, Mübeccel; Gökçay, Gülden; Ribes, Antonia; Campistol, Jaume; Burlina, Alberto B; Burgard, Peter; Kölker, Stefan
2014-09-01
Glutaric aciduria type I (GA-I) is a cerebral organic aciduria caused by inherited deficiency of glutaryl-CoA dehydrogenase and is characterized biochemically by an accumulation of putatively neurotoxic dicarboxylic metabolites. The majority of untreated patients develop a complex movement disorder with predominant dystonia between the ages of 3 and 36 months. Magnetic resonance imaging (MRI) studies have demonstrated striatal and extrastriatal abnormalities. The major aim of this study was to elucidate the complex neuroradiological pattern of patients with GA-I and to associate the MRI findings with the severity of predominant neurological symptoms. In 180 patients, detailed information about the neurological presentation and brain region-specific MRI abnormalities was obtained via a standardized questionnaire. Patients with a movement disorder more often had MRI abnormalities in putamen, caudate, cortex, ventricles and external CSF spaces than patients without or with minor neurological symptoms. Putaminal MRI changes and strongly dilated ventricles were identified as the most reliable predictors of a movement disorder. In contrast, abnormalities in globus pallidus were not clearly associated with a movement disorder. Caudate and putamen as well as cortex, ventricles and external CSF spaces clearly colocalized on a two-dimensional map demonstrating statistical similarity and suggesting the same underlying pathomechanism. This study demonstrates that complex statistical methods are useful to decipher the age-dependent and region-specific MRI patterns of rare neurometabolic diseases and that these methods are helpful to elucidate the clinical relevance of specific MRI findings.
Generalised Central Limit Theorems for Growth Rate Distribution of Complex Systems
NASA Astrophysics Data System (ADS)
Takayasu, Misako; Watanabe, Hayafumi; Takayasu, Hideki
2014-04-01
We introduce a solvable model of randomly growing systems consisting of many independent subunits. Scaling relations and growth rate distributions in the limit of infinite subunits are analysed theoretically. Various types of scaling properties and distributions reported for growth rates of complex systems in a variety of fields can be derived from this basic physical model. Statistical data of growth rates for about 1 million business firms are analysed as a real-world example of randomly growing systems. Not only are the scaling relations consistent with the theoretical solution, but the entire functional form of the growth rate distribution is fitted with a theoretical distribution that has a power-law tail.
NASA Astrophysics Data System (ADS)
Kobayashi, Hiroaki; Gotoda, Hiroshi; Tachibana, Shigeru; Yoshida, Seiji
2017-12-01
We conduct an experimental study using time series analysis based on symbolic dynamics to detect a precursor of frequency-mode-shift during thermoacoustic combustion oscillations in a staged aircraft engine model combustor. With increasing amount of the main fuel, a significant shift in the dominant frequency-mode occurs in noisy periodic dynamics, leading to a notable increase in oscillation amplitudes. The sustainment of noisy periodic dynamics during thermoacoustic combustion oscillations is clearly shown by the multiscale complexity-entropy causality plane in terms of statistical complexity. A modified version of the permutation entropy allows us to detect a precursor of the frequency-mode-shift before the amplification of pressure fluctuations.
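Permutation entropy itself is short to implement: embed the series in ordinal patterns and take the normalized Shannon entropy of their frequencies. The sketch below uses synthetic signals; the modified statistic used in the study differs in details not reproduced here.

```python
# Minimal permutation-entropy computation on synthetic signals.
import math
import numpy as np

def permutation_entropy(x, d=4, tau=1):
    patterns = {}
    for i in range(len(x) - (d - 1) * tau):
        pat = tuple(np.argsort(x[i:i + d * tau:tau]))   # ordinal pattern
        patterns[pat] = patterns.get(pat, 0) + 1
    total = sum(patterns.values())
    h = -sum(c / total * math.log(c / total) for c in patterns.values())
    return h / math.log(math.factorial(d))              # normalized to [0, 1]

rng = np.random.default_rng(7)
t = np.linspace(0, 100, 5000)
noisy_periodic = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(t.size)
pure_noise = rng.standard_normal(t.size)

print("noisy periodic :", round(permutation_entropy(noisy_periodic), 3))
print("pure noise     :", round(permutation_entropy(pure_noise), 3))
# A drop in this entropy ahead of the mode shift is the kind of precursor
# such detectors are designed to expose.
```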
Characterizing bias correction uncertainty in wheat yield predictions
NASA Astrophysics Data System (ADS)
Ortiz, Andrea Monica; Jones, Julie; Freckleton, Robert; Scaife, Adam
2017-04-01
Farming systems are under increased pressure due to current and future climate change, variability and extremes. Research on the impacts of climate change on crop production typically relies on the output of complex Global and Regional Climate Models, which are used as input to crop impact models. Yield predictions from these top-down approaches can have high uncertainty for several reasons, including diverse model construction and parameterization, future emissions scenarios, and inherent or response uncertainty. These uncertainties propagate down each step of the 'cascade of uncertainty' that flows from climate input to impact predictions, leading to yield predictions that may be too uncertain for their intended use in practical adaptation options. In addition to uncertainty from impact models, uncertainty can also stem from the intermediate steps that are used in impact studies to adjust climate model simulations to become more realistic when compared to observations, or to correct the spatial or temporal resolution of climate simulations, which are often not directly applicable as input into impact models. These important steps of bias correction or calibration also add uncertainty to final yield predictions, given the various approaches that exist to correct climate model simulations. In order to address how much uncertainty the choice of bias correction method can add to yield predictions, we use several evaluation runs from Regional Climate Models from the Coordinated Regional Downscaling Experiment over Europe (EURO-CORDEX) at different resolutions together with different bias correction methods (linear and variance scaling, power transformation, quantile-quantile mapping) as input to a statistical crop model for wheat, a staple European food crop. The objective of our work is to compare the resulting simulation-driven hindcasted wheat yields to climate observation-driven wheat yield hindcasts from the UK and Germany in order to determine ranges of yield uncertainty that result from different climate model simulation input and bias correction methods. We simulate wheat yields using a General Linear Model that includes the effects of seasonal maximum temperatures and precipitation, since wheat is sensitive to heat stress during important developmental stages. We use the same statistical model to predict future wheat yields using the recently available bias-corrected simulations of EURO-CORDEX-Adjust. While statistical models are often criticized for their lack of complexity, an advantage is that we are here able to consider only the effect of the choice of climate model, resolution or bias correction method on yield. Initial results using both past and future bias-corrected climate simulations with a process-based model will also be presented. Through these methods, we make recommendations in preparing climate model output for crop models.
The Complexity of Primary Care Psychology: Theoretical Foundations.
Smit, E H; Derksen, J J L
2015-07-01
How does primary care psychology deal with organized complexity? Has it escaped Newtonian science? Has it, as Weaver (1991) suggests, found a way to 'manage problems with many interrelated factors that cannot be dealt with by statistical techniques'? Computer simulations and mathematical models in psychology are ongoing positive developments in the study of complex systems. However, the theoretical development of complex systems in psychology lags behind these advances. In this article we use complexity science to develop a theory of experienced complexity in the daily practice of primary care psychologists. We briefly answer the ontological question of what we see (from the perspective of primary care psychology) as reality, the epistemological question of what we can know, the methodological question of how to act, and the ethical question of what is good care. Following our empirical study, we conclude that complexity science can describe the experienced complexity of the psychologist and offer room for personalized client-centered care. Complexity science is slowly filling the gap between the dominant reductionist theory and complex daily practice.
At Wake Forest U., Admissions Has Become "More Art than Science"
ERIC Educational Resources Information Center
Hoover, Eric
2009-01-01
The admissions process is awash in numbers. Students accumulate grade-point averages and test scores. Colleges use statistical models to predict enrollment outcomes, and they tout their place in commercial rankings. In many ways, numbers simplify this complex enterprise. However, they have come to carry undue weight, says Martha Blevins Allman,…
Conceptual Complexity and the Bias/Variance Tradeoff
ERIC Educational Resources Information Center
Briscoe, Erica; Feldman, Jacob
2011-01-01
In this paper we propose that the conventional dichotomy between exemplar-based and prototype-based models of concept learning is helpfully viewed as an instance of what is known in the statistical learning literature as the "bias/variance tradeoff". The bias/variance tradeoff can be thought of as a sliding scale that modulates how closely any…
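The tradeoff can be made concrete with a short simulation: fit polynomials of increasing degree to noisy draws of a fixed function and decompose the error at a test point into squared bias and variance. All settings below are illustrative and not from the paper.

```python
# Bias/variance decomposition for polynomial fits of increasing degree.
import numpy as np

rng = np.random.default_rng(8)
true_f = lambda x: np.sin(2 * np.pi * x)
x_train = np.linspace(0, 1, 30)
x0, reps = 0.25, 500                       # evaluation point, replicate datasets

for degree in [1, 4, 10]:
    preds = []
    for _ in range(reps):
        y = true_f(x_train) + 0.3 * rng.standard_normal(x_train.size)
        coef = np.polyfit(x_train, y, degree)
        preds.append(np.polyval(coef, x0))
    preds = np.array(preds)
    bias2 = (preds.mean() - true_f(x0)) ** 2
    var = preds.var()
    print(f"degree {degree:2d}: bias^2 = {bias2:.4f}, variance = {var:.4f}")
# Low degree ~ prototype-like models (high bias, low variance); high
# degree ~ exemplar-like models (low bias, high variance).
```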
The value of decision models: Using ecologically based invasive plant management as an example
USDA-ARS?s Scientific Manuscript database
Humans have both fast and slow thought processes which influence our judgment and decision-making. The fast system is intuitive and valuable for decisions which do not require multiple steps or the application of logic or statistics. However, many decisions in natural resources are complex and req...
Prettejohn, Brenton J.; Berryman, Matthew J.; McDonnell, Mark D.
2011-01-01
Many simulations of networks in computational neuroscience assume completely homogenous random networks of the Erdös–Rényi type, or regular networks, despite it being recognized for some time that anatomical brain networks are more complex in their connectivity and can, for example, exhibit the “scale-free” and “small-world” properties. We review the most well known algorithms for constructing networks with given non-homogeneous statistical properties and provide simple pseudo-code for reproducing such networks in software simulations. We also review some useful mathematical results and approximations associated with the statistics that describe these network models, including degree distribution, average path length, and clustering coefficient. We demonstrate how such results can be used as partial verification and validation of implementations. Finally, we discuss a sometimes overlooked modeling choice that can be crucially important for the properties of simulated networks: that of network directedness. The most well known network algorithms produce undirected networks, and we emphasize this point by highlighting how simple adaptations can instead produce directed networks. PMID:21441986
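In the spirit of the pseudo-code the authors describe, here is a compact directed preferential-attachment construction, making their directedness point explicit; the sizes and the in-degree attachment rule are illustrative assumptions rather than the paper's exact algorithm.

```python
# Directed preferential attachment: each new node sends m edges to targets
# chosen with probability proportional to current in-degree (plus one).
import numpy as np

def directed_preferential_attachment(n_nodes, m, seed=0):
    rng = np.random.default_rng(seed)
    edges = []                              # (source, target) pairs
    in_degree = np.zeros(n_nodes)
    for new in range(m, n_nodes):
        weights = in_degree[:new] + 1.0     # +1 so zero-in-degree nodes can be chosen
        probs = weights / weights.sum()
        targets = rng.choice(new, size=m, replace=False, p=probs)
        for t in targets:
            edges.append((new, int(t)))
            in_degree[t] += 1
    return edges, in_degree

edges, in_deg = directed_preferential_attachment(2000, m=3)
print("edges:", len(edges), "| max in-degree:", int(in_deg.max()))
# Undirected BA is recovered by ignoring edge orientation; the resulting
# degree distribution can then be checked against the power-law expectation.
```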
Margaria, Tiziana; Kubczak, Christian; Steffen, Bernhard
2008-04-25
With Bio-jETI, we introduce a service platform for interdisciplinary work on biological application domains and illustrate its use in a concrete application concerning statistical data processing in R and xcms for an LC/MS analysis of FAAH gene knockout. Bio-jETI uses the jABC environment for service-oriented modeling and design as a graphical process modeling tool and the jETI service integration technology for remote tool execution. As a service definition and provisioning platform, Bio-jETI has the potential to become a core technology in interdisciplinary service orchestration and technology transfer. Domain experts, like biologists not trained in computer science, directly define complex service orchestrations as process models and use efficient and complex bioinformatics tools in a simple and intuitive way.
Statistical Mechanics of the US Supreme Court
NASA Astrophysics Data System (ADS)
Lee, Edward D.; Broedersz, Chase P.; Bialek, William
2015-07-01
We build simple models for the distribution of voting patterns in a group, using the Supreme Court of the United States as an example. The maximum entropy model consistent with the observed pairwise correlations among justices' votes, an Ising spin glass, agrees quantitatively with the data. While all correlations (perhaps surprisingly) are positive, the effective pairwise interactions in the spin glass model have both signs, recovering the intuition that ideologically opposite justices negatively influence one another. Despite the competing interactions, a strong tendency toward unanimity emerges from the model, organizing the voting patterns in a relatively simple "energy landscape." Besides unanimity, other energy minima in this landscape, or maxima in probability, correspond to prototypical voting states, such as the ideological split or a tightly correlated, conservative core. The model correctly predicts the correlation of justices with the majority and gives us a measure of their influence on the majority decision. These results suggest that simple models, grounded in statistical physics, can capture essential features of collective decision making quantitatively, even in a complex political context.
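Because a nine-member court has only 2^9 = 512 voting states, a pairwise maximum entropy model can be fitted by exact enumeration. Below is a sketch with synthetic votes standing in for the real data; the learning rate and iteration count are arbitrary, and this is not the authors' fitting procedure, only a generic moment-matching scheme.

```python
# Fit a pairwise maximum-entropy (Ising) model to synthetic vote data by
# exact enumeration over all 2^9 states and gradient ascent on the moments.
import itertools
import numpy as np

rng = np.random.default_rng(9)
n = 9
votes = np.sign(rng.standard_normal((5000, n))
                + 0.8 * rng.standard_normal((5000, 1)))   # common "mood" term

m_data = votes.mean(axis=0)                           # <s_i>
C_data = votes.T @ votes / votes.shape[0]             # <s_i s_j>

states = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
h = np.zeros(n)
J = np.zeros((n, n))

for step in range(3000):                              # moment matching
    E = states @ h + 0.5 * np.einsum('si,ij,sj->s', states, J, states)
    p = np.exp(E - E.max()); p /= p.sum()
    m_model = p @ states
    C_model = (states * p[:, None]).T @ states
    h += 0.1 * (m_data - m_model)
    dJ = 0.1 * (C_data - C_model)
    np.fill_diagonal(dJ, 0.0)                         # no self-couplings
    J += dJ

print("max moment mismatch:", max(np.abs(m_data - m_model).max(),
                                  np.abs(C_data - C_model).max()))
```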
2012-01-01
discrimination at live-UXO sites. Namely, under this project we first developed and implemented advanced, physically complete forward EMI models... Shubitidze of Sky Research and Dartmouth College conceived, implemented, and tested most of the approaches presented in this report. He developed...
ERIC Educational Resources Information Center
Steenbeek, Henderien; van Vondel, Sabine; van Geert, Paul
2017-01-01
This article concentrates on the question what kind of model--conceptual and statistical--can serve as a good working model for the study of learning and teaching processes qua processes. We claim that a good way of answering this question is to begin by observing a teaching and learning process as, where, and when it occurs. In addition, a…
Rescore protein-protein docked ensembles with an interface contact statistics.
Mezei, Mihaly
2017-02-01
The recently developed statistical measure for the type of residue-residue contact at protein complex interfaces, based on a parameter-free definition of contact, has been used to define a contact score that is correlated with the likelihood of correctness of a proposed complex structure. Comparing the proposed contact scores on the native structure and on a set of model structures, the proposed measure was shown to generally favor the native structure but in itself was not able to reliably score the native structure as the best. Adjusting the scores of redocking experiments with the contact score showed that the adjusted score was able to move up the ranking of the native-like structure among the proposed complexes when the native-like structure was not ranked best by the respective program. Tests on docking of unbound proteins compared the contact scores of the complexes with the contact score of the crystal structure, again showing the tendency of the contact score to favor native-like conformations. The possibility of using the contact score to improve the determination of biological dimers in a crystal structure was also explored. Proteins 2017; 85:235-241. © 2016 Wiley Periodicals, Inc.
Husbands, Aman Y; Aggarwal, Vasudha; Ha, Taekjip; Timmermans, Marja C P
2016-08-01
Deciphering complex biological processes markedly benefits from approaches that directly assess the underlying biomolecular interactions. Most commonly used approaches to monitor protein-protein interactions typically provide nonquantitative readouts that lack statistical power and do not yield information on the heterogeneity or stoichiometry of protein complexes. Single-molecule pull-down (SiMPull) uses single-molecule fluorescence detection to mitigate these disadvantages and can quantitatively interrogate interactions between proteins and other compounds, such as nucleic acids, small molecule ligands, and lipids. Here, we establish SiMPull in plants using the HOMEODOMAIN LEUCINE ZIPPER III (HD-ZIPIII) and LITTLE ZIPPER (ZPR) interaction as proof-of-principle. Colocalization analysis of fluorophore-tagged HD-ZIPIII and ZPR proteins provides strong statistical evidence of complex formation. In addition, we use SiMPull to directly quantify YFP and mCherry maturation probabilities, showing these differ substantially from values obtained in mammalian systems. Leveraging these probabilities, in conjunction with fluorophore photobleaching assays on over 2000 individual complexes, we determined HD-ZIPIII:ZPR stoichiometry. Intriguingly, these complexes appear as heterotetramers, comprising two HD-ZIPIII and two ZPR molecules, rather than heterodimers as described in the current model. This surprising result raises new questions about the regulation of these key developmental factors and is illustrative of the unique contribution SiMPull is poised to make to in planta protein interaction studies. © 2016 American Society of Plant Biologists. All rights reserved.
Hierarchical modeling and robust synthesis for the preliminary design of large scale complex systems
NASA Astrophysics Data System (ADS)
Koch, Patrick Nathan
Large-scale complex systems are characterized by multiple interacting subsystems and the analysis of multiple disciplines. The design and development of such systems inevitably requires the resolution of multiple conflicting objectives. The size of complex systems, however, prohibits the development of comprehensive system models, and thus these systems must be partitioned into their constituent parts. Because simultaneous solution of individual subsystem models is often not manageable, iteration is inevitable and often excessive. In this dissertation these issues are addressed through the development of a method for hierarchical robust preliminary design exploration to facilitate concurrent system and subsystem design exploration, for the concurrent generation of robust system and subsystem specifications for the preliminary design of multi-level, multi-objective, large-scale complex systems. This method is developed through the integration and expansion of current design techniques: (1) Hierarchical partitioning and modeling techniques for partitioning large-scale complex systems into more tractable parts, and allowing integration of subproblems for system synthesis, (2) Statistical experimentation and approximation techniques for increasing both the efficiency and the comprehensiveness of preliminary design exploration, and (3) Noise modeling techniques for implementing robust preliminary design when approximate models are employed. The method developed and associated approaches are illustrated through their application to the preliminary design of a commercial turbofan turbine propulsion system; the turbofan system-level problem is partitioned into engine cycle and configuration design, and a compressor module is integrated for more detailed subsystem-level design exploration, improving system evaluation.
Wang, Danny J J; Jann, Kay; Fan, Chang; Qiao, Yang; Zang, Yu-Feng; Lu, Hanbing; Yang, Yihong
2018-01-01
Recently, non-linear statistical measures such as multi-scale entropy (MSE) have been introduced as indices of the complexity of electrophysiology and fMRI time-series across multiple time scales. In this work, we investigated the neurophysiological underpinnings of complexity (MSE) of electrophysiology and fMRI signals and their relations to functional connectivity (FC). MSE and FC analyses were performed on simulated data using a neural-mass-model-based brain network model with the Brain Dynamics Toolbox, on animal models with concurrent recording of fMRI and electrophysiology in conjunction with pharmacological manipulations, and on resting-state fMRI data from the Human Connectome Project. Our results show that the complexity of regional electrophysiology and fMRI signals is positively correlated with network FC. The associations between MSE and FC are dependent on the temporal scales or frequencies, with higher associations between MSE and FC at lower temporal frequencies. Our results from theoretical modeling, animal experiments and human fMRI indicate that (1) regional neural complexity and network FC may be two related aspects of the brain's information processing: the more complex the regional neural activity, the higher the FC this region has with other brain regions; and (2) MSE at high and low frequencies may represent local and distributed information processing across brain regions. Based on the literature and our data, we propose that the complexity of regional neural signals may serve as an index of the brain's capacity for information processing: increased complexity may indicate greater transition or exploration between different states of brain networks, and thereby a greater propensity for information processing.
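A bare-bones MSE computation, coarse-graining followed by sample entropy, is sketched below on synthetic data; the tolerance and scale choices are conventional defaults, not the study's settings.

```python
# Minimal multi-scale entropy (MSE): coarse-grain at each scale, then
# compute sample entropy of the coarse-grained series.
import numpy as np

def sample_entropy(x, m=2, r_frac=0.2):
    r = r_frac * x.std()
    def count(mm):
        templates = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        d = np.abs(templates[:, None, :] - templates[None, :, :]).max(axis=2)
        return np.sum(d <= r) - len(templates)   # exclude self-matches
    b, a = count(m), count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def multiscale_entropy(x, scales=(1, 2, 4, 8)):
    out = []
    for s in scales:
        n = len(x) // s
        coarse = x[:n * s].reshape(n, s).mean(axis=1)   # coarse-graining
        out.append(sample_entropy(coarse))
    return out

rng = np.random.default_rng(10)
white = rng.standard_normal(1000)
print([round(v, 2) for v in multiscale_entropy(white)])
# White noise loses complexity at coarser scales, whereas correlated
# (e.g. 1/f-like) signals maintain it, which is the signature MSE exploits.
```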
Stochastic modeling of sunshine number data
NASA Astrophysics Data System (ADS)
Brabec, Marek; Paulescu, Marius; Badescu, Viorel
2013-11-01
In this paper, we will present a unified statistical modeling framework for estimating and forecasting sunshine number (SSN) data. Sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it has been shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been a challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We will show how its transition probabilities can be efficiently estimated within a logistic regression framework. In fact, our logistic Markovian model can be relatively easily fitted via a maximum likelihood approach. This is optimal in many respects and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also the related uncertainties, as well as to test various hypotheses of practical interest, etc. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, the logistic Markov model class allows us to test hypotheses about how SSN depends on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using the generalized additive model (GAM) approach, we can fit and compare models of various complexity while keeping the physical interpretation of the statistical model and its parts. After introducing the Markovian model and the general approach for identification of its parameters, we will illustrate its use and performance on high resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.
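In its simplest first-order form, the logistic Markov idea reduces to logistic regression of the current binary state on the previous state plus covariates. A sketch with synthetic data and the statsmodels GLM interface; the diurnal covariate and generating coefficients are invented for illustration, not the Timisoara data.

```python
# First-order logistic Markov model: P(ssn_t = 1 | ssn_{t-1}, covariates).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
T = 5000
hour = (np.arange(T) % 24).astype(float)

# Generate a synthetic persistent binary series with a diurnal covariate.
ssn = np.zeros(T, dtype=int)
for t in range(1, T):
    logit = -0.5 + 2.0 * ssn[t - 1] + 0.8 * np.sin(2 * np.pi * hour[t] / 24)
    ssn[t] = rng.random() < 1.0 / (1.0 + np.exp(-logit))

# Fit the transition model by logistic regression (binomial GLM).
X = sm.add_constant(np.column_stack([
    ssn[:-1],
    np.sin(2 * np.pi * hour[1:] / 24),
]))
fit = sm.GLM(ssn[1:], X, family=sm.families.Binomial()).fit()
print(fit.params)   # should roughly recover the generating coefficients
```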
An Integrative Account of Constraints on Cross-Situational Learning
Yurovsky, Daniel; Frank, Michael C.
2015-01-01
Word-object co-occurrence statistics are a powerful information source for vocabulary learning, but there is considerable debate about how learners actually use them. While some theories hold that learners accumulate graded, statistical evidence about multiple referents for each word, others suggest that they track only a single candidate referent. In two large-scale experiments, we show that neither account is sufficient: Cross-situational learning involves elements of both. Further, the empirical data are captured by a computational model that formalizes how memory and attention interact with co-occurrence tracking. Together, the data and model unify opposing positions in a complex debate and underscore the value of understanding the interaction between computational and algorithmic levels of explanation. PMID:26302052
Current algebra, statistical mechanics and quantum models
NASA Astrophysics Data System (ADS)
Vilela Mendes, R.
2017-11-01
Results obtained in the past for free boson systems at zero and nonzero temperatures are revisited to clarify the physical meaning of current algebra reducible functionals which are associated to systems with density fluctuations, leading to observable effects on phase transitions. To use current algebra as a tool for the formulation of quantum statistical mechanics amounts to the construction of unitary representations of diffeomorphism groups. Two mathematical equivalent procedures exist for this purpose. One searches for quasi-invariant measures on configuration spaces, the other for a cyclic vector in Hilbert space. Here, one argues that the second approach is closer to the physical intuition when modelling complex systems. An example of application of the current algebra methodology to the pairing phenomenon in two-dimensional fermion systems is discussed.
Performance of Ultra Wideband On-Body Communication Based on Statistical Channel Model
NASA Astrophysics Data System (ADS)
Wang, Qiong; Wang, Jianqing
Ultra wideband (UWB) on-body communication is attracting much attention in biomedical applications. In this paper, the performance of UWB on-body communication is investigated based on a statistically extracted on-body channel model, which provides detailed characteristics of the multi-path-affected channel with an emphasis on various body postures and body movement. The achievable data rate, the achievable communication distance, and the bit error rate (BER) performance are clarified via computer simulation. It is found that the conventional correlation receiver performs poorly in the multi-path-affected on-body channel, while the RAKE receiver outperforms it at the cost of structural complexity. Different RAKE receiver structures are compared to show the improvement in BER performance.
Role of spatial inhomogeneity in GPCR dimerisation predicted by receptor association-diffusion models
NASA Astrophysics Data System (ADS)
Deshpande, Sneha A.; Pawar, Aiswarya B.; Dighe, Anish; Athale, Chaitanya A.; Sengupta, Durba
2017-06-01
G protein-coupled receptor (GPCR) association is an emerging paradigm with far-reaching implications for the regulation of signalling pathways and for therapeutic interventions. Recent super-resolution microscopy studies have revealed that the receptor dimer steady state exhibits sub-second dynamics. In particular, the GPCRs muscarinic acetylcholine receptor M1 (M1MR) and formyl peptide receptor (FPR) have been demonstrated to exhibit fast association/dissociation kinetics, independent of ligand binding. In this work, we have developed a spatial kinetic Monte Carlo model to investigate receptor homo-dimerisation at single-receptor resolution. Experimentally measured association/dissociation kinetic parameters and diffusion coefficients were used as inputs to the model. To test the effect of membrane spatial heterogeneity on the simulated steady state, simulations were compared to experimental statistics of dimerisation. In the simplest case the receptors are assumed to be diffusing in a spatially homogeneous environment, while spatial heterogeneity is modelled as arising from crowding, membrane micro-domains and cytoskeletal compartmentalisation or 'corrals'. We show that a simple association-diffusion model is sufficient to reproduce M1MR association statistics, but fails to reproduce FPR statistics despite comparable kinetic constants; a parameter sensitivity analysis is required to reproduce the association statistics of FPR. The model reveals the complex interplay between cytoskeletal components and other features of the membrane landscape in modulating receptor association kinetics. These results constitute an important step towards understanding the factors modulating GPCR organisation.
Levine, Rebecca S; Peterson, A Townsend; Benedict, Mark Q
2004-02-01
The distribution of the Anopheles gambiae complex of malaria vectors in Africa is uncertain due to under-sampling of vast regions. We use ecologic niche modeling to predict the potential distribution of three members of the complex (A. gambiae, A. arabiensis, and A. quadriannulatus) and demonstrate the statistical significance of the models. Predictions correspond well to previous estimates, but provide detail regarding spatial discontinuities in the distribution of A. gambiae s.s. that are consistent with population genetic studies. Our predictions also identify large areas of Africa where the presence of A. arabiensis is predicted, but few specimens have been obtained, suggesting under-sampling of the species. Finally, we project models developed from African distribution data for the late 1900s into the past and to South America to determine retrospectively whether the deadly 1929 introduction of A. gambiae sensu lato into Brazil was more likely that of A. gambiae sensu stricto or A. arabiensis.
A statistical approach to quasi-extinction forecasting.
Holmes, Elizabeth Eli; Sabo, John L; Viscido, Steven Vincent; Fagan, William Fredric
2007-12-01
Forecasting population decline to a certain critical threshold (the quasi-extinction risk) is one of the central objectives of population viability analysis (PVA), and such predictions figure prominently in the decisions of major conservation organizations. In this paper, we argue that accurate forecasting of a population's quasi-extinction risk does not necessarily require knowledge of the underlying biological mechanisms. Because of the stochastic and multiplicative nature of population growth, the ensemble behaviour of population trajectories converges to common statistical forms across a wide variety of stochastic population processes. This paper provides a theoretical basis for this argument. We show that the quasi-extinction surfaces of a variety of complex stochastic population processes (including age-structured, density-dependent and spatially structured populations) can be modelled by a simple stochastic approximation: the stochastic exponential growth process overlaid with Gaussian errors. Using simulated and real data, we show that this model can be estimated with 20-30 years of data and can provide relatively unbiased quasi-extinction risk estimates with confidence intervals considerably smaller than (0,1). This was found to be true even for simulated data derived from some of the noisiest population processes (density-dependent feedback, species interactions and strong age-structure cycling). A key advantage of statistical models is that their parameters and the uncertainty of those parameters can be estimated from time series data using standard statistical methods. In contrast, for most species of conservation concern, biologically realistic models must often be specified rather than estimated because of the limited data available for their various parameters. Biologically realistic models will always have a prominent place in PVA for evaluating specific management options which affect a single segment of a population, a single demographic rate, or different geographic areas. However, for forecasting quasi-extinction risk, statistical models that are based on the convergent statistical properties of population processes offer many advantages over biologically realistic models.
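The core computation behind this kind of approximation is simple enough to sketch. Under a stochastic exponential growth (diffusion) model, the drift and variance of the log-abundance increments give a closed-form first-passage probability to a lower threshold. This is a toy version with synthetic counts, not the authors' estimator.

```python
# Sketch: diffusion-approximation quasi-extinction probability from a short
# time series of log abundances (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
log_n = np.log(1000) + np.cumsum(rng.normal(0.02, 0.15, 30))  # ~30 yr of counts

diffs = np.diff(log_n)
mu, sigma2 = diffs.mean(), diffs.var(ddof=1)   # drift and variance of log growth

d = log_n[-1] - np.log(100)                    # distance to the threshold (logs)
# Brownian motion with drift: certain to hit the threshold if mu <= 0,
# otherwise probability exp(-2*mu*d/sigma^2)
p_hit = 1.0 if mu <= 0 else np.exp(-2 * mu * d / sigma2)
print(f"P(ever falling below threshold) ~ {p_hit:.3f}")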
State estimation and prediction using clustered particle filters.
Lee, Yoonsang; Majda, Andrew J
2016-12-20
Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems, which uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filtering are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighbor state variables through clustering and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides the large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors.
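For readers unfamiliar with the baseline method, the following minimal bootstrap particle filter shows the propagate-weight-resample loop; the paper's clustering, localization, and particle-adjustment steps are not reproduced here, and the scalar model and noise levels are arbitrary.

```python
# Sketch: bootstrap particle filter for a scalar nonlinear state-space model.
import numpy as np

rng = np.random.default_rng(2)
T, Np = 50, 500
x_true, y_obs = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + np.sin(x_true[t - 1]) + rng.normal(0, 0.5)
    y_obs[t] = x_true[t] + rng.normal(0, 0.3)

particles = rng.normal(0, 1, Np)
estimate = np.zeros(T)
for t in range(1, T):
    # propagate each particle through the nonlinear dynamics
    particles = 0.9 * particles + np.sin(particles) + rng.normal(0, 0.5, Np)
    w = np.exp(-0.5 * ((y_obs[t] - particles) / 0.3) ** 2) + 1e-300  # likelihoods
    w /= w.sum()
    particles = particles[rng.choice(Np, size=Np, p=w)]   # multinomial resampling
    estimate[t] = particles.mean()
```

The particle-collapse problem the abstract mentions arises exactly at the resampling step: with sharp (high-quality) observations, the weights concentrate on very few particles, which motivates the clustered filter's particle adjustment.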
NASA Astrophysics Data System (ADS)
Ghanate, A. D.; Kothiwale, S.; Singh, S. P.; Bertrand, Dominique; Krishna, C. Murali
2011-02-01
Cancer is now recognized as one of the major causes of morbidity and mortality. Histopathological diagnosis, the gold standard, is shown to be subjective, time consuming, prone to interobserver disagreement, and often fails to predict prognosis. Optical spectroscopic methods are being contemplated as adjuncts or alternatives to conventional cancer diagnostics. The most important aspect of these approaches is their objectivity, and multivariate statistical tools play a major role in realizing it. However, rigorous evaluation of the robustness of spectral models is a prerequisite. The utility of Raman spectroscopy in the diagnosis of cancers has been well established. Until now, the specificity and applicability of spectral models have been evaluated for specific cancer types. In this study, we have evaluated the utility of spectroscopic models representing normal and malignant tissues of the breast, cervix, colon, larynx, and oral cavity in a broader perspective, using different multivariate tests. The limit test, which was used in our earlier study, gave high sensitivity but suffered from poor specificity. The performance of other methods such as factorial discriminant analysis and partial least squares discriminant analysis is on par with that of more complex nonlinear methods such as decision trees, but they provide very little information about the classification model. This comparative study thus demonstrates not just the efficacy of Raman spectroscopic models but also the applicability and limitations of different multivariate tools for discrimination under complex conditions such as the multicancer scenario.
Kourgialas, Nektarios N; Dokou, Zoi; Karatzas, George P
2015-05-01
The purpose of this study was to create a modeling management tool for the simulation of extreme flow events under current and future climatic conditions. This tool is a combination of different components and can be applied in complex hydrogeological river basins, where frequent flood and drought phenomena occur. The first component is the statistical analysis of the available hydro-meteorological data. Specifically, principal components analysis was performed in order to quantify the importance of the hydro-meteorological parameters that affect the generation of extreme events. The second component is a prediction-forecasting artificial neural network (ANN) model that simulates river flow on an hourly basis, accurately and efficiently. This model is based on a methodology that attempts to resolve a very difficult problem: the accurate estimation of extreme flows. For this purpose, the available measurements (5 years of hourly data) were divided into two subsets, one for the dry and one for the wet periods of the hydrological year. This way, two ANNs were created, trained, tested and validated for a complex Mediterranean river basin in Crete, Greece. As part of the second management component, a statistical downscaling tool was used to generate meteorological data according to the higher- and lower-emission climate change scenarios A2 and B1. These data are used as input to the ANN for forecasting river flow over the next two decades. The final component is the application of a meteorological index to the measured and forecasted precipitation and flow data, in order to assess the severity and duration of extreme events.
Rajeswaran, Jeevanantham; Blackstone, Eugene H
2017-02-01
In medical sciences, we often encounter longitudinal temporal relationships that are non-linear in nature. The influence of risk factors may also change across longitudinal follow-up. A system of multiphase non-linear mixed-effects models is presented to model temporal patterns of longitudinal continuous measurements, with temporal decomposition to identify the phases and the risk factors within each phase. Application of this model is illustrated with spirometry data after lung transplantation, using readily available statistical software. This application illustrates the usefulness of our flexible model when dealing with complex non-linear patterns and time-varying coefficients.
Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R
2016-12-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high-dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different from maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms (supervised principal components, regularization, and boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods and as primary analytic tools in discovery-phase research. We conclude that despite their differences from the classic null-hypothesis testing approach (or perhaps because of them), SLT methods may hold value as a statistically rigorous approach to exploratory regression.
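A schematic of one of the three algorithms mentioned (regularization), with the penalty strength chosen by cross-validation as a practical proxy for minimizing EPE; the item pool and mortality outcome below are simulated, not the study's data.

```python
# Sketch: cross-validated L1-regularized regression for criterion-keyed
# scale construction from a large item pool (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(3)
items = rng.integers(1, 6, size=(500, 200)).astype(float)   # 200-item pool
risk = items[:, :5].sum(axis=1) - 15                        # 5 truly predictive items
died = (risk + rng.normal(0, 2, 500)) > 0                   # all-cause mortality proxy

# L1 penalty, with strength selected by 5-fold cross-validation
clf = LogisticRegressionCV(Cs=10, penalty="l1", solver="liblinear", cv=5)
clf.fit(items, died)
kept = np.flatnonzero(clf.coef_[0])        # items retained for the scale
print(len(kept), "items selected")
```

The cross-validation folds estimate out-of-sample error at each penalty level, so the selected model is the one expected to generalize, not the one that fits the training sample best.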
Microscale Obstacle Resolving Air Quality Model Evaluation with the Michelstadt Case
Rakai, Anikó; Kristóf, Gergely
2013-01-01
Modelling pollutant dispersion in cities is challenging for air quality models, as urban obstacles have an important effect on the flow field and thus on the dispersion. Computational Fluid Dynamics (CFD) models with an additional scalar dispersion transport equation are a possible way to resolve the flow field in the urban canopy and to model dispersion while taking the effect of the buildings into account explicitly. These models need detailed evaluation through verification and validation to gain confidence in their reliability and to use them as regulatory tools in complex urban geometries. This paper shows the performance of an open-source general-purpose CFD code, OpenFOAM, for a complex urban geometry, Michelstadt, which has both flow field and dispersion measurement data. Continuous-release dispersion results are discussed to show the strengths and weaknesses of the modelling approach, focusing on the value of the turbulent Schmidt number, which was found to give the best statistical metric results with a value of 0.7. PMID:24027450
Teshima, Tara Lynn; Patel, Vaibhav; Mainprize, James G; Edwards, Glenn; Antonyshyn, Oleh M
2015-07-01
The utilization of three-dimensional modeling technology in craniomaxillofacial surgery has grown exponentially during the last decade. Future development, however, is hindered by the lack of a normative three-dimensional anatomic dataset and a statistical mean three-dimensional virtual model. The purpose of this study is to develop and validate a protocol for generating a statistical three-dimensional virtual model based on a normative dataset of adult skulls. Two hundred adult skull CT images were reviewed. The average three-dimensional skull was computed by processing each CT image in the series using a thin-plate spline geometric morphometric protocol. Our statistical average three-dimensional skull was validated by reconstructing patient-specific topography in cranial defects. The experiment was repeated 4 times. In each case, computer-generated cranioplasties were compared directly to the original intact skull, and the errors describing the difference between the prediction and the original were calculated. A normative database of 33 adult human skulls was collected. Using 21 anthropometric landmark points, a protocol for three-dimensional skull landmarking and data reduction was developed, and a statistical average three-dimensional skull was generated. Our results show that the root mean square errors (RMSE) for restoration of a known defect using the native best-match skull, our statistical average skull, and the worst-match skull were 0.58, 0.74, and 4.4 mm, respectively. The ability to statistically average craniofacial surface topography will be a valuable instrument for deriving missing anatomy in complex craniofacial defects and deficiencies, as well as for evaluating morphologic results of surgery.
An integrated system for rainfall induced shallow landslides modeling
NASA Astrophysics Data System (ADS)
Formetta, Giuseppe; Capparelli, Giovanna; Rigon, Riccardo; Versace, Pasquale
2014-05-01
Rainfall-induced shallow landslides (RISL) cause significant damage, involving loss of life and property. Predicting susceptible locations for RISL is a complex task that involves many disciplines: hydrology, geotechnical science, geomorphology, and statistics. Two main approaches are usually used to accomplish this task: statistical models or physically based models. In this work an open-source (OS), 3-D, fully distributed hydrological model was integrated into an OS modeling framework (the Object Modeling System). The chain is closed by linking the system to a component for safety-factor computation under the infinite-slope approximation, able to take into account layered soils and the suction contribution to hillslope stability. The model composition was tested on a case study in Calabria (Italy) in order to simulate the triggering of a landslide that occurred in the Cosenza Province. The integration in OMS allows the use of other components, such as a GIS to manage input-output processes and automatic calibration algorithms to estimate model parameters. Finally, model performance was quantified by comparing observed and simulated trigger times. This research is supported by the Ambito/Settore AMBIENTE E SICUREZZA (PON01_01503) project.
NASA Astrophysics Data System (ADS)
Allen, J. Icarus; Holt, Jason T.; Blackford, Jerry; Proctor, Roger
2007-12-01
Marine systems models are becoming increasingly complex and sophisticated, but far too little attention has been paid to model errors and to the extent to which model outputs actually relate to ecosystem processes. Here we describe the application of summary error statistics to a complex 3D model (POLCOMS-ERSEM) run for the period 1988-1989 in the southern North Sea, utilising information from the North Sea Project, which collected a wealth of observational data. We demonstrate that to understand model-data misfit and the mechanisms creating errors, we need to use a hierarchy of techniques, including simple correlations, model bias, model efficiency, binary discriminator analysis and the distribution of model errors, to assess model errors spatially and temporally. We also demonstrate that a linear cost function is an inappropriate measure of misfit. This analysis indicates that the model has some skill for all variables analysed. A summary plot of model performance indicates that model performance deteriorates as we move through the ecosystem from the physics, to the nutrients and plankton.
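The summary statistics named here are all one-liners; the sketch below uses the usual definitions, which may differ in detail from those applied in the paper, and the example numbers are invented.

```python
# Sketch: common summary error statistics for model-data comparison.
import numpy as np

def skill_scores(obs, mod):
    """Bias, RMSE, correlation, and Nash-Sutcliffe model efficiency."""
    bias = np.mean(mod - obs)                          # systematic offset
    rmse = np.sqrt(np.mean((mod - obs) ** 2))          # total error magnitude
    corr = np.corrcoef(obs, mod)[0, 1]                 # pattern agreement
    # model efficiency: 1 is perfect; < 0 is worse than predicting the mean
    eff = 1 - np.sum((mod - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)
    return bias, rmse, corr, eff

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # e.g. observed chlorophyll
mod = np.array([1.2, 1.9, 3.4, 3.8, 5.3])   # e.g. modelled chlorophyll
print(skill_scores(obs, mod))
```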
Hart, Corey B.; Giszter, Simon F.
2013-01-01
We present and apply a method that uses point process statistics to discriminate the forms of synergies in motor pattern data, prior to explicit synergy extraction. The method uses electromyogram (EMG) pulse peak timing or onset timing; peak timing is preferable in complex patterns where pulse onsets may overlap. An interval statistic derived from the point processes of EMG peak timings distinguishes time-varying synergies from synchronous synergies (SS). Model data show that the statistic is robust under most conditions. Its application to both frog hindlimb EMG and rat locomotion hindlimb EMG shows that data from these preparations are clearly most consistent with synchronous synergy models (p < 0.001). Additional direct tests of pulse and interval relations in the frog data further bolster the support for synchronous synergy mechanisms. Our method and analyses support separated control of rhythm and pattern in motor primitives, with the low-level execution primitives comprising pulsed SS in both frog and rat, and in both episodic and rhythmic behaviors. PMID:23675341
NASA Astrophysics Data System (ADS)
Kwiatkowski, L.; Yool, A.; Allen, J. I.; Anderson, T. R.; Barciela, R.; Buitenhuis, E. T.; Butenschön, M.; Enright, C.; Halloran, P. R.; Le Quéré, C.; de Mora, L.; Racault, M.-F.; Sinha, B.; Totterdell, I. J.; Cox, P. M.
2014-07-01
Ocean biogeochemistry (OBGC) models span a wide range of complexities from highly simplified, nutrient-restoring schemes, through nutrient-phytoplankton-zooplankton-detritus (NPZD) models that crudely represent the marine biota, through to models that represent a broader trophic structure by grouping organisms as plankton functional types (PFT) based on their biogeochemical role (Dynamic Green Ocean Models; DGOM) and ecosystem models which group organisms by ecological function and trait. OBGC models are now integral components of Earth System Models (ESMs), but they compete for computing resources with higher resolution dynamical setups and with other components such as atmospheric chemistry and terrestrial vegetation schemes. As such, the choice of OBGC in ESMs needs to balance model complexity and realism alongside relative computing cost. Here, we present an inter-comparison of six OBGC models that were candidates for implementation within the next UK Earth System Model (UKESM1). The models cover a large range of biological complexity (from 7 to 57 tracers) but all include representations of at least the nitrogen, carbon, alkalinity and oxygen cycles. Each OBGC model was coupled to the Nucleus for the European Modelling of the Ocean (NEMO) ocean general circulation model (GCM), and results from physically identical hindcast simulations were compared. Model skill was evaluated for biogeochemical metrics of global-scale bulk properties using conventional statistical techniques. The computing cost of each model was also measured in standardised tests run at two resource levels. No model is shown to consistently outperform or underperform all other models across all metrics. Nonetheless, the simpler models that are easier to tune are broadly closer to observations across a number of fields, and thus offer a high-efficiency option for ESMs that prioritise high resolution climate dynamics. However, simpler models provide limited insight into more complex marine biogeochemical processes and ecosystem pathways, and a parallel approach of low resolution climate dynamics and high complexity biogeochemistry is desirable in order to provide additional insights into biogeochemistry-climate interactions.
Spatio-temporal conditional inference and hypothesis tests for neural ensemble spiking precision
Harrison, Matthew T.; Amarasingham, Asohan; Truccolo, Wilson
2014-01-01
The collective dynamics of neural ensembles create complex spike patterns with many spatial and temporal scales. Understanding the statistical structure of these patterns can help resolve fundamental questions about neural computation and neural dynamics. Spatio-temporal conditional inference (STCI) is introduced here as a semiparametric statistical framework for investigating the nature of precise spiking patterns from collections of neurons that is robust to arbitrarily complex and nonstationary coarse spiking dynamics. The main idea is to focus statistical modeling and inference, not on the full distribution of the data, but rather on families of conditional distributions of precise spiking given different types of coarse spiking. The framework is then used to develop families of hypothesis tests for probing the spatio-temporal precision of spiking patterns. Relationships among different conditional distributions are used to improve multiple hypothesis testing adjustments and to design novel Monte Carlo spike resampling algorithms. Of special note are algorithms that can locally jitter spike times while still preserving the instantaneous peri-stimulus time histogram (PSTH) or the instantaneous total spike count from a group of recorded neurons. The framework can also be used to test whether first-order maximum entropy models with possibly random and time-varying parameters can account for observed patterns of spiking. STCI provides a detailed example of the generic principle of conditional inference, which may be applicable in other areas of neurostatistical analysis. PMID:25380339
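To give a flavour of the resampling algorithms described, here is the basic interval-jitter surrogate (uniform re-draw of each spike time within its window); the PSTH- and count-preserving variants discussed in the paper require additional constraints not shown, and the spike train below is synthetic.

```python
# Sketch: interval-jitter surrogate spike train. Per-window spike counts are
# preserved, but fine timing within each window is randomized.
import numpy as np

rng = np.random.default_rng(4)
spikes = np.sort(rng.uniform(0.0, 10.0, 80))   # spike times in seconds
width = 0.05                                    # 50 ms jitter windows

windows = np.floor(spikes / width)              # window index of each spike
surrogate = np.sort(windows * width + rng.uniform(0.0, width, spikes.size))
```

Comparing a statistic of the observed train against its distribution over many such surrogates tests for structure at time scales finer than the window width.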
Chadeau-Hyam, Marc; Campanella, Gianluca; Jombart, Thibaut; Bottolo, Leonardo; Portengen, Lutzen; Vineis, Paolo; Liquet, Benoit; Vermeulen, Roel C H
2013-08-01
Recent technological advances in molecular biology have given rise to numerous large-scale datasets whose analysis imposes serious methodological challenges mainly relating to the size and complex structure of the data. Considerable experience in analyzing such data has been gained over the past decade, mainly in genetics, from the Genome-Wide Association Study era, and more recently in transcriptomics and metabolomics. Building upon the corresponding literature, we provide here a nontechnical overview of well-established methods used to analyze OMICS data within three main types of regression-based approaches: univariate models including multiple testing correction strategies, dimension reduction techniques, and variable selection models. Our methodological description focuses on methods for which ready-to-use implementations are available. We describe the main underlying assumptions, the main features, and advantages and limitations of each of the models. This descriptive summary constitutes a useful tool for driving methodological choices while analyzing OMICS data, especially in environmental epidemiology, where the emergence of the exposome concept clearly calls for unified methods to analyze marginally and jointly complex exposure and OMICS datasets.
Huffman and linear scanning methods with statistical language models.
Roark, Brian; Fried-Oken, Melanie; Gibbons, Chris
2015-03-01
Current scanning access methods for text generation in AAC devices are limited to relatively few options, most notably row/column variations within a matrix. We present Huffman scanning, a new method for applying statistical language models to binary-switch, static-grid typing AAC interfaces, and compare it to other scanning options under a variety of conditions. We present results for 16 adults without disabilities and one 36-year-old man with locked-in syndrome who presents with complex communication needs and uses AAC scanning devices for writing. Huffman scanning with a statistical language model yielded significant typing speedups for the 16 participants without disabilities versus any of the other methods tested, including two row/column scanning methods. A similar pattern of results was found with the individual with locked-in syndrome. Interestingly, faster typing speeds were obtained with Huffman scanning using a more leisurely scan rate than relatively fast individually calibrated scan rates. Overall, the results reported here demonstrate great promise for the usability of Huffman scanning as a faster alternative to row/column scanning.
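Huffman scanning rests on ordinary Huffman coding over the letter probabilities supplied by the language model: each yes/no switch answer traces one bit of the target letter's code, so frequent letters need fewer selections. A compact, self-contained construction (the probabilities are invented):

```python
# Sketch: Huffman codes from symbol probabilities.
import heapq
from itertools import count

def huffman(probs):
    tie = count()  # tiebreaker so heapq never has to compare the dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)          # two least-probable groups
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

codes = huffman({"e": 0.4, "t": 0.25, "a": 0.2, "q": 0.15})
print(codes)   # frequent letters get shorter yes/no scan sequences
```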
Information driving force and its application in agent-based modeling
NASA Astrophysics Data System (ADS)
Chen, Ting-Ting; Zheng, Bo; Li, Yan; Jiang, Xiong-Fei
2018-04-01
Exploring the scientific impact of online big-data has attracted much attention of researchers from different fields in recent years. Complex financial systems are typical open systems profoundly influenced by the external information. Based on the large-scale data in the public media and stock markets, we first define an information driving force, and analyze how it affects the complex financial system. The information driving force is observed to be asymmetric in the bull and bear market states. As an application, we then propose an agent-based model driven by the information driving force. Especially, all the key parameters are determined from the empirical analysis rather than from statistical fitting of the simulation results. With our model, both the stationary properties and non-stationary dynamic behaviors are simulated. Considering the mean-field effect of the external information, we also propose a few-body model to simulate the financial market in the laboratory.
NASA Astrophysics Data System (ADS)
Boulton, Chris A.; Allison, Lesley C.; Lenton, Timothy M.
2014-12-01
The Atlantic Meridional Overturning Circulation (AMOC) exhibits two stable states in models of varying complexity. Shifts between alternative AMOC states are thought to have played a role in past abrupt climate changes, but the proximity of the climate system to a threshold for future AMOC collapse is unknown. Generic early warning signals of critical slowing down before AMOC collapse have been found in climate models of low and intermediate complexity. Here we show that early warning signals of AMOC collapse are present in a fully coupled atmosphere-ocean general circulation model, subject to a freshwater hosing experiment. The statistical significance of the signals of increasing lag-1 autocorrelation and variance varies with latitude. They give up to 250 years of warning before AMOC collapse, after ~550 years of monitoring. Future work is needed to clarify the suggested dynamical mechanisms driving critical slowing down as the AMOC collapse is approached.
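The two indicators are straightforward to compute; a sliding-window sketch follows (the window length and the illustrative AR(1) series are arbitrary choices, not the study's setup).

```python
# Sketch: sliding-window lag-1 autocorrelation and variance, the two early
# warning indicators of critical slowing down.
import numpy as np

def early_warning(x, win=250):
    ac1, var = [], []
    for i in range(len(x) - win):
        w = x[i:i + win]
        w = w - w.mean()
        ac1.append(np.corrcoef(w[:-1], w[1:])[0, 1])
        var.append(w.var())
    return np.array(ac1), np.array(var)

# Toy series whose memory slowly increases toward a transition
rng = np.random.default_rng(9)
x = np.zeros(3000)
for t in range(1, 3000):
    phi = 0.2 + 0.6 * t / 3000           # slowly increasing autocorrelation
    x[t] = phi * x[t - 1] + rng.normal()
ac1, var = early_warning(x)              # both indicators trend upward
```

An upward trend in both indicators ahead of the collapse constitutes the warning signal, commonly summarized by a rank correlation (e.g. Kendall's tau) against time.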
Remontet, Laurent; Uhry, Zoé; Bossard, Nadine; Iwaz, Jean; Belot, Aurélien; Danieli, Coraline; Charvat, Hadrien; Roche, Laurent
2018-01-01
Cancer survival trend analyses are essential to describe accurately how medical practices impact patients' survival according to the year of diagnosis. To this end, survival models should be able to account simultaneously for non-linear and non-proportional effects and for complex interactions between continuous variables. However, there is no consensus yet in the statistical literature on how to build such models, which should be flexible but still provide smooth estimates of survival. In this article, we tackle this challenge by smoothing the complex hypersurface (time since diagnosis, age at diagnosis, year of diagnosis, and mortality hazard) using a multidimensional penalized spline built from the tensor product of the marginal bases of time, age, and year. Treating this penalized survival model as a Poisson model, we assess the performance of the approach in estimating net survival with a comprehensive simulation study that reflects simple and complex realistic survival trends. The bias was generally small, and the root mean squared error was small and often similar to that of the true model that generated the data. This parametric approach offers many advantages and interesting prospects (such as forecasting) that make it an attractive and efficient tool for survival trend analyses.
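A much-simplified stand-in for the Poisson formulation can be sketched as follows: additive B-spline terms for age and year with person-time as exposure, rather than the paper's penalized tensor-product spline over time, age and year. The data are synthetic and the simplification loses the penalization and the interactions that motivate the paper.

```python
# Sketch: spline Poisson rate model with person-time exposure (additive,
# unpenalized; NOT the paper's penalized tensor-product spline).
import numpy as np
import pandas as pd
import patsy
import statsmodels.api as sm

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "age": rng.uniform(40, 80, 400),
    "year": rng.uniform(1990, 2010, 400),
    "py": rng.uniform(50, 500, 400),          # person-years at risk
})
true_rate = np.exp(-6 + 0.05 * (df["age"] - 60) + 0.01 * (df["year"] - 2000))
df["deaths"] = rng.poisson(true_rate * df["py"])

X = patsy.dmatrix("bs(age, df=5) + bs(year, df=4)", df, return_type="dataframe")
fit = sm.GLM(df["deaths"], X, family=sm.families.Poisson(),
             exposure=df["py"]).fit()
print(fit.params.head())
```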
Statistical use of argonaute expression and RISC assembly in microRNA target identification.
Stanhope, Stephen A; Sengupta, Srikumar; den Boon, Johan; Ahlquist, Paul; Newton, Michael A
2009-09-01
MicroRNAs (miRNAs) posttranscriptionally regulate targeted messenger RNAs (mRNAs) by inducing cleavage or otherwise repressing their translation. We address the problem of detecting m/miRNA targeting relationships in Homo sapiens from microarray data by developing statistical models that are motivated by the biological mechanisms used by miRNAs. The focus of our modeling is the construction, activity, and mediation of RNA-induced silencing complexes (RISCs) competent for targeted mRNA cleavage. We demonstrate that regression models accommodating RISC abundance and controlling for other mediating factors fit the expression profiles of known target pairs substantially better than models based on m/miRNA expression alone, and lead to verifications of computational target-pair predictions that are more sensitive than those based on marginal expression levels. Because our models are fully independent of exogenous results from sequence-based computational methods, they are appropriate for use as either a primary or secondary source of information regarding m/miRNA target-pair relationships, especially in conjunction with high-throughput expression studies.
NASA Technical Reports Server (NTRS)
He, Yuning
2015-01-01
The behavior of complex aerospace systems is governed by numerous parameters. For safety analysis it is important to understand how the system behaves with respect to these parameter values. In particular, understanding the boundaries between safe and unsafe regions is of major importance. In this paper, we describe a hierarchical Bayesian statistical modeling approach for the online detection and characterization of such boundaries. Our method for classification with active learning uses a particle filter-based model and a boundary-aware metric for best performance. From a library of candidate shapes incorporated with domain expert knowledge, the location and parameters of the boundaries are estimated using advanced Bayesian modeling techniques. The results of our boundary analysis are then provided in a form understandable by the domain expert. We illustrate our approach using a simulation model of a NASA neuro-adaptive flight control system, as well as a system for the detection of separation violations in the terminal airspace.
Stochastic Spatial Models in Ecology: A Statistical Physics Approach
NASA Astrophysics Data System (ADS)
Pigolotti, Simone; Cencini, Massimo; Molina, Daniel; Muñoz, Miguel A.
2018-07-01
Ecosystems display a complex spatial organization. Ecologists have long tried to characterize them by looking at how different measures of biodiversity change across spatial scales. Ecological neutral theory has provided simple predictions accounting for general empirical patterns in communities of competing species. However, while neutral theory in well-mixed ecosystems is mathematically well understood, spatial models still present several open problems, limiting the quantitative understanding of spatial biodiversity. In this review, we discuss the state of the art in spatial neutral theory. We emphasize the connection between spatial ecological models and the physics of non-equilibrium phase transitions and how concepts developed in statistical physics translate in population dynamics, and vice versa. We focus on non-trivial scaling laws arising at the critical dimension D = 2 of spatial neutral models, and their relevance for biological populations inhabiting two-dimensional environments. We conclude by discussing models incorporating non-neutral effects in the form of spatial and temporal disorder, and analyze how their predictions deviate from those of purely neutral theories.
Statistical methods used in articles published by the Journal of Periodontal and Implant Science.
Choi, Eunsil; Lyu, Jiyoung; Park, Jinyoung; Kim, Hae-Young
2014-12-01
The purposes of this study were to assess trends in the use of statistical methods, including parametric and nonparametric methods, and to evaluate the use of complex statistical methodology in recent periodontal studies. This study analyzed 123 articles published in the Journal of Periodontal & Implant Science (JPIS) between 2010 and 2014. Frequencies and percentages were calculated according to the number of statistical methods used, the type of statistical method applied, and the type of statistical software used. Most of the published articles considered (64.4%) used statistical methods. Since 2011, the percentage of JPIS articles using statistics has increased. On the basis of multiple counting, we found that the percentage of studies in JPIS using parametric methods was 61.1%. Further, complex statistical methods were applied in only 6 of the published studies (5.0%), and nonparametric statistical methods were applied in 77 of the published studies (38.9% of a total of 198 studies considered). We found an increasing trend towards the application of statistical methods, and of nonparametric methods in particular, in recent periodontal studies, and thus conclude that researchers in the fields of study covered by JPIS may increasingly prefer complex statistical methodology.
Selecting the "Best" Factor Structure and Moving Measurement Validation Forward: An Illustration.
Schmitt, Thomas A; Sass, Daniel A; Chappelle, Wayne; Thompson, William
2018-04-09
Despite the broad literature base on factor analysis best practices, research seeking to evaluate a measure's psychometric properties frequently fails to consider or follow these recommendations. This leads to incorrect factor structures, numerous and often overly complex competing factor models and, perhaps most harmful, biased model results. Our goal is to demonstrate a practical and actionable process for factor analysis through (a) an overview of six statistical and psychometric issues and approaches to be aware of, investigate, and report when engaging in factor structure validation, along with a flowchart for recommended procedures to understand latent factor structures; (b) demonstrating these issues to provide a summary of the updated Posttraumatic Stress Disorder Checklist (PCL-5) factor models and a rationale for validation; and (c) conducting a comprehensive statistical and psychometric validation of the PCL-5 factor structure to demonstrate all the issues we described earlier. Considering previous research, the PCL-5 was evaluated using a sample of 1,403 U.S. Air Force remotely piloted aircraft operators with high levels of battlefield exposure. Previously proposed PCL-5 factor structures were not supported by the data, but instead a bifactor model is arguably more statistically appropriate.
Edla, Shwetha; Kovvali, Narayan; Papandreou-Suppappola, Antonia
2012-01-01
Constructing statistical models of electrocardiogram (ECG) signals, whose parameters can be used for automated disease classification, is of great importance in precluding manual annotation and providing prompt diagnosis of cardiac diseases. ECG signals consist of several segments with different morphologies (namely the P wave, QRS complex, and T wave) in a single heart beat, which can vary across individuals and diseases. Moreover, existing statistical ECG models rely on obtaining a priori information from the ECG data, either through preprocessing algorithms that initialize the filter parameters or through user-specified model parameters. In this paper, we propose an ECG modeling technique using the sequential Markov chain Monte Carlo (SMCMC) filter that can perform simultaneous model selection by adaptively choosing from different representations depending upon the nature of the data. Our results demonstrate the ability of the algorithm to track various types of ECG morphologies, including intermittently occurring ECG beats. In addition, we use the estimated model parameters as the feature set to classify between ECG signals with normal sinus rhythm and four different types of arrhythmia.
Statistical complexity without explicit reference to underlying probabilities
NASA Astrophysics Data System (ADS)
Pennini, F.; Plastino, A.
2018-06-01
We show that extremely simple systems with a not-too-large number of particles can be simultaneously thermally stable and complex. To that end, we extend the notion of statistical complexity to simple configurations of non-interacting particles, without appeal to probabilities, and discuss configurational properties.
Universality classes of fluctuation dynamics in hierarchical complex systems
NASA Astrophysics Data System (ADS)
Macêdo, A. M. S.; González, Iván R. Roa; Salazar, D. S. P.; Vasconcelos, G. L.
2017-03-01
A unified approach is proposed to describe the statistics of the short-time dynamics of multiscale complex systems. The probability density function of the relevant time series (signal) is represented as a statistical superposition of a large time-scale distribution weighted by the distribution of certain internal variables that characterize the slowly changing background. The dynamics of the background is formulated as a hierarchical stochastic model whose form is derived from simple physical constraints, which in turn restrict the dynamics to only two possible classes. The probability distributions of both the signal and the background have simple representations in terms of Meijer G functions. The two universality classes for the background dynamics manifest themselves in the signal distribution as two types of tails: power law and stretched exponential, respectively. A detailed analysis of empirical data from classical turbulence and financial markets shows excellent agreement with the theory.
Evolutionary dynamics of group interactions on structured populations: a review
Perc, Matjaž; Gómez-Gardeñes, Jesús; Szolnoki, Attila; Floría, Luis M.; Moreno, Yamir
2013-01-01
Interactions among living organisms, from bacteria colonies to human societies, are inherently more complex than interactions among particles and non-living matter. Group interactions are a particularly important and widespread class, representative of which is the public goods game. In addition, methods of statistical physics have proved valuable for studying pattern formation, equilibrium selection and self-organization in evolutionary games. Here, we review recent advances in the study of evolutionary dynamics of group interactions on top of structured populations, including lattices, complex networks and coevolutionary models. We also compare these results with those obtained on well-mixed populations. The review particularly highlights that the study of the dynamics of group interactions, like several other important equilibrium and non-equilibrium dynamical processes in biological, economical and social sciences, benefits from the synergy between statistical physics, network science and evolutionary game theory. PMID:23303223
The new challenges of multiplex networks: Measures and models
NASA Astrophysics Data System (ADS)
Battiston, Federico; Nicosia, Vincenzo; Latora, Vito
2017-02-01
What do societies, the Internet, and the human brain have in common? They are all examples of complex relational systems, whose emerging behaviours are largely determined by the non-trivial networks of interactions among their constituents, namely individuals, computers, or neurons, rather than only by the properties of the units themselves. In the last two decades, network scientists have proposed models of increasing complexity to better understand real-world systems. Only recently have we realised that multiplexity, i.e. the coexistence of several types of interactions among the constituents of a complex system, is responsible for substantial qualitative and quantitative differences in the type and variety of behaviours that a complex system can exhibit. As a consequence, multilayer and multiplex networks have become a hot topic in complexity science. Here we provide an overview of some of the measures proposed so far to characterise the structure of multiplex networks, together with a selection of models aiming at reproducing those structural properties and quantifying their statistical significance. Focusing on a subset of relevant topics, this brief review is a quite comprehensive introduction to the most basic tools for the analysis of multiplex networks observed in the real world. The wide applicability of multiplex networks as a framework to model complex systems in different fields, from biology to the social sciences, and the colloquial tone of the paper will make it an interesting read for researchers working on both theoretical and experimental analysis of networked systems.
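One of the simplest measures in this literature, the overlapping degree (a node's degree summed across layers), takes only a few lines to compute; the two-layer toy network below is invented for illustration.

```python
# Sketch: overlapping degree in a two-layer multiplex network.
import networkx as nx

layer1 = nx.Graph([("a", "b"), ("b", "c"), ("a", "c")])   # e.g. friendship ties
layer2 = nx.Graph([("a", "b"), ("c", "d")])               # e.g. work ties

nodes = set(layer1) | set(layer2)
overlap = {n: 0 for n in nodes}
for layer in (layer1, layer2):
    for n in nodes:
        if n in layer:
            overlap[n] += layer.degree(n)   # sum degrees across layers
print(overlap)   # a: 3, b: 3, c: 3, d: 1
```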
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
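The two basic analyses the chapter covers fit in a few lines: Pearson correlation and simple linear regression with inference on the slope. The microbiology-flavoured variable names below are invented for the example.

```python
# Sketch: correlation and simple linear regression with slope inference.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 40)                  # e.g. nutrient concentration
y = 2.0 + 0.5 * x + rng.normal(0, 1, 40)    # e.g. bacterial growth rate

r, p_corr = stats.pearsonr(x, y)            # estimation and inference for r
fit = stats.linregress(x, y)                # slope, intercept, p-value, stderr
print(f"r = {r:.2f} (p = {p_corr:.3g}); "
      f"slope = {fit.slope:.2f} +/- {fit.stderr:.2f} (p = {fit.pvalue:.3g})")
```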
A multi-factor Rasch scale for artistic judgment.
Bezruczko, Nikolaus
2002-01-01
Measurement properties are reported for a combined scale of abstract and figurative artistic judgment aptitude items. Abstract items are synthetic, rule-based images from Visual Designs Test which implements a statistical algorithm to control design complexity and redundancy, and figurative items are canvas paintings in five styles, Fauvism, Post-Impressionism, Surrealism, Renaissance, and Baroque especially created for this research. The paintings integrate syntactic structure from VDT Abstract designs with thematic content for each style at four levels of complexity while controlling redundancy. Trained test administrators collected preference for synthetic abstract designs and authentic figurative art from 462 examinees in Johnson O'Connor Research Foundation testing offices in Boston, New York, Chicago, and Dallas. The Rasch model replicated measurement properties for VDT Abstract items and identified an item hierarchy that was statistically invariant between genders and generally stable across age for new, authentic figurative items. Further examination of the figurative item hierarchy revealed that complexity interacts with style and meaning. Sound measurement properties for a combined VDT Abstract and Figurative scale shows promise for a comprehensive artistic judgment construct.
Modelling gene expression profiles related to prostate tumor progression using binary states
2013-01-01
Background: Cancer is a complex disease commonly characterized by the disrupted activity of several cancer-related genes such as oncogenes and tumor-suppressor genes. Previous studies suggest that the process of tumor progression to malignancy is dynamic and can be traced by changes in gene expression. Despite the enormous efforts made in differential expression detection and biomarker discovery, few methods have been designed to model gene expression levels against tumor stage during malignancy progression. Such models could help us understand the dynamics and simplify or reveal the complexity of tumor progression. Methods: We modeled an on-off state of gene activation per sample, and then per stage, to select gene expression profiles associated with tumor progression. The selection is guided by the statistical significance of profiles based on randomly permuted datasets. Results: We show that our method identifies expected profiles corresponding to oncogenes and tumor-suppressor genes in a prostate tumor progression dataset. Comparisons with other methods support our findings and indicate that a considerable proportion of significant profiles is found neither by statistical tests commonly used to detect differential expression between tumor stages nor by other tailored methods. Ontology and pathway analyses concurred with these findings. Conclusions: Our results suggest that this methodology may be a valuable tool for studying tumor malignancy progression, which might reveal novel cancer therapies. PMID:23721350
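The permutation idea can be sketched compactly: score how well a binarized profile tracks tumor stage, then compare the observed score to scores under stage-permuted data. Everything below (data, scoring function, cutoffs) is an invented simplification of the general approach, not the paper's method.

```python
# Sketch: permutation significance for a binarized gene expression profile.
import numpy as np

rng = np.random.default_rng(7)
stages = np.repeat([0, 1, 2, 3], 10)              # 4 tumor stages, 10 samples each
expr = rng.normal(0, 1, 40) + 0.8 * stages        # one gene, rising with stage
on_off = (expr > np.median(expr)).astype(int)     # on/off state per sample

def score(state, stage):                          # monotone association score
    return abs(np.corrcoef(state, stage)[0, 1])

obs = score(on_off, stages)
null = np.array([score(on_off, rng.permutation(stages)) for _ in range(2000)])
pval = (np.sum(null >= obs) + 1) / (null.size + 1)
print(f"profile score {obs:.2f}, permutation p = {pval:.4f}")
```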
NASA Technical Reports Server (NTRS)
Joyner, James J., Sr.
2014-01-01
Develop a Survivability-versus-Time model as a decision-evaluation tool to assess various emergency egress methods used at Launch Complex 39B (LC 39B) and in the Vehicle Assembly Building (VAB) at NASA's Kennedy Space Center. For each hazard scenario, develop probability distributions to address statistical uncertainty, resulting in survivability plots over time and composite survivability plots encompassing multiple hazard scenarios.
Weakly anomalous diffusion with non-Gaussian propagators
NASA Astrophysics Data System (ADS)
Cressoni, J. C.; Viswanathan, G. M.; Ferreira, A. S.; da Silva, M. A. A.
2012-08-01
A poorly understood phenomenon seen in complex systems is diffusion characterized by Hurst exponent H≈1/2 but with non-Gaussian statistics. Motivated by such empirical findings, we report an exact analytical solution for a non-Markovian random walk model that gives rise to weakly anomalous diffusion with H=1/2 but with a non-Gaussian propagator.
Adult Literacy. Cuyahoga County Data Brief
ERIC Educational Resources Information Center
Center on Urban Poverty and Community Development (NJ1), 2010
2010-01-01
There are no direct measures of adult literacy in Cuyahoga County. Instead, this report uses estimates based on a statistical model derived from the National Survey of Adult Literacy. Adult literacy levels range from Level 1 (the most basic) to Level 5 (the most complex). People with Level 1 literacy are at a severe disadvantage in the sense that…
ERIC Educational Resources Information Center
Stapleton, Lee M.; Garrod, Guy D.
2007-01-01
Using a range of statistical criteria rooted in Information Theory we show that there is little justification for relaxing the equal weights assumption underlying the United Nations' Human Development Index (HDI) even if the true HDI diverges significantly from this assumption. Put differently, the additional model complexity that unequal weights…
D. McKenzie; C.L. Raymond; L.-K.B. Kellogg; R.A. Norheim; A.G. Andreu; A.C. Bayard; K.E. Kopper; E. Elman
2007-01-01
Fuel mapping is a complex and often multidisciplinary process, involving remote sensing, ground-based validation, statistical modeling, and knowledge-based systems. The scale and resolution of fuel mapping depend both on objectives and availability of spatial data layers. We demonstrate use of the Fuel Characteristic Classification System (FCCS) for fuel mapping at two...
Dinov, Ivo D.; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W.; Price, Nathan D.; Van Horn, John D.; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M.; Dauer, William; Toga, Arthur W.
2016-01-01
Background A unique archive of Big Data on Parkinson’s Disease is collected, managed and disseminated by the Parkinson’s Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson’s disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data–large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources–all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Methods and Findings Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson’s disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Conclusions Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson’s disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. 
The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer’s, Huntington’s, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications. PMID:27494614
Raja, Muhammad Asif Zahoor; Kiani, Adiqa Kausar; Shehzad, Azam; Zameer, Aneela
2016-01-01
In this study, bio-inspired computing is exploited for solving systems of nonlinear equations using variants of genetic algorithms (GAs) as a global search method, hybridized with sequential quadratic programming (SQP) for efficient local search. The fitness function is constructed by defining the error function for systems of nonlinear equations in the mean-square sense. The design parameters of the mathematical models are trained by exploiting the competency of GAs, and refinement is carried out by a viable SQP algorithm. Twelve versions of the memetic approach GA-SQP are designed by taking different sets of reproduction routines in the optimization process. Performance of the proposed variants is evaluated on six numerical problems comprising systems of nonlinear equations arising in the interval arithmetic benchmark model, kinematics, neurophysiology, combustion, and chemical equilibrium. Comparative studies of the proposed results in terms of accuracy, convergence, and complexity are performed with the help of statistical performance indices to establish the worth of the schemes. The accuracy and convergence of the memetic computing approach GA-SQP are found to be better in each case of the simulation study, and the effectiveness of the scheme is further established through statistics based on different performance indices for accuracy and complexity.
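A rough sketch of the global-search-plus-SQP-refinement idea follows, using SciPy's differential evolution as a stand-in for the GA component (the paper's actual reproduction routines differ); the two-equation system and bounds are hypothetical.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

def residuals(x):
    # Small nonlinear system: x0^2 + x1 - 3 = 0, x0 + x1^2 - 5 = 0
    return np.array([x[0]**2 + x[1] - 3.0, x[0] + x[1]**2 - 5.0])

def fitness(x):
    # Error function in the mean-square sense, as described in the abstract
    r = residuals(x)
    return np.mean(r**2)

bounds = [(-5, 5), (-5, 5)]
# Global search: differential evolution, a GA-like population method
global_result = differential_evolution(fitness, bounds, seed=1, tol=1e-10)
# Local refinement with a sequential quadratic programming method (SLSQP)
local_result = minimize(fitness, global_result.x, method="SLSQP")
print(local_result.x, local_result.fun)   # converges near the root (1, 2)
```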
Balancing model complexity and measurements in hydrology
NASA Astrophysics Data System (ADS)
Van De Giesen, N.; Schoups, G.; Weijs, S. V.
2012-12-01
The Data Processing Inequality implies that hydrological modeling can only reduce, and never increase, the amount of information available in the original data used to formulate and calibrate hydrological models: I(X;Z(Y)) ≤ I(X;Y). Still, hydrologists around the world seem quite content building models for "their" watersheds to move our discipline forward. Hydrological models tend to have a hybrid character with respect to underlying physics. Most models make use of some well-established physical principles, such as mass and energy balances. One could argue that such principles are based on many observations and therefore add data. These physical principles, however, are applied to hydrological models that often contain concepts with no direct counterpart in the observable physical universe, such as "buckets" or "reservoirs" that fill up and empty out over time. These not-so-physical concepts are more like the Artificial Neural Networks and Support Vector Machines of the Artificial Intelligence (AI) community. Within AI, it was quickly realized that by increasing model complexity one can fit essentially any dataset, but that complexity must be controlled in order to predict unseen events. The more data are available to train or calibrate the model, the more complex it can be. Many complexity-control approaches exist in AI, with Solomonoff inductive inference being one of the first formal approaches, the Akaike Information Criterion the most popular, and Statistical Learning Theory arguably the most comprehensive practical approach. In hydrology, complexity control has hardly been used so far. There are a number of reasons for that lack of interest, the more valid of which are presented here. First, there are no readily available complexity measures for our models. Second, some unrealistic simplifications of the underlying complex physics tend to have a smoothing effect on possible model outcomes, thereby preventing the most obvious symptoms of over-fitting. Third, dependence within and between time series poses an additional analytical problem. Finally, there are arguments to be made that the often-discussed "equifinality" in hydrological models is simply a different manifestation of the lack of complexity control. In turn, this points toward a general idea, quite popular in sciences other than hydrology, that additional data gathering is a good way to increase the information content of our descriptions of hydrological reality.
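As a concrete example of the complexity control discussed above, the following sketch applies the Akaike Information Criterion to polynomial fits of increasing order; the data and model family are illustrative, not hydrological.

```python
import numpy as np

def aic(y, y_hat, k):
    """Akaike Information Criterion for a Gaussian error model:
    AIC = n*ln(RSS/n) + 2k, with k fitted parameters. Lower is better."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
y = 2.0 * x + rng.normal(0, 0.2, x.size)    # truth is linear plus noise

for degree in (1, 4, 8):                    # increasing model complexity
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    # Higher-order fits lower the residuals but are penalized for parameters
    print(degree, round(aic(y, y_hat, degree + 1), 1))
```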
van der Ploeg, Tjeerd; Nieboer, Daan; Steyerberg, Ewout W
2016-10-01
Prediction of medical outcomes may potentially benefit from using modern statistical modeling techniques. We aimed to externally validate modeling strategies for prediction of 6-month mortality of patients suffering from traumatic brain injury (TBI) with predictor sets of increasing complexity. We analyzed individual patient data from 15 different studies including 11,026 TBI patients. We consecutively considered a core set of predictors (age, motor score, and pupillary reactivity), an extended set with computed tomography scan characteristics, and a further extension with two laboratory measurements (glucose and hemoglobin). With each of these sets, we predicted 6-month mortality using default settings with five statistical modeling techniques: logistic regression (LR), classification and regression trees, random forests (RFs), support vector machines (SVMs), and neural nets. For external validation, a model developed on one of the 15 data sets was applied to each of the 14 remaining sets. This process was repeated 15 times for a total of 630 validations. The area under the receiver operating characteristic curve (AUC) was used to assess the discriminative ability of the models. For the most complex predictor set, the LR models performed best (median validated AUC value, 0.757), followed by RF and SVM models (median validated AUC values, 0.735 and 0.732, respectively). With each predictor set, the classification and regression trees models showed poor performance (median validated AUC value, <0.7). The variability in performance across the studies was smallest for the RF- and LR-based models (interquartile range for validated AUC values from 0.07 to 0.10). In the area of predicting mortality from TBI, nonlinear and nonadditive effects are not pronounced enough to make modern prediction methods beneficial. Copyright © 2016 Elsevier Inc. All rights reserved.
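The cross-study external validation scheme (develop on one data set, validate on each of the others, score by AUC) can be sketched as follows; the three synthetic "studies" and their predictor effects are invented stand-ins for the TBI data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)

def make_study(n, shift=0.0):
    """Hypothetical study: three predictors (think age, motor score,
    pupillary reactivity) driving a binary mortality outcome."""
    X = rng.normal(shift, 1.0, size=(n, 3))
    logit = 0.8 * X[:, 0] - 0.6 * X[:, 1] - 0.9 * X[:, 2]
    y = rng.random(n) < 1 / (1 + np.exp(-logit))
    return X, y.astype(int)

studies = [make_study(500, shift=s) for s in (0.0, 0.2, -0.1)]
aucs = []
for i, (X_dev, y_dev) in enumerate(studies):        # develop on one study...
    model = LogisticRegression().fit(X_dev, y_dev)
    for j, (X_val, y_val) in enumerate(studies):    # ...validate on the others
        if i != j:
            aucs.append(roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
print(np.median(aucs))
```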
Chen, Qing; Xu, Pengfei; Liu, Wenzhong
2016-01-01
Computer vision, as a fast, low-cost, noncontact, and online monitoring technology, has been an important tool for inspecting product quality, particularly on large-scale assembly production lines. However, current industrial vision systems are far from satisfactory in the intelligent perception of complex grain images, which comprise a large number of locally homogeneous fragmentations or patches without distinct foreground and background. We attempt to solve this problem based on statistical modeling of the spatial structures of grain images. We first present a physical explanation indicating that the spatial structures of complex grain images follow a representative Weibull distribution according to the theory of sequential fragmentation, which is well known in the continued comminution of ore grinding. To delineate the spatial structure of the grain image, we present a method of multiscale and omnidirectional Gaussian derivative filtering. Then, a product quality classifier based on a sparse multikernel least squares support vector machine is proposed to solve the low-confidence classification problem of imbalanced data distributions. The proposed method is applied on the assembly line of a food-processing enterprise to automatically classify the production quality of rice. Experiments on this real application case, together with comparisons against commonly used methods, illustrate the validity of our approach. PMID:26986726
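A minimal sketch of the two ingredients named above, Gaussian derivative filtering followed by a Weibull fit to the filter responses, is given below; the random stand-in "grain image" and the choice of scales are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(4)
image = rng.random((128, 128))                      # stand-in for a grain image

# Multiscale Gaussian derivative filtering: gradient magnitude at two scales
responses = []
for sigma in (1.0, 2.0):
    gx = ndimage.gaussian_filter(image, sigma, order=(0, 1))
    gy = ndimage.gaussian_filter(image, sigma, order=(1, 0))
    responses.append(np.hypot(gx, gy).ravel())
magnitudes = np.concatenate(responses)

# Fit a two-parameter Weibull (location pinned at zero) to the responses
shape, loc, scale = stats.weibull_min.fit(magnitudes, floc=0)
print(shape, scale)
```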
Analyzing complex networks evolution through Information Theory quantifiers
NASA Astrophysics Data System (ADS)
Carpi, Laura C.; Rosso, Osvaldo A.; Saco, Patricia M.; Ravetti, Martín Gómez
2011-01-01
A methodology to analyze dynamical changes in complex networks based on Information Theory quantifiers is proposed. The square root of the Jensen-Shannon divergence, a measure of dissimilarity between two probability distributions, and the MPR Statistical Complexity are used to quantify states in the network evolution process. Three cases are analyzed, the Watts-Strogatz model, a gene network during the progression of Alzheimer's disease and a climate network for the Tropical Pacific region to study the El Niño/Southern Oscillation (ENSO) dynamic. We find that the proposed quantifiers are able not only to capture changes in the dynamics of the processes but also to quantify and compare states in their evolution.
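The first quantifier named above, the square root of the Jensen-Shannon divergence, is straightforward to compute; the sketch below assumes discrete probability distributions, for instance degree distributions of a network at two stages of its evolution.

```python
import numpy as np

def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base-2 entropies),
    a metric between two probability distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0                      # 0 * log(0) contributes nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

# Hypothetical degree distributions at two states of a network's evolution
p = [0.70, 0.20, 0.10]
q = [0.40, 0.40, 0.20]
print(jensen_shannon_distance(p, q))
```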
Predictive uncertainty in auditory sequence processing
Hansen, Niels Chr.; Pearce, Marcus T.
2014-01-01
Previous studies of auditory expectation have focused on the expectedness perceived by listeners retrospectively in response to events. In contrast, this research examines predictive uncertainty—a property of listeners' prospective state of expectation prior to the onset of an event. We examine the information-theoretic concept of Shannon entropy as a model of predictive uncertainty in music cognition. This is motivated by the Statistical Learning Hypothesis, which proposes that schematic expectations reflect probabilistic relationships between sensory events learned implicitly through exposure. Using probability estimates from an unsupervised, variable-order Markov model, 12 melodic contexts high in entropy and 12 melodic contexts low in entropy were selected from two musical repertoires differing in structural complexity (simple and complex). Musicians and non-musicians listened to the stimuli and provided explicit judgments of perceived uncertainty (explicit uncertainty). We also examined an indirect measure of uncertainty computed as the entropy of expectedness distributions obtained using a classical probe-tone paradigm where listeners rated the perceived expectedness of the final note in a melodic sequence (inferred uncertainty). Finally, we simulate listeners' perception of expectedness and uncertainty using computational models of auditory expectation. A detailed model comparison indicates which model parameters maximize fit to the data and how they compare to existing models in the literature. The results show that listeners experience greater uncertainty in high-entropy musical contexts than low-entropy contexts. This effect is particularly apparent for inferred uncertainty and is stronger in musicians than non-musicians. Consistent with the Statistical Learning Hypothesis, the results suggest that increased domain-relevant training is associated with an increasingly accurate cognitive model of probabilistic structure in music. PMID:25295018
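The entropy-based notion of predictive uncertainty used in this study can be sketched directly from its definition; the next-note distributions below are hypothetical, not the study's Markov-model estimates.

```python
import numpy as np

def shannon_entropy(probabilities):
    """Entropy (bits) of a next-event distribution: a model of the
    listener's predictive uncertainty before the event occurs."""
    p = np.asarray(probabilities, float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical next-note distributions over a 12-tone alphabet
low_entropy_context = [0.85] + [0.15 / 11] * 11   # one note strongly expected
high_entropy_context = [1 / 12] * 12              # all notes equally likely
print(shannon_entropy(low_entropy_context), shannon_entropy(high_entropy_context))
```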
Margaria, Tiziana; Kubczak, Christian; Steffen, Bernhard
2008-01-01
Background With Bio-jETI, we introduce a service platform for interdisciplinary work on biological application domains and illustrate its use in a concrete application concerning statistical data processing in R and xcms for an LC/MS analysis of FAAH gene knockout. Methods Bio-jETI uses the jABC environment for service-oriented modeling and design as a graphical process modeling tool and the jETI service integration technology for remote tool execution. Conclusions As a service definition and provisioning platform, Bio-jETI has the potential to become a core technology in interdisciplinary service orchestration and technology transfer. Domain experts, like biologists not trained in computer science, directly define complex service orchestrations as process models and use efficient and complex bioinformatics tools in a simple and intuitive way. PMID:18460173
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
Maximizing information exchange between complex networks
NASA Astrophysics Data System (ADS)
West, Bruce J.; Geneston, Elvis L.; Grigolini, Paolo
2008-10-01
Science is not merely the smooth progressive interaction of hypothesis, experiment and theory, although it sometimes has that form. More realistically the scientific study of any given complex phenomenon generates a number of explanations, from a variety of perspectives, that eventually requires synthesis to achieve a deep level of insight and understanding. One such synthesis has created the field of out-of-equilibrium statistical physics as applied to the understanding of complex dynamic networks. Over the past forty years the concept of complexity has undergone a metamorphosis. Complexity was originally seen as a consequence of memory in individual particle trajectories, in full agreement with a Hamiltonian picture of microscopic dynamics and, in principle, macroscopic dynamics could be derived from the microscopic Hamiltonian picture. The main difficulty in deriving macroscopic dynamics from microscopic dynamics is the need to take into account the actions of a very large number of components. The existence of events such as abrupt jumps, considered by the conventional continuous time random walk approach to describing complexity was never perceived as conflicting with the Hamiltonian view. Herein we review many of the reasons why this traditional Hamiltonian view of complexity is unsatisfactory. We show that as a result of technological advances, which make the observation of single elementary events possible, the definition of complexity has shifted from the conventional memory concept towards the action of non-Poisson renewal events. We show that the observation of crucial processes, such as the intermittent fluorescence of blinking quantum dots as well as the brain’s response to music, as monitored by a set of electrodes attached to the scalp, has forced investigators to go beyond the traditional concept of complexity and to establish closer contact with the nascent field of complex networks. Complex networks form one of the most challenging areas of modern research overarching all of the traditional scientific disciplines. The transportation networks of planes, highways and railroads; the economic networks of global finance and stock markets; the social networks of terrorism, governments, businesses and churches; the physical networks of telephones, the Internet, earthquakes and global warming and the biological networks of gene regulation, the human body, clusters of neurons and food webs, share a number of apparently universal properties as the networks become increasingly complex. Ubiquitous aspects of such complex networks are the appearance of non-stationary and non-ergodic statistical processes and inverse power-law statistical distributions. Herein we review the traditional dynamical and phase-space methods for modeling such networks as their complexity increases and focus on the limitations of these procedures in explaining complex networks. Of course we will not be able to review the entire nascent field of network science, so we limit ourselves to a review of how certain complexity barriers have been surmounted using newly applied theoretical concepts such as aging, renewal, non-ergodic statistics and the fractional calculus. One emphasis of this review is information transport between complex networks, which requires a fundamental change in perception that we express as a transition from the familiar stochastic resonance to the new concept of complexity matching.
Short-term Wind Forecasting at Wind Farms using WRF-LES and Actuator Disk Model
NASA Astrophysics Data System (ADS)
Kirkil, Gokhan
2017-04-01
Short-term wind forecasts are obtained for a wind farm on mountainous terrain using WRF-LES. Multi-scale simulations are also performed using different PBL parameterizations. Turbines are parameterized using the Actuator Disk Model. The LES models improved the forecasts. Statistical error analysis is performed and ramp events are analyzed. The complex topography of the study area affects model performance; in particular, the accuracy of wind forecasts was poor for cross valley-mountain flows. By means of LES, we gain new knowledge about the sources of spatial and temporal variability of wind fluctuations, such as the configuration of wind turbines.
Kalegowda, Yogesh; Harmer, Sarah L
2012-03-20
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
Estimating uncertainties in complex joint inverse problems
NASA Astrophysics Data System (ADS)
Afonso, Juan Carlos
2016-04-01
Sources of uncertainty affecting geophysical inversions can be classified either as reflective (i.e. the practitioner is aware of her/his ignorance) or non-reflective (i.e. the practitioner does not know that she/he does not know!). Although we should always be conscious of the latter, the former are the ones that, in principle, can be estimated either empirically (by making measurements or collecting data) or subjectively (based on the experience of the researchers). For complex parameter estimation problems in geophysics, subjective estimation of uncertainty is the most common type. In this context, probabilistic (aka Bayesian) methods are commonly claimed to offer a natural and realistic platform from which to estimate model uncertainties. This is because in the Bayesian approach, errors (whatever their nature) can be naturally included as part of the global statistical model, the solution of which represents the actual solution to the inverse problem. However, although we agree that probabilistic inversion methods are the most powerful tool for uncertainty estimation, the common claim that they produce "realistic" or "representative" uncertainties is not always justified. Typically, all uncertainty estimates are model dependent; therefore, besides a thorough characterization of experimental uncertainties, particular care must be paid to the uncertainty arising from model errors and input uncertainties. We recall here two quotes, by G. Box and M. Gunzburger respectively, of special significance for inversion practitioners and for this session: "…all models are wrong, but some are useful" and "computational results are believed by no one, except the person who wrote the code". In this presentation I will discuss and present examples of some problems associated with the estimation and quantification of uncertainties in complex multi-observable probabilistic inversions, and how to address them. Although the emphasis will be on sources of uncertainty related to the forward and statistical models, I will also address other uncertainties associated with data and uncertainty propagation.
Statistical Modeling of Robotic Random Walks on Different Terrain
NASA Astrophysics Data System (ADS)
Naylor, Austin; Kinnaman, Laura
Issues of public safety, especially with crowd dynamics and pedestrian movement, have been modeled by physicists using methods from statistical mechanics over the last few years. Complex decision making of humans moving on different terrains can be modeled using random walks (RW) and correlated random walks (CRW). The effect of different terrains, such as a constant increasing slope, on RW and CRW was explored. LEGO robots were programmed to make RW and CRW with uniform step sizes. Level ground tests demonstrated that the robots had the expected step size distribution and correlation angles (for CRW). The mean square displacement was calculated for each RW and CRW on different terrains and matched expected trends. The step size distribution was determined to change based on the terrain; theoretical predictions for the step size distribution were made for various simple terrains.
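A minimal sketch of the RW/CRW simulation and mean-square-displacement calculation described above, assuming von Mises-distributed turning angles and uniform step sizes (the concentration parameter and ensemble size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)

def correlated_random_walk(n_steps, step=1.0, kappa=4.0):
    """2D correlated random walk: each turning angle is drawn around the
    previous heading; kappa controls directional persistence (kappa=0
    reduces to an uncorrelated random walk)."""
    headings = np.cumsum(rng.vonmises(0.0, kappa, n_steps))
    return np.cumsum(step * np.cos(headings)), np.cumsum(step * np.sin(headings))

# Ensemble-averaged mean square displacement on level ground
n_walks, n_steps = 500, 200
msd = np.zeros(n_steps)
for _ in range(n_walks):
    x, y = correlated_random_walk(n_steps)
    msd += x**2 + y**2
msd /= n_walks
print(msd[9], msd[-1])   # persistent walks spread faster than plain RWs
```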
A Backscatter-Lidar Forward-Operator
NASA Astrophysics Data System (ADS)
Geisinger, Armin; Behrendt, Andreas; Wulfmeyer, Volker; Vogel, Bernhard; Mattis, Ina; Flentje, Harald; Förstner, Jochen; Potthast, Roland
2015-04-01
We have developed a forward operator capable of calculating virtual lidar profiles from atmospheric state simulations. The operator allows us to compare lidar measurements and model simulations based on the same measurement parameter: the lidar backscatter profile. This method simplifies qualitative comparisons and also makes quantitative comparisons possible, including statistical error quantification. Implemented into an aerosol-capable model system, the operator will act as a component to assimilate backscatter-lidar measurements. As many weather services already maintain networks of backscatter lidars, such data are already acquired operationally. To estimate and quantify errors due to missing or uncertain aerosol information, we started sensitivity studies on several scattering parameters, such as the aerosol size and both the real and imaginary parts of the complex index of refraction. Furthermore, quantitative and statistical comparisons between measurements and virtual measurements are shown in this study, i.e., applying the backscatter-lidar forward operator to model output.
Complex patterns of abnormal heartbeats
NASA Technical Reports Server (NTRS)
Schulte-Frohlinde, Verena; Ashkenazy, Yosef; Goldberger, Ary L.; Ivanov, Plamen Ch; Costa, Madalena; Morley-Davies, Adrian; Stanley, H. Eugene; Glass, Leon
2002-01-01
Individuals having frequent abnormal heartbeats interspersed with normal heartbeats may be at an increased risk of sudden cardiac death. However, mechanistic understanding of such cardiac arrhythmias is limited. We present a visual and qualitative method to display statistical properties of abnormal heartbeats. We introduce dynamical "heartprints" which reveal characteristic patterns in long clinical records encompassing approximately 10^5 heartbeats and may provide information about underlying mechanisms. We test whether these dynamics can be reproduced by model simulations in which abnormal heartbeats are generated (i) randomly, (ii) at a fixed time interval following a preceding normal heartbeat, or (iii) by an independent oscillator that may or may not interact with the normal heartbeat. We compare the results of these three models and test their limitations to comprehensively simulate the statistical features of selected clinical records. This work introduces methods that can be used to test mathematical models of arrhythmogenesis and to develop a new understanding of the underlying electrophysiologic mechanisms of cardiac arrhythmia.
Analyzing Single-Molecule Protein Transportation Experiments via Hierarchical Hidden Markov Models
Chen, Yang; Shen, Kuang
2017-01-01
To maintain proper cellular functions, over 50% of proteins encoded in the genome need to be transported to cellular membranes. The molecular mechanism behind such a process, often referred to as protein targeting, is not well understood. Single-molecule experiments are designed to unveil the detailed mechanisms and reveal the functions of different molecular machineries involved in the process. The experimental data consist of hundreds of stochastic time traces from the fluorescence recordings of the experimental system. We introduce a Bayesian hierarchical model on top of hidden Markov models (HMMs) to analyze these data and use the statistical results to answer the biological questions. In addition to resolving the biological puzzles and delineating the regulating roles of different molecular complexes, our statistical results enable us to propose a more detailed mechanism for the late stages of the protein targeting process. PMID:28943680
The distribution of density in supersonic turbulence
NASA Astrophysics Data System (ADS)
Squire, Jonathan; Hopkins, Philip F.
2017-11-01
We propose a model for the statistics of the mass density in supersonic turbulence, which plays a crucial role in star formation and the physics of the interstellar medium (ISM). The model is derived by considering the density to be arranged as a collection of strong shocks of width ˜ M^{-2}, where M is the turbulent Mach number. With two physically motivated parameters, the model predicts all density statistics for M>1 turbulence: the density probability distribution and its intermittency (deviation from lognormality), the density variance-Mach number relation, power spectra and structure functions. For the proposed model parameters, reasonable agreement is seen between model predictions and numerical simulations, albeit within the large uncertainties associated with current simulation results. More generally, the model could provide a useful framework for more detailed analysis of future simulations and observational data. Due to the simple physical motivations for the model in terms of shocks, it is straightforward to generalize to more complex physical processes, which will be helpful in future more detailed applications to the ISM. We see good qualitative agreement between such extensions and recent simulations of non-isothermal turbulence.
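For orientation, the lognormal baseline from which the paper's model measures intermittent deviations can be written down directly; the sketch below assumes the standard density variance-Mach number relation with an illustrative forcing parameter b = 0.4, both of which are conventional assumptions rather than the paper's shock-based model.

```python
import numpy as np

def lognormal_density_pdf(s, mach, b=0.4):
    """Lognormal baseline for the PDF of s = ln(rho/rho_0) in isothermal
    supersonic turbulence, with variance sigma_s^2 = ln(1 + b^2 M^2).
    The paper's model predicts intermittent deviations from this form."""
    sigma2 = np.log(1.0 + b**2 * mach**2)   # density variance-Mach relation
    return np.exp(-(s + sigma2 / 2) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

s = np.linspace(-6, 4, 9)
print(np.round(lognormal_density_pdf(s, mach=5.0), 4))
```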
NASA Astrophysics Data System (ADS)
Goldsworthy, M. J.
2012-10-01
One of the most useful tools for modelling rarefied hypersonic flows is the Direct Simulation Monte Carlo (DSMC) method. Simulator particle movement and collision calculations are combined with statistical procedures to model thermal non-equilibrium flow-fields described by the Boltzmann equation. The Macroscopic Chemistry Method for DSMC simulations was developed to simplify the inclusion of complex thermal non-equilibrium chemistry. The macroscopic approach uses statistical information which is calculated during the DSMC solution process in the modelling procedures. Here it is shown how inclusion of macroscopic information in models of chemical kinetics, electronic excitation, ionization, and radiation can enhance the capabilities of DSMC to model flow-fields where a range of physical processes occur. The approach is applied to the modelling of a 6.4 km/s nitrogen shock wave and results are compared with those from existing shock-tube experiments and continuum calculations. Reasonable agreement between the methods is obtained. The quality of the comparison is highly dependent on the set of vibrational relaxation and chemical kinetic parameters employed.
Grozdanov, Daniel; Herascu, Nicoleta; Reinot, Tõnu; Jankowiak, Ryszard; Zazubovich, Valter
2010-03-18
Previously published and new spectral hole burning (SHB) data on the B800 band of the LH2 light-harvesting antenna complex of Rps. acidophila are analyzed in light of recent single photosynthetic complex spectroscopy (SPCS) results (for a review, see Berlin et al. Phys. Life Rev. 2007, 4, 64). It is demonstrated that, in general, SHB-related phenomena observed for the B800 band are in qualitative agreement with the SPCS data and the protein models involving multiwell, multitier protein energy landscapes. Regarding the quantitative agreement, we argue that the single-molecule behavior associated with the fastest spectral diffusion (smallest barrier) tier of the protein energy landscape is inconsistent with the SHB data. The latter discrepancy can be attributed to SPCS probing not only the dynamics of the protein complex per se, but also that of the surrounding amorphous host and/or the host-protein interface. It is argued that SHB (once improved models are developed) should also be able to provide the average magnitudes and probability distributions of light-induced spectral shifts and could be used to determine whether SPCS probes a set of protein complexes that are both intact and statistically relevant. The SHB results are consistent with B800 → B850 energy-transfer models that include consideration of the whole B850 density of states.
On the streaming model for redshift-space distortions
NASA Astrophysics Data System (ADS)
Kuruvilla, Joseph; Porciani, Cristiano
2018-06-01
The streaming model describes the mapping between real and redshift space for 2-point clustering statistics. Its key element is the probability density function (PDF) of line-of-sight pairwise peculiar velocities. Following a kinetic-theory approach, we derive the fundamental equations of the streaming model for ordered and unordered pairs. In the first case, we recover the classic equation while we demonstrate that modifications are necessary for unordered pairs. We then discuss several statistical properties of the pairwise velocities for DM particles and haloes by using a suite of high-resolution N-body simulations. We test the often used Gaussian ansatz for the PDF of pairwise velocities and discuss its limitations. Finally, we introduce a mixture of Gaussians which is known in statistics as the generalised hyperbolic distribution and show that it provides an accurate fit to the PDF. Once inserted in the streaming equation, the fit yields an excellent description of redshift-space correlations at all scales that vastly outperforms the Gaussian and exponential approximations. Using a principal-component analysis, we reduce the complexity of our model for large redshift-space separations. Our results increase the robustness of studies of anisotropic galaxy clustering and are useful for extending them towards smaller scales in order to test theories of gravity and interacting dark-energy models.
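The Gaussian ansatz tested (and found limited) in this paper plugs into the streaming equation as sketched below; the correlation function, infall, and dispersion inputs are invented placeholders, and the generalised hyperbolic refinement is not shown.

```python
import numpy as np

def streaming_xi_s(s_perp, s_par, xi_real, mean_v, sigma_v, r_par_grid):
    """Gaussian streaming model: map a real-space correlation function
    xi_real(r) to redshift space using a Gaussian ansatz for the
    line-of-sight pairwise velocity PDF, with scale-dependent mean
    mean_v(r) (radial infall) and dispersion sigma_v(r); velocities are
    expressed in comoving length units, i.e. divided by aH."""
    r = np.hypot(s_perp, r_par_grid)
    mu_los = mean_v(r) * r_par_grid / r          # project radial mean onto LOS
    pdf = np.exp(-(s_par - r_par_grid - mu_los) ** 2 / (2.0 * sigma_v(r) ** 2))
    pdf /= np.sqrt(2.0 * np.pi) * sigma_v(r)
    integrand = (1.0 + xi_real(r)) * pdf
    dr = r_par_grid[1] - r_par_grid[0]
    return np.sum(integrand) * dr - 1.0          # 1 + xi_s equals the integral

# Hypothetical ingredients: power-law correlations, mild infall, flat dispersion
xi_real = lambda r: (r / 5.0) ** -1.8            # illustrative only
mean_v = lambda r: -1.0 * np.exp(-r / 20.0)      # net infall, decaying with scale
sigma_v = lambda r: 4.0 + 0.0 * r                # constant dispersion (array-valued)
r_par = np.linspace(-80.0, 80.0, 2001)
print(streaming_xi_s(10.0, 5.0, xi_real, mean_v, sigma_v, r_par))
```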
Resolving the Antarctic contribution to sea-level rise: a hierarchical modelling framework.
Zammit-Mangion, Andrew; Rougier, Jonathan; Bamber, Jonathan; Schön, Nana
2014-06-01
Determining the Antarctic contribution to sea-level rise from observational data is a complex problem. The number of physical processes involved (such as ice dynamics and surface climate) exceeds the number of observables, some of which have very poor spatial definition. This has led, in general, to solutions that utilise strong prior assumptions or physically based deterministic models to simplify the problem. Here, we present a new approach for estimating the Antarctic contribution, which only incorporates descriptive aspects of the physically based models in the analysis and in a statistical manner. By combining physical insights with modern spatial statistical modelling techniques, we are able to provide probability distributions on all processes deemed to play a role in both the observed data and the contribution to sea-level rise. Specifically, we use stochastic partial differential equations and their relation to geostatistical fields to capture our physical understanding and employ a Gaussian Markov random field approach for efficient computation. The method, an instantiation of Bayesian hierarchical modelling, naturally incorporates uncertainty in order to reveal credible intervals on all estimated quantities. The estimated sea-level rise contribution using this approach corroborates those found using a statistically independent method. © 2013 The Authors. Environmetrics Published by John Wiley & Sons, Ltd.
Johnson, Douglas H.; Cook, R.D.
2013-01-01
In her AAAS News & Notes piece "Can the Southwest manage its thirst?" (26 July, p. 362), K. Wren quotes Ajay Kalra, who advocates a particular method for predicting Colorado River streamflow "because it eschews complex physical climate models for a statistical data-driven modeling approach." A preference for data-driven models may be appropriate in this individual situation, but it is not so generally. Data-driven models often come with a warning against extrapolating beyond the range of the data used to develop them. When the future is like the past, data-driven models can work well for prediction, but it is easy to over-model local or transient phenomena, often leading to predictive inaccuracy (1). Mechanistic models are built on established knowledge of the process that connects the response variables with the predictors, using information obtained outside of an extant data set. One may shy away from a mechanistic approach when the underlying process is judged to be too complicated, but good predictive models can be constructed with statistical components that account for ingredients missing in the mechanistic analysis. Models with sound mechanistic components are more generally applicable and robust than data-driven models.
Bayesian Model Selection under Time Constraints
NASA Astrophysics Data System (ADS)
Hoege, M.; Nowak, W.; Illman, W. A.
2017-12-01
Bayesian model selection (BMS) provides a consistent framework for rating and comparing models in multi-model inference. In cases where models of vastly different complexity compete with each other, we also face vastly different computational runtimes of such models. For instance, time series of a quantity of interest can be simulated by an autoregressive process model that takes less than a second for one run, or by a partial differential equation-based model with runtimes up to several hours or even days. Classical BMS is based on a quantity called Bayesian model evidence (BME). It determines the model weights in the selection process and embodies a trade-off between the bias of a model and its complexity. In practice, however, the runtime of models is another factor relevant to model selection. Hence, we believe that it should be included, leading to an overall trade-off between bias, variance, and computing effort. We approach this triple trade-off from the viewpoint of our ability to generate realizations of the models under a given computational budget. One way to obtain BME values is through sampling-based integration techniques. Under time constraints, more expensive models can be sampled far less often than faster models (the number of realizations scales inversely with runtime). The evidence computed in favor of a more expensive model is therefore statistically less significant than the evidence computed in favor of a faster model, since sampling-based strategies are always subject to statistical sampling error. We present a straightforward way to include this imbalance in the model weights that form the basis for model selection. Our approach follows directly from the idea of insufficient significance. It is based on a computationally cheap bootstrapping error estimate of model evidence and is easy to implement. The approach is illustrated in a small synthetic modeling study.
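The bootstrapping error estimate of sampling-based model evidence can be sketched as follows; the likelihood distribution, runtimes, and time budget are hypothetical, and the evidence estimator used (the arithmetic mean of prior-sample likelihoods) is one simple choice among several.

```python
import numpy as np

rng = np.random.default_rng(7)

def bootstrap_bme(likelihoods, n_boot=2000):
    """Monte Carlo estimate of Bayesian model evidence as the mean
    likelihood of prior samples, plus a bootstrap standard error."""
    bme = likelihoods.mean()
    boots = [rng.choice(likelihoods, likelihoods.size).mean() for _ in range(n_boot)]
    return bme, np.std(boots)

# Under a fixed time budget, the slow model affords far fewer prior samples,
# so its evidence estimate carries a larger sampling error
budget_seconds = 100.0
runtimes = {"fast_model": 0.01, "slow_model": 1.0}   # hypothetical runtimes (s)
for name, runtime in runtimes.items():
    n_samples = int(budget_seconds / runtime)
    likelihoods = rng.lognormal(mean=-2.0, sigma=1.0, size=n_samples)
    bme, se = bootstrap_bme(likelihoods)
    print(f"{name}: n={n_samples}, BME≈{bme:.3f} ± {se:.3f}")
```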
ERIC Educational Resources Information Center
Smith, Rachel A.; Levine, Timothy R.; Lachlan, Kenneth A.; Fediuk, Thomas A.
2002-01-01
Notes that the availability of statistical software packages has led to a sharp increase in use of complex research designs and complex statistical analyses in communication research. Reports a series of Monte Carlo simulations which demonstrate that this complexity may come at a heavier cost than many communication researchers realize. Warns…
2013-01-01
Background Efficacy of withaferin A (WA), an Ayurvedic medicine constituent, for prevention of mammary cancer and its associated mechanisms were investigated using mouse mammary tumor virus–neu (MMTV-neu) transgenic model. Methods Incidence and burden of mammary cancer and pulmonary metastasis were scored in female MMTV-neu mice after 28 weeks of intraperitoneal administration with 100 µg WA (three times/week) (n = 32) or vehicle (n = 29). Mechanisms underlying mammary cancer prevention by WA were investigated by determination of tumor cell proliferation, apoptosis, metabolomics, and proteomics using plasma and/or tumor tissues. Spectrophotometric assays were performed to determine activities of complex III and complex IV. All statistical tests were two-sided. Results WA administration resulted in a statistically significant decrease in macroscopic mammary tumor size, microscopic mammary tumor area, and the incidence of pulmonary metastasis. For example, the mean area of invasive cancer was lower by 95.14% in the WA treatment group compared with the control group (mean = 3.10 vs 63.77mm2, respectively; difference = –60.67mm2; 95% confidence interval = –122.50 to 1.13mm2; P = .0536). Mammary cancer prevention by WA treatment was associated with increased apoptosis, inhibition of complex III activity, and reduced levels of glycolysis intermediates. Proteomics confirmed downregulation of many glycolysis-related proteins in the tumor of WA-treated mice compared with control, including M2-type pyruvate kinase, phospho glycerate kinase, and fructose-bisphosphate aldolase A isoform 2. Conclusions This study reveals suppression of glycolysis in WA-mediated mammary cancer prevention in a clinically relevant mouse model. PMID:23821767
Teleosts as Model Organisms To Understand Host-Microbe Interactions.
Lescak, Emily A; Milligan-Myhre, Kathryn C
2017-08-01
Host-microbe interactions are influenced by complex host genetics and environment. Studies across animal taxa have aided our understanding of how intestinal microbiota influence vertebrate development, disease, and physiology. However, traditional mammalian studies can be limited by the use of isogenic strains, husbandry constraints that result in small sample sizes and limited statistical power, reliance on indirect characterization of gut microbial communities from fecal samples, and concerns of whether observations in artificial conditions are actually reflective of what occurs in the wild. Fish models are able to overcome many of these limitations. The extensive variation in the physiology, ecology, and natural history of fish enriches studies of the evolution and ecology of host-microbe interactions. They share physiological and immunological features common among vertebrates, including humans, and harbor complex gut microbiota, which allows identification of the mechanisms driving microbial community assembly. Their accelerated life cycles and large clutch sizes and the ease of sampling both internal and external microbial communities make them particularly well suited for robust statistical studies of microbial diversity. Gnotobiotic techniques, genetic manipulation of the microbiota and host, and transparent juveniles enable novel insights into mechanisms underlying development of the digestive tract and disease states. Many diseases involve a complex combination of genes which are difficult to manipulate in homogeneous model organisms. By taking advantage of the natural genetic variation found in wild fish populations, as well as of the availability of powerful genetic tools, future studies should be able to identify conserved genes and pathways that contribute to human genetic diseases characterized by dysbiosis. Copyright © 2017 Lescak and Milligan-Myhre.
Determination of effective loss factors in reduced SEA models
NASA Astrophysics Data System (ADS)
Chimeno Manguán, M.; Fernández de las Heras, M. J.; Roibás Millán, E.; Simón Hidalgo, F.
2017-01-01
The definition of Statistical Energy Analysis (SEA) models for large complex structures is highly conditioned by the classification of the structure's elements into a set of coupled subsystems and the subsequent determination of the loss factors representing both the internal damping and the coupling between subsystems. An accurate definition of the complete system can lead to excessively large models as size and complexity increase. This fact can also raise practical issues for the experimental determination of the loss factors. This work presents a formulation of reduced SEA models for incomplete systems, defined by a set of effective loss factors. The reduced SEA model provides a feasible number of subsystems for the application of the Power Injection Method (PIM). For structures of high complexity, the accessibility of components can be restricted, for instance for internal equipment or panels. In such cases the use of PIM to carry out an experimental SEA analysis is not possible. New methods are presented for this case in combination with the reduced SEA models; these methods allow some of the model loss factors that could not be obtained through PIM to be determined. The methods are validated on a numerical analysis case and are also applied to an actual spacecraft structure with accessibility restrictions: a solar wing in folded configuration.
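For context, a minimal sketch of the SEA power balance that methods like PIM invert is given below: given internal and coupling loss factors, the steady-state subsystem energies follow from a linear solve. The three-subsystem loss factor matrix is hypothetical.

```python
import numpy as np

def sea_energy_response(omega, eta, power_in):
    """Solve the steady-state SEA power balance  P = omega * C @ E  for the
    subsystem energies E, where eta[i][i] are internal loss factors and
    eta[i][j] (i != j) are coupling loss factors from subsystem i to j."""
    eta = np.asarray(eta, float)
    n = eta.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        C[i, i] = eta[i, i] + np.sum(np.delete(eta[i], i))  # internal + outgoing
        for j in range(n):
            if i != j:
                C[i, j] = -eta[j, i]                         # incoming from j
    return np.linalg.solve(omega * C, power_in)

# Hypothetical 3-subsystem model (e.g. panel / frame / equipment), 500 Hz band
eta = [[1e-2, 2e-3, 0.0],
       [1e-3, 2e-2, 4e-3],
       [0.0,  3e-3, 1e-2]]
print(sea_energy_response(omega=2 * np.pi * 500.0, eta=eta, power_in=[1.0, 0.0, 0.0]))
```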
AST: Activity-Security-Trust driven modeling of time varying networks
Wang, Jian; Xu, Jiake; Liu, Yanheng; Deng, Weiwen
2016-01-01
Network modeling is a flexible mathematical structure that enables identification of statistical regularities and structural principles hidden in complex systems. The majority of recent driving forces in modeling complex networks originate from activity, in which an activity potential, a time-invariant function, is introduced to identify agents' interactions and to construct an activity-driven model. However, newly emerging network evolutions are deeply coupled not only with explicit factors (e.g. activity) but also with implicit considerations (e.g. security and trust), so more intrinsic driving forces should be integrated into the modeling of time-varying networks. Agents undoubtedly seek a time-dependent trade-off among activity, security, and trust when generating a new connection to another agent. We therefore propose the Activity-Security-Trust (AST) driven model, which synthetically considers the explicit and implicit driving forces (activity, security, and trust) underlying the decision process. The AST-driven model captures highly dynamical network behaviors more accurately and elucidates the complex evolution process, allowing a profound understanding of the effects of security and trust in driving network evolution and reducing the biases induced by involving only activity representations in analyzing the dynamical processes. PMID:26888717
Statistical and Biophysical Models for Predicting Total and Outdoor Water Use in Los Angeles
NASA Astrophysics Data System (ADS)
Mini, C.; Hogue, T. S.; Pincetl, S.
2012-04-01
Modeling water demand is a complex exercise in the choice of the functional form, techniques, and variables to integrate in the model. The goal of the current research is to identify the determinants that control total and outdoor residential water use in semi-arid cities and to use that information to develop statistical and biophysical models that can forecast spatial and temporal urban water use. The City of Los Angeles is unique in its highly diverse socio-demographic, economic, and cultural characteristics across neighborhoods, which introduces significant challenges in modeling water use. Increasing climate variability also contributes to uncertainties in water use predictions in urban areas. Monthly individual water use records were acquired from the Los Angeles Department of Water and Power (LADWP) for the 2000 to 2010 period. Study predictors of residential water use include socio-demographic, economic, climate, and landscaping variables at the zip code level, collected from the US Census database. Climate variables are estimated from ground-based observations and calculated at the centroid of each zip code by the inverse-distance weighting method. Remotely sensed products of vegetation biomass and landscape land cover are also utilized. Two linear regression models were developed based on the panel data and variables described: a pooled-OLS regression model and a linear mixed-effects model. Both models show income per capita and the percentage of landscaped area in each zip code to be statistically significant predictors. The pooled-OLS model tends to over-estimate higher water use zip codes, and both models provide similar RMSE values. Outdoor water use was estimated at the census tract level as the residual between total water use and indoor use. This residual is being compared with the output from a biophysical model including tree and grass cover areas, climate variables, and estimates of evapotranspiration at very high spatial resolution. A genetic-algorithm-based model (Shuffled Complex Evolution-UA; SCE-UA) is also being developed to provide estimates of prediction and parameter uncertainties and to compare against the linear regression models. Ultimately, models will be selected to undertake predictions for a range of climate change and landscape scenarios. Finally, project results will contribute to a better understanding of water demand, helping to predict future water use and to implement targeted landscaping conservation programs that maintain sustainable water use for a growing population under uncertain climate variability.
Observing Consistency in Online Communication Patterns for User Re-Identification
Venter, Hein S.
2016-01-01
Comprehension of the statistical and structural mechanisms governing human dynamics in online interaction plays a pivotal role in online user identification, online profile development, and recommender systems. However, building a characteristic model of human dynamics on the Internet involves a complete analysis of the variations in human activity patterns, which is a complex process. This complexity is inherent in human dynamics and has not been extensively studied to reveal the structural composition of human behavior. A typical way to anatomize such a complex system is to examine all of the independent interconnectivities that constitute its complexity. This paper presents an examination of the various dimensions of human communication patterns in online interactions. The study employed reliable server-side web data from 31 known users to explore characteristics of human-driven communications. Various machine-learning techniques were explored. The results revealed that each individual exhibited a relatively consistent, unique behavioral signature and that the logistic regression model and model tree can be used to accurately distinguish online users. These results are applicable to one-to-one online user identification processes, insider misuse investigation processes, and online profiling in various areas. PMID:27918593
Efficient computation of the joint sample frequency spectra for multiple populations.
Kamm, John A; Terhorst, Jonathan; Song, Yun S
2017-01-01
A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.
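To make the central data structure concrete, the toy below tabulates an *observed* joint SFS for two populations: entry (i, j) counts sites where the derived allele appears i times in population A and j times in population B. Computing the *expected* SFS under a demographic model, which is what momi addresses, is the hard part and is not reproduced here; the allele counts below are synthetic.

```python
# Observed joint SFS from synthetic per-site derived-allele counts.
import numpy as np

n_a, n_b = 4, 6                       # haploid sample sizes in populations A, B
rng = np.random.default_rng(0)
counts_a = rng.integers(0, n_a + 1, size=1000)   # derived count per site, pop A
counts_b = rng.integers(0, n_b + 1, size=1000)   # derived count per site, pop B

joint_sfs = np.zeros((n_a + 1, n_b + 1), dtype=int)
np.add.at(joint_sfs, (counts_a, counts_b), 1)    # increment one cell per site
print(joint_sfs)
```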
Testing for independence in J×K contingency tables with complex sample survey data.
Lipsitz, Stuart R; Fitzmaurice, Garrett M; Sinha, Debajyoti; Hevelone, Nathanael; Giovannucci, Edward; Hu, Jim C
2015-09-01
The test of independence of row and column variables in a (J×K) contingency table is a widely used statistical test in many areas of application. For complex survey samples, use of the standard Pearson chi-squared test is inappropriate due to correlation among units within the same cluster. Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) proposed an approach in which the standard Pearson chi-squared statistic is multiplied by a design effect to adjust for the complex survey design. Unfortunately, this test fails to exist when one of the observed cell counts equals zero. Even with the large samples typical of many complex surveys, zero cell counts can occur for rare events, small domains, or contingency tables with a large number of cells. Here, we propose Wald and score test statistics for independence based on weighted least squares estimating equations. In contrast to the Rao-Scott test statistic, the proposed Wald and score test statistics always exist. In simulations, the score test is found to perform best with respect to type I error. The proposed method is motivated by, and applied to, postsurgical complications data from the United States' Nationwide Inpatient Sample (NIS) complex survey of hospitals in 2008. © 2015, The International Biometric Society.
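For context, the sketch below shows the kind of first-order design-effect adjustment the paper's Wald and score tests are designed to improve upon: the Pearson statistic is rescaled by an estimated design effect before being referred to the chi-squared reference distribution. The table and the design-effect value are assumed inputs, not data from the paper, and the proposed Wald/score tests themselves are not shown.

```python
# Rao-Scott-style first-order adjustment of the Pearson chi-squared statistic.
import numpy as np
from scipy.stats import chi2, chi2_contingency

table = np.array([[45, 30, 25],
                  [30, 40, 30]])      # J x K table of weighted cell counts
deff = 1.8                             # assumed design-effect estimate

stat, _, dof, _ = chi2_contingency(table, correction=False)
adj_stat = stat / deff                 # rescale for the cluster correlation
print(f"adjusted X^2 = {adj_stat:.2f}, df = {dof}, "
      f"p = {chi2.sf(adj_stat, dof):.4f}")
```

Note why the adjusted statistic "fails to exist" with a zero cell: the design-effect estimate itself involves estimated cell proportions, and a zero count breaks that estimation, which is the gap the paper's estimating-equation approach closes.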
Seeking parsimony in hydrology and water resources technology
NASA Astrophysics Data System (ADS)
Koutsoyiannis, D.
2009-04-01
The principle of parsimony, also known as the principle of simplicity, the principle of economy, and Ockham's razor, advises scientists to prefer the simplest theory among those that fit the data equally well. In this, it is an epistemic principle, but it reflects an ontological characterization that the universe is ultimately parsimonious. Is this principle useful, and can it really be reconciled with, and applied to, our modelling approaches for complex hydrological systems, whose elements and events are extraordinarily numerous, diverse and unique? The answer underlying the mainstream hydrological research of the last two decades seems to be negative. Hopes were invested in the power of computers to enable faithful and detailed representation of the diverse system elements and hydrological processes, based merely on "first principles" and resulting in "physically-based" models that tend to approach the complexity of real-world systems. Today the balance of this research endeavour appears negative, as it has improved neither model predictive capacity nor process comprehension. A return to parsimonious modelling again seems the more promising route. Experience from recent research and from comparisons of parsimonious and complicated models indicates that the former can facilitate insight and comprehension, improve accuracy and predictive capacity, and increase efficiency. In addition, and despite aspirations that "physically based" models would have lower data requirements and even, ultimately, become "data-free", parsimonious models require fewer data to achieve the same accuracy as more complicated models. Naturally, the concepts that reconcile the simplicity of parsimonious models with the complexity of hydrological systems are probability theory and statistics. Probability theory provides the theoretical basis for moving from a microscopic to a macroscopic view of phenomena, by mapping sets of diverse elements and events of hydrological systems to single numbers (a probability or an expected value), and statistics provides the empirical basis for summarizing data, making inferences from them, and supporting decision making in water resources management. Unfortunately, the current state of the art in probability, statistics and their union, often called stochastics, is not fully satisfactory for the needs of modelling hydrological and water resource systems. A first problem is that stochastic modelling has traditionally relied on classical statistics, which is based on the independent "coin-tossing" prototype rather than on the study of real-world systems whose behaviour is very different from that classical prototype. A second problem is that stochastic models (particularly multivariate ones) are often not parsimonious themselves. Therefore, substantial advancement of stochastics is necessary within a new paradigm of parsimonious hydrological modelling.
These ideas are illustrated using several examples, namely: (a) hydrological modelling of a karst system in Bosnia and Herzegovina using three different approaches ranging from parsimonious to detailed "physically-based"; (b) parsimonious modelling of a peculiar modified catchment in Greece; (c) a stochastic approach that can replace parameter-excessive ARMA-type models with a generalized algorithm that produces any shape of autocorrelation function (consistent with the accuracy provided by the data) using a couple of parameters; (d) a multivariate stochastic approach which replaces a huge number of parameters estimated from data with coefficients estimated by the principle of maximum entropy; and (e) a parsimonious approach for decision making in multi-reservoir systems using a handful of parameters instead of thousands of decision variables.
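As a small illustration of point (c) above, the sketch below generates a Gaussian series whose autocorrelation follows a chosen two-parameter shape, by building the implied correlation matrix and taking its Cholesky factor. The specific power-type ACF used here is an illustrative assumption, not necessarily the generalized form of the cited work.

```python
# Generate a series with a prescribed two-parameter autocorrelation function.
import numpy as np

def acf(lag, kappa=0.8, beta=0.5):
    # Slow power-type decay: two parameters control scale and memory.
    return (1.0 + kappa * lag) ** (-beta)

n = 500
lags = np.arange(n)
cov = acf(np.abs(lags[:, None] - lags[None, :]))   # Toeplitz correlation matrix
L = np.linalg.cholesky(cov + 1e-10 * np.eye(n))    # jitter for numerical safety
series = L @ np.random.default_rng(1).standard_normal(n)
```

The point of the exercise is exactly the parsimony argument in the text: two parameters suffice to reproduce essentially any empirically supportable autocorrelation shape, where ARMA-type fits would need many.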
A Flexible Approach for the Statistical Visualization of Ensemble Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Potter, K.; Wilson, A.; Bremer, P.
2009-09-29
Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methods that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.
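The kind of per-location summaries such linked displays are built on is easy to sketch: mean, spread, and quantile envelopes across ensemble members at each grid cell. The data below are synthetic stand-ins for simulation output.

```python
# Per-cell summary statistics across an ensemble of 2D fields.
import numpy as np

members, ny, nx = 50, 64, 64
ensemble = np.random.default_rng(2).normal(size=(members, ny, nx))

mean = ensemble.mean(axis=0)                            # central tendency per cell
std = ensemble.std(axis=0)                              # uncertainty per cell
q05, q95 = np.quantile(ensemble, [0.05, 0.95], axis=0)  # 5-95% envelope per cell
print(mean.shape, std.shape, q05.shape, q95.shape)
```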
Looping and clustering model for the organization of protein-DNA complexes on the bacterial genome
NASA Astrophysics Data System (ADS)
Walter, Jean-Charles; Walliser, Nils-Ole; David, Gabriel; Dorignac, Jérôme; Geniet, Frédéric; Palmeri, John; Parmeggiani, Andrea; Wingreen, Ned S.; Broedersz, Chase P.
2018-03-01
The bacterial genome is organized by a variety of associated proteins inside a structure called the nucleoid. These proteins can form complexes on DNA that play a central role in various biological processes, including chromosome segregation. A prominent example is the large ParB-DNA complex, which forms an essential component of the segregation machinery in many bacteria. ChIP-Seq experiments show that ParB proteins localize around centromere-like parS sites on the DNA, to which ParB binds specifically, and spread from there over large sections of the chromosome. Recent theoretical and experimental studies suggest that DNA-bound ParB proteins can interact with each other to condense into a coherent 3D complex on the DNA. However, the structural organization of this protein-DNA complex remains unclear, and a predictive quantitative theory for the distribution of ParB proteins on DNA is lacking. Here, we propose the looping and clustering model, which employs a statistical physics approach to describe protein-DNA complexes. The looping and clustering model accounts for the extrusion of DNA loops from a cluster of interacting DNA-bound proteins that is organized around a single high-affinity binding site. Conceptually, the structure of the protein-DNA complex is determined by a competition between attractive protein interactions and loop closure entropy of this protein-DNA cluster on the one hand, and the positional entropy for placing loops within the cluster on the other. Indeed, we show that the protein interaction strength determines the ‘tightness’ of the loopy protein-DNA complex. Thus, our model provides a theoretical framework for quantitatively computing the binding profiles of ParB-like proteins around a cognate (parS) binding site.
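A deliberately crude toy of the attraction-versus-entropy competition the abstract describes, not the looping and clustering model itself: a 1D lattice gas in which nearest-neighbor attraction J pulls bound particles into a cluster around a single high-affinity (parS-like) site while entropy favors spreading them out. All parameter values are illustrative.

```python
# Metropolis simulation of a 1D lattice gas with one high-affinity site.
import numpy as np

rng = np.random.default_rng(3)
L, n_part, J, site_bonus, steps = 101, 20, 1.5, 4.0, 100_000
center = L // 2
occ = np.zeros(L, dtype=int)
occ[rng.choice(L, n_part, replace=False)] = 1

def energy(occ):
    e = -J * np.sum(occ[:-1] * occ[1:])   # attraction between bound neighbors
    return e - site_bonus * occ[center]   # parS-like high-affinity site

e, profile, n_samp = energy(occ), np.zeros(L), 0
for step in range(steps):
    src = rng.choice(np.flatnonzero(occ))          # relocate one particle
    dst = rng.choice(np.flatnonzero(occ == 0))
    occ[src], occ[dst] = 0, 1
    e_new = energy(occ)
    if rng.random() < np.exp(min(0.0, e - e_new)):  # Metropolis rule, kT = 1
        e = e_new                                   # accept the move
    else:
        occ[src], occ[dst] = 1, 0                   # reject: restore state
    if step > steps // 2:                           # sample after burn-in
        profile += occ
        n_samp += 1

print("mean occupancy around the site:",
      (profile / n_samp)[center - 5:center + 6].round(2))
```

Raising J tightens the occupancy profile around the high-affinity site, a cartoon of the ‘tightness’ dependence on interaction strength noted in the abstract.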
NASA Astrophysics Data System (ADS)
Shafii, M.; Tolson, B.; Matott, L. S.
2012-04-01
Hydrologic modeling has benefited from significant developments over the past two decades. This has resulted in higher levels of complexity being built into hydrologic models, which eventually makes the model evaluation process (parameter estimation via calibration and uncertainty analysis) more challenging. To avoid unreasonable parameter estimates, many researchers have suggested implementing multi-criteria calibration schemes. Furthermore, for predictive hydrologic models to be useful, proper consideration of uncertainty is essential. Consequently, recent research has emphasized comprehensive model assessment procedures in which multi-criteria parameter estimation is combined with statistically based uncertainty analysis routines such as Bayesian inference using Markov chain Monte Carlo (MCMC) sampling. Such a procedure relies on formal likelihood functions based on statistical assumptions, and, moreover, Bayesian inference structured on MCMC samplers requires a considerably large number of simulations. Because of these issues, especially in complex nonlinear hydrological models, a variety of alternative informal approaches have been proposed for uncertainty analysis in the multi-criteria context. This study explores a number of such informal uncertainty analysis techniques in the multi-criteria calibration of hydrological models. The informal methods addressed are (i) Pareto optimality, which quantifies parameter uncertainty using the Pareto solutions; (ii) DDS-AU, which uses the weighted sum of objective functions to derive prediction limits; and (iii) GLUE, which describes the total uncertainty through the identification of behavioral solutions. The main objective is to compare these methods with MCMC-based Bayesian inference with respect to factors such as computational burden and predictive capacity, evaluated using multiple comparative measures for both the calibration and evaluation periods. The uncertainty analysis methodologies are applied to a simple 5-parameter rainfall-runoff model, HYMOD.
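Of the three informal methods, GLUE is the simplest to sketch. The toy below uses a 2-parameter linear reservoir standing in for HYMOD (which has 5 parameters and more structure): Monte Carlo parameter sampling, an informal Nash-Sutcliffe likelihood, a behavioral threshold, and percentile prediction limits. Percentiles here are unweighted for brevity; GLUE proper weights them by likelihood.

```python
# GLUE-style uncertainty analysis on a toy rainfall-runoff model.
import numpy as np

rng = np.random.default_rng(4)
rain = rng.gamma(2.0, 2.0, size=200)               # synthetic forcing

def toy_model(rain, k, frac):
    """Linear reservoir: store a fraction of rain, release at rate k."""
    s, q = 0.0, np.empty(len(rain))
    for t, r in enumerate(rain):
        s += frac * r
        q[t] = k * s
        s -= q[t]
    return q

obs = toy_model(rain, 0.3, 0.6) + rng.normal(0, 0.2, size=200)  # "observations"

def nse(sim, obs):
    return 1 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

samples = rng.uniform([0.05, 0.1], [0.9, 0.95], size=(5000, 2))  # (k, frac)
sims = np.array([toy_model(rain, k, f) for k, f in samples])
scores = np.array([nse(sim, obs) for sim in sims])

behavioral = scores > 0.7                          # informal likelihood cutoff
lower = np.percentile(sims[behavioral], 5, axis=0)
upper = np.percentile(sims[behavioral], 95, axis=0)
print(f"{behavioral.sum()} behavioral runs; "
      f"mean band width {np.mean(upper - lower):.2f}")
```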
Serang, Oliver; Noble, William Stafford
2012-01-01
The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The Python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862
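To make the inference problem concrete, the toy below marginalizes protein presence by brute force on a tiny peptide-protein graph; real instances need junction trees precisely because this enumeration is exponential in the number of connected proteins. The noisy-OR likelihood with prior gamma and detection probabilities alpha/beta is a generic sketch, not necessarily the exact Fido likelihood.

```python
# Brute-force posterior protein probabilities on a tiny bipartite graph.
from itertools import product

proteins = ["A", "B", "C"]
peptide_parents = {"p1": {"A"}, "p2": {"A", "B"}, "p3": {"B", "C"}}
observed = {"p1": True, "p2": True, "p3": False}
gamma, alpha, beta = 0.5, 0.9, 0.05   # prior, emission, noise probabilities

def likelihood(present):
    lik = 1.0
    for pep, parents in peptide_parents.items():
        n = len(parents & present)
        p_obs = 1 - (1 - beta) * (1 - alpha) ** n   # noisy-OR detection
        lik *= p_obs if observed[pep] else (1 - p_obs)
    return lik

posterior = {p: 0.0 for p in proteins}
total = 0.0
for bits in product([0, 1], repeat=len(proteins)):   # all presence states
    present = {p for p, b in zip(proteins, bits) if b}
    w = (likelihood(present) * gamma ** len(present)
         * (1 - gamma) ** (len(proteins) - len(present)))
    total += w
    for p in present:
        posterior[p] += w

print({p: round(v / total, 3) for p, v in posterior.items()})
```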
Coupled disease-behavior dynamics on complex networks: A review
NASA Astrophysics Data System (ADS)
Wang, Zhen; Andrews, Michael A.; Wu, Zhi-Xi; Wang, Lin; Bauch, Chris T.
2015-12-01
It is increasingly recognized that a key component of successful infection control efforts is understanding the complex, two-way interaction between disease dynamics and human behavioral and social dynamics. Human behavior such as contact precautions and social distancing clearly influences disease prevalence, but disease prevalence can in turn alter human behavior, forming a coupled, nonlinear system. Moreover, in many cases the spatial structure of the population cannot be ignored, such that social and behavioral processes and/or transmission of infection must be represented with complex networks. Research on coupled disease-behavior dynamics in complex networks in particular is growing rapidly and frequently makes use of analysis methods and concepts from statistical physics. Here, we review some of the growing literature in this area. We contrast network-based approaches with homogeneous-mixing approaches, point out how their predictions differ, describe the rich and often surprising behavior of disease-behavior dynamics on complex networks, and compare it to processes in statistical physics. We discuss how these models can capture the dynamics that characterize many real-world scenarios, thereby suggesting ways in which policy makers can better design effective prevention strategies. We also describe the growing sources of digital data that are facilitating research in this area. Finally, we note pitfalls that researchers in the field might face and suggest several ways in which the field could move forward in the coming years.
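A minimal sketch of one such coupled process: discrete-time SIR on a random graph in which every node reduces its transmission probability as global prevalence rises, closing the behavior-disease feedback loop. The network and all parameter values are illustrative assumptions.

```python
# SIR on a network with prevalence-dependent transmission (behavior feedback).
import random
import networkx as nx

random.seed(5)
G = nx.erdos_renyi_graph(n=2000, p=0.004, seed=5)
beta0, gamma, fear = 0.12, 0.08, 8.0        # base transmission, recovery, fear
state = {v: "S" for v in G}
for v in random.sample(list(G), 10):        # seed infections
    state[v] = "I"

for step in range(200):
    prevalence = sum(s == "I" for s in state.values()) / len(state)
    beta = beta0 / (1.0 + fear * prevalence)  # distancing at high prevalence
    new_state = dict(state)
    for v in G:
        if state[v] == "I":
            if random.random() < gamma:
                new_state[v] = "R"
            for u in G.neighbors(v):
                if state[u] == "S" and random.random() < beta:
                    new_state[u] = "I"
    state = new_state

print("final recovered fraction:",
      sum(s == "R" for s in state.values()) / len(state))
```

Setting fear = 0 decouples the system and recovers plain network SIR, which makes the effect of the feedback easy to probe.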
Almeida, Tiago P; Chu, Gavin S; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H; Stafford, Peter J; Schlindwein, Fernando S; Ng, G André
2017-01-01
Purpose: Complex fractionated atrial electrogram (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI, respectively, from corresponding left atrial (LA) regions in 18 persAF patients. Twelve attributes were measured from the AEGs before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) were used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P < 0.0001). Four types of LA regions were identified, based on the AEG characteristics: (i) fractionated before PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%); and (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P < 0.0001). Conclusion: Our results reveal that some LA regions are resistant to PVI, while others are affected by it. Although traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.
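A sketch of the multivariate discrimination step: twelve attributes per electrogram point, with linear discriminant analysis separating the four region types. The features here are synthetic stand-ins with shifted class means, not the study's measurements.

```python
# LDA on 12 synthetic electrogram attributes across four region types.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n_per_class, n_attrs = 50, 12
# Four region types (stayed fractionated, converted to normal,
# became fractionated, stayed normal), each shifted in attribute space.
X = np.vstack([rng.normal(loc=shift, size=(n_per_class, n_attrs))
               for shift in (0.0, 0.8, 1.6, 2.4)])
y = np.repeat(np.arange(4), n_per_class)

lda = LinearDiscriminantAnalysis()
print("cross-validated accuracy:", cross_val_score(lda, X, y, cv=5).mean())
```

The point mirrors the abstract: classes that overlap on any single attribute can still separate cleanly in the joint 12-dimensional space.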
Long-term occlusal changes assessed by the American Board of Orthodontics' model grading system.
Aszkler, Robert M; Preston, Charles B; Saltaji, Humam; Tabbaa, Sawsan
2014-02-01
The purpose of this study was to assess the long-term posttreatment changes in all criteria of the American Board of Orthodontics' (ABO) model grading system. We used plaster models from patients' posttreatment and postretention records. Thirty patients treated by 1 orthodontist using 1 bracket prescription were selected. An initial discrepancy index was computed for each subject to determine the complexity of the case. The models were then graded using the ABO's model grading system at posttreatment and again at postretention. Statistical analysis was performed on the 8 criteria of the model grading system, including paired t tests and Pearson correlations. An alpha of 0.05 was considered statistically significant. The average interval between the posttreatment and postretention records was 12.7 ± 4.4 years. Alignment and rotations worsened by postretention (P = 0.014), with a weak but statistically significant correlation between posttreatment and postretention scores (r = 0.44; P = 0.016). Both marginal ridges and occlusal contacts scored poorly at posttreatment. These criteria showed a significant decrease in scores between posttreatment and postretention (P < 0.001), but the correlations were not statistically significant. The average total score showed a significant decrease between posttreatment and postretention (P < 0.001), partly because of the large decrease in the previous 2 criteria. Higher (worse) scores for occlusal contacts and marginal ridges were found at the end of treatment; however, those scores and the overall scores for the 30 subjects improved in the postretention phase. Copyright © 2014. Published by Mosby, Inc.
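The two statistics used above are straightforward to reproduce; the sketch below applies a paired t-test and a Pearson correlation to synthetic stand-in scores for 30 subjects, not the study's data.

```python
# Paired t-test and Pearson correlation on synthetic grading scores.
import numpy as np
from scipy.stats import pearsonr, ttest_rel

rng = np.random.default_rng(7)
post_treatment = rng.normal(25, 5, size=30)                   # 30 subjects
post_retention = post_treatment - rng.normal(3, 4, size=30)   # scores drop (improve)

t_stat, p_paired = ttest_rel(post_treatment, post_retention)
r, p_corr = pearsonr(post_treatment, post_retention)
print(f"paired t: t={t_stat:.2f}, p={p_paired:.4f};  "
      f"Pearson r={r:.2f}, p={p_corr:.4f}")
```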