Statistical Physics of High Dimensional Inference
NASA Astrophysics Data System (ADS)
Advani, Madhu; Ganguli, Surya
To model modern large-scale datasets, we need efficient algorithms to infer a set of P unknown model parameters from N noisy measurements. What are the fundamental limits on the accuracy of parameter inference, given limited measurements, signal-to-noise ratios, prior information, and computational tractability requirements? How can we combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density α = N/P → ∞. However, modern high-dimensional inference problems, in fields ranging from bioinformatics to economics, occur at finite α. We formulate and analyze high-dimensional inference analytically by applying the replica and cavity methods of statistical physics, where data serves as quenched disorder and inferred parameters play the role of thermal degrees of freedom. Our analysis reveals that widely cherished inference procedures such as maximum likelihood and maximum a posteriori are suboptimal in the modern setting, and yields new tractable, optimal algorithms to replace them, as well as novel bounds on the achievable accuracy of a large class of high-dimensional inference algorithms. Thanks to the Stanford Graduate Fellowship and the Mind Brain Computation IGERT grant for support.
High-dimensional statistical inference: From vector to matrix
NASA Astrophysics Data System (ADS)
Zhang, Anru
Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications. It has attracted substantial recent attention in many fields including statistics, applied mathematics and electrical engineering. In this thesis, we consider several problems, including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary yet leads to sharp results. It is shown that, in compressed sensing, δ_k^A < 1/3, δ_k^A + θ_{k,k}^A < 1, or δ_{tk}^A < √((t−1)/t) for any given constant t ≥ 4/3 guarantees the exact recovery of all k-sparse signals in the noiseless case through constrained ℓ1 minimization, and similarly in affine rank minimization δ_r^M < 1/3, δ_r^M + θ_{r,r}^M < 1, or δ_{tr}^M < √((t−1)/t) ensures the exact reconstruction of all matrices with rank at most r in the noiseless case via constrained nuclear norm minimization. Moreover, for any ε > 0, δ_k^A < 1/3 + ε, δ_k^A + θ_{k,k}^A < 1 + ε, or δ_{tk}^A < √((t−1)/t) + ε is not sufficient to guarantee the exact recovery of all k-sparse signals for large k. A similar result also holds for matrix recovery. In addition, the conditions δ_k^A < 1/3, δ_k^A + θ_{k,k}^A < 1, δ_{tk}^A < √((t−1)/t) and δ_r^M < 1/3, δ_r^M + θ_{r,r}^M < 1, δ_{tr}^M < √((t−1)/t) are also shown to be sufficient, respectively, for stable recovery of approximately sparse signals and low-rank matrices in the noisy case.
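The constrained ℓ1 minimization referenced above can be posed as a linear program. The following is a minimal sketch (not code from the thesis; the problem sizes and Gaussian sensing matrix are illustrative assumptions) showing exact recovery of a sparse signal from few measurements:

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimize(A, b):
    """Basis pursuit: min ||x||_1 subject to A x = b, as a linear program.
    Split x = u - v with u, v >= 0 and minimize sum(u) + sum(v)."""
    p = A.shape[1]
    c = np.ones(2 * p)
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, None)] * (2 * p))
    return res.x[:p] - res.x[p:]

rng = np.random.default_rng(0)
n_meas, N, k = 40, 100, 3                # measurements, ambient dim, sparsity
A = rng.standard_normal((n_meas, N)) / np.sqrt(n_meas)  # Gaussian matrices satisfy RIP w.h.p.
x_true = np.zeros(N)
x_true[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
x_hat = l1_minimize(A, A @ x_true)
print("max recovery error:", np.max(np.abs(x_hat - x_true)))
```

With far fewer measurements than unknowns, the ℓ1 solution coincides with the sparse ground truth, which is the phenomenon the restricted isometry conditions above certify.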
Statistical inference and string theory
NASA Astrophysics Data System (ADS)
Heckman, Jonathan J.
2015-09-01
In this paper, we expose some surprising connections between string theory and statistical inference. We consider a large collective of agents sweeping out a family of nearby statistical models for an M-dimensional manifold of statistical fitting parameters. When the agents making nearby inferences align along a d-dimensional grid, we find that the pooled probability that the collective reaches a correct inference is the partition function of a nonlinear sigma model in d dimensions. Stability under perturbations to the original inference scheme requires the agents of the collective to distribute along two dimensions. Conformal invariance of the sigma model corresponds to the condition of a stable inference scheme, directly leading to the Einstein field equations for classical gravity. By summing over all possible arrangements of the agents in the collective, we reach a string theory. We also use this perspective to quantify how much an observer can hope to learn about the internal geometry of a superstring compactification. Finally, we present some brief speculative remarks on applications to the AdS/CFT correspondence and Lorentzian signature space-times.
NASA Astrophysics Data System (ADS)
Shehadeh, Mahmoud M.
The Greek word "nano," meaning dwarf, refers to a reduction in scale by a factor of 10⁻⁹; a nanometer is one thousand times smaller than a micron. (The width across the head of a pin, for instance, is about 1,000,000 nanometers.) First introduced in the late 1970s, the concept of nanotechnology entails the manufacture and manipulation of objects, atoms, and molecules on a nanometer scale. Currently, nanoscience and its research constitute a complete spectrum of activities towards the promised next industrial revolution, and they span the whole spectrum of physical, chemical, biological, and mathematical sciences needed to develop new tools, models and techniques that help in expanding this technology. In this thesis, we first discuss robust parameter design for nanostructures and data collection. We then model the nano data using multinomial logit and probit models and implement statistical inference using both frequentist and Bayesian approaches. Moreover, the mean probabilities of obtaining different types of nanostructures are obtained using Monte Carlo simulations, and these probabilities are maximized in order to find the conditions under which the desired nanostructures would be produced in large quantity.
NASA Astrophysics Data System (ADS)
Sjöstrand, Karl; Cardenas, Valerie A.; Larsen, Rasmus; Studholme, Colin
2008-03-01
Whole-brain morphometry denotes a group of methods with the aim of relating clinical and cognitive measurements to regions of the brain. Typically, such methods require the statistical analysis of a data set with many variables (voxels and exogenous variables) paired with few observations (subjects). A common approach to this ill-posed problem is to analyze each spatial variable separately, dividing the analysis into manageable subproblems. A disadvantage of this method is that the correlation structure of the spatial variables is not taken into account. This paper investigates the use of ridge regression to address this issue, allowing for a gradual introduction of correlation information into the model. We make the connections between ridge regression and voxel-wise procedures explicit and discuss relations to other statistical methods. Results are given on an in vivo data set of deformation-based morphometry from a study of cognitive decline in an elderly population.
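Ridge regression in the many-variables, few-observations regime described above can be sketched in a few lines (our toy example on synthetic data, not the paper's morphometry pipeline; the penalty λ controls how strongly coefficients are shrunk):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
n, p = 20, 200                    # few subjects, many voxel variables (p >> n)
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0                    # only a few variables carry signal
y = X @ beta + 0.1 * rng.standard_normal(n)
for lam in (1e-2, 1.0, 100.0):    # increasing lam shrinks the coefficient vector
    print(lam, round(float(np.linalg.norm(ridge(X, y, lam))), 3))
```

Even though X'X is singular when p > n, the λI term makes the system well posed, which is precisely why ridge regularization suits such ill-posed morphometry problems.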
Thermodynamics of cellular statistical inference
NASA Astrophysics Data System (ADS)
Lang, Alex; Fisher, Charles; Mehta, Pankaj
2014-03-01
Successful organisms must be capable of accurately sensing the surrounding environment in order to locate nutrients and evade toxins or predators. However, single cell organisms face a multitude of limitations on their accuracy of sensing. Berg and Purcell first examined the canonical example of statistical limitations to cellular learning of a diffusing chemical and established a fundamental limit to statistical accuracy. Recent work has shown that the Berg and Purcell learning limit can be exceeded using Maximum Likelihood Estimation. Here, we recast the cellular sensing problem as a statistical inference problem and discuss the relationship between the efficiency of an estimator and its thermodynamic properties. We explicitly model a single non-equilibrium receptor and examine the constraints on statistical inference imposed by noisy biochemical networks. Our work shows that cells must balance sample number, specificity, and energy consumption when performing statistical inference. These tradeoffs place significant constraints on the practical implementation of statistical estimators in a cell.
Statistical learning and selective inference
Taylor, Jonathan; Tibshirani, Robert J.
2015-01-01
We describe the problem of “selective inference.” This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have “cherry-picked”—searched for the strongest associations—means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis. PMID:26100887
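A small simulation (ours, not from the article) makes the cherry-picking problem concrete: if every one of m candidate associations is null but we test only the strongest one at the nominal level, the false positive rate is far above that level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
reps, m, n, alpha = 2000, 20, 30, 0.05
data = rng.standard_normal((reps, m, n))   # every association is null
t = data.mean(axis=2) / (data.std(axis=2, ddof=1) / np.sqrt(n))
p = 2 * stats.t.sf(np.abs(t), df=n - 1)
# report only the strongest ("cherry-picked") association each time
rate = (p.min(axis=1) < alpha).mean()
print("false positive rate when testing the best of", m, "associations:", rate)
```

The observed rate is close to 1 − 0.95²⁰ ≈ 0.64 rather than 0.05, which is why selection must raise the bar for significance.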
Statistical Inference at Work: Statistical Process Control as an Example
ERIC Educational Resources Information Center
Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia
2008-01-01
To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…
The Reasoning behind Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Bakker, Arthur; Ben-Zvi, Dani
2011-01-01
Informal statistical inference (ISI) has been a frequent focus of recent research in statistics education. Considering the role that context plays in developing ISI calls into question the need to be more explicit about the reasoning that underpins ISI. This paper uses educational literature on informal statistical inference and philosophical…
Predict! Teaching Statistics Using Informational Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie
2013-01-01
Statistics is one of the most widely used topics for everyday life in the school mathematics curriculum. Unfortunately, the statistics taught in schools focuses on calculations and procedures before students have a chance to see it as a useful and powerful tool. Researchers have found that a dominant view of statistics is as an assortment of tools…
Local and Global Thinking in Statistical Inference
ERIC Educational Resources Information Center
Pratt, Dave; Johnston-Wilder, Peter; Ainley, Janet; Mason, John
2008-01-01
In this reflective paper, we explore students' local and global thinking about informal statistical inference through our observations of 10- to 11-year-olds, challenged to infer the unknown configuration of a virtual die, but able to use the die to generate as much data as they felt necessary. We report how they tended to focus on local changes…
Ranald Macdonald and statistical inference.
Smith, Philip T
2009-05-01
Ranald Roderick Macdonald (1945-2007) was an important contributor to mathematical psychology in the UK, as a referee and action editor for the British Journal of Mathematical and Statistical Psychology and as a participant and organizer at the British Psychological Society's Mathematics, Statistics and Computing Section meetings. This appreciation argues that his most important contribution was to the foundations of significance testing, where his concern about what information was relevant in interpreting the results of significance tests led him to be a persuasive advocate for the 'Weak Fisherian' form of hypothesis testing. PMID:19351454
Making statistical inferences about software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1988-01-01
Failure times of software undergoing random debugging can be modelled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
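The order-statistics model can be illustrated with a quick simulation (our sketch; the parameter values are arbitrary): with n remaining bugs each contributing failure rate φ, inter-failure times are independent exponentials with rates that decrease as bugs are removed.

```python
import numpy as np

def simulate_debugging(n_bugs, phi, rng):
    """Jelinski-Moranda-style model: with i bugs already fixed, the program
    fails at rate (n_bugs - i) * phi, so inter-failure times are independent
    exponentials with decreasing rates."""
    gaps = [rng.exponential(1.0 / ((n_bugs - i) * phi)) for i in range(n_bugs)]
    return np.cumsum(gaps)

rng = np.random.default_rng(3)
times = simulate_debugging(n_bugs=50, phi=0.01, rng=rng)
gaps = np.diff(np.concatenate([[0.0], times]))
print("mean of first 5 inter-failure gaps:", gaps[:5].mean())
print("mean of last 5 inter-failure gaps:", gaps[-5:].mean())
```

The last few gaps are orders of magnitude longer than the first: verifying ultra-high reliability requires observing failure-free intervals that grow impractically long, which is the difficulty the abstract highlights.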
Making statistical inferences about software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1986-01-01
Failure times of software undergoing random debugging can be modeled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
Investigating Mathematics Teachers' Thoughts of Statistical Inference
ERIC Educational Resources Information Center
Yang, Kai-Lin
2012-01-01
Research on statistical cognition and application suggests that statistical inference concepts are commonly misunderstood by students and even misinterpreted by researchers. Although some research has been done on students' misunderstanding or misconceptions of confidence intervals (CIs), few studies explore either students' or mathematics…
Inference and the introductory statistics course
NASA Astrophysics Data System (ADS)
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-10-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its hypothetical probabilistic reasoning process is examined in some depth. We argue that the revolution in the teaching of inference must begin. We also discuss some perplexing issues, problematic areas and some new insights into language conundrums associated with introducing the logic of inference through randomization methods.
Inference and the Introductory Statistics Course
ERIC Educational Resources Information Center
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-01-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its…
Statistical Inference in Retrieval Effectiveness Evaluation.
ERIC Educational Resources Information Center
Savoy, Jacques
1997-01-01
Discussion of evaluation methodology in information retrieval focuses on the average precision over a set of fixed recall values in an effort to evaluate the retrieval effectiveness of a search algorithm. Highlights include a review of traditional evaluation methodology with examples; and a statistical inference methodology called bootstrap.…
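The bootstrap methodology mentioned can be sketched as follows for comparing two systems' per-query average precision (the scores are invented for illustration, not taken from the article):

```python
import numpy as np

def bootstrap_ci(diffs, n_boot=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean per-query difference."""
    rng = np.random.default_rng(seed)
    means = np.array([rng.choice(diffs, size=len(diffs), replace=True).mean()
                      for _ in range(n_boot)])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

# hypothetical per-query average-precision scores for two search algorithms
ap_a = np.array([0.41, 0.55, 0.30, 0.62, 0.48, 0.35, 0.58, 0.44, 0.51, 0.39])
ap_b = np.array([0.38, 0.49, 0.28, 0.60, 0.42, 0.33, 0.50, 0.40, 0.47, 0.36])
lo, hi = bootstrap_ci(ap_a - ap_b)
print("95% CI for mean AP difference:", lo, hi)
```

If the interval excludes zero, the effectiveness difference is judged significant without any normality assumption on the per-query scores.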
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be, with respect to their theoretical probability, and include, e.g., outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR, and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. PMID:24300550
Asymptotic theory of quantum statistical inference
NASA Astrophysics Data System (ADS)
Hayashi, Masahito
Part I: Hypothesis Testing: Introduction to Part I -- Strong Converse and Stein's lemma in quantum hypothesis testing/Tomohiro Ogawa and Hiroshi Nagaoka -- The proper formula for relative entropy and its asymptotics in quantum probability/Fumio Hiai and Dénes Petz -- Strong Converse theorems in Quantum Information Theory/Hiroshi Nagaoka -- Asymptotics of quantum relative entropy from a representation theoretical viewpoint/Masahito Hayashi -- Quantum birthday problems: geometrical aspects of Quantum Random Coding/Akio Fujiwara -- Part II: Quantum Cramér-Rao Bound in Mixed States Model: Introduction to Part II -- A new approach to Cramér-Rao Bounds for quantum state estimation/Hiroshi Nagaoka -- On Fisher information of Quantum Statistical Models/Hiroshi Nagaoka -- On the parameter estimation problem for Quantum Statistical Models/Hiroshi Nagaoka -- A generalization of the simultaneous diagonalization of Hermitian matrices and its relation to Quantum Estimation Theory/Hiroshi Nagaoka -- A linear programming approach to Attainable Cramér-Rao Type Bounds/Masahito Hayashi -- Statistical model with measurement degree of freedom and quantum physics/Masahito Hayashi and Keiji Matsumoto -- Asymptotic Quantum Theory for the Thermal States Family/Masahito Hayashi -- State estimation for large ensembles/Richard D. Gill and Serge Massar -- Part III: Quantum Cramér-Rao Bound in Pure States Model: Introduction to Part III -- Quantum Fisher Metric and estimation for Pure State Models/Akio Fujiwara and Hiroshi Nagaoka -- Geometry of Quantum Estimation Theory/Akio Fujiwara -- An estimation theoretical characterization of coherent states/Akio Fujiwara and Hiroshi Nagaoka -- A geometrical approach to Quantum Estimation Theory/Keiji Matsumoto -- Part IV: Group symmetric approach to Pure States Model: Introduction to Part IV -- Optimal extraction of information from finite quantum ensembles/Serge Massar and Sandu Popescu -- Asymptotic Estimation Theory for a Finite-Dimensional Pure
Likelihood-Free Inference in High-Dimensional Models.
Kousathanas, Athanasios; Leuenberger, Christoph; Helfer, Jonas; Quinodoz, Mathieu; Foll, Matthieu; Wegmann, Daniel
2016-06-01
Methods that bypass analytical evaluations of the likelihood function have become an indispensable tool for statistical inference in many fields of science. These so-called likelihood-free methods rely on accepting and rejecting simulations based on summary statistics, which limits them to low-dimensional models for which the value of the likelihood is large enough to result in manageable acceptance rates. To get around these issues, we introduce a novel, likelihood-free Markov chain Monte Carlo (MCMC) method combining two key innovations: updating only one parameter per iteration and accepting or rejecting this update based on subsets of statistics approximately sufficient for this parameter. This increases acceptance rates dramatically, rendering this approach suitable even for models of very high dimensionality. We further derive that for linear models, a one-dimensional combination of statistics per parameter is sufficient and can be found empirically with simulations. Finally, we demonstrate that our method readily scales to models of very high dimensionality, using toy models as well as by jointly inferring the effective population size, the distribution of fitness effects (DFE) of segregating mutations, and selection coefficients for each locus from data of a recent experiment on the evolution of drug resistance in influenza. PMID:27052569
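A toy caricature of the two innovations above -- updating one parameter per iteration, and accepting based only on the summaries approximately sufficient for that parameter -- on a Gaussian model where the per-dimension sample mean plays that role. This is our sketch under simplifying assumptions (flat prior, chain initialized at a pilot estimate); the actual method of the paper is more involved.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 5, 200
theta_true = rng.normal(size=d)
data = theta_true + rng.normal(size=(n, d))
s_obs = data.mean(axis=0)            # summary j is approximately sufficient for theta_j
eps, steps = 0.1, 4000
theta = s_obs + rng.normal(scale=0.05, size=d)   # pilot initialization
for t in range(steps):
    j = t % d                        # update a single parameter per iteration
    prop = theta.copy()
    prop[j] += rng.normal(scale=0.2)
    # simulate only the j-th summary under the proposed parameter value
    s_sim = prop[j] + rng.normal(size=n).mean()
    if abs(s_sim - s_obs[j]) < eps:  # accept/reject on the relevant subset of statistics
        theta = prop
print("absolute errors:", np.round(np.abs(theta - theta_true), 2))
```

Because each acceptance decision involves a one-dimensional distance rather than a distance in the full summary space, acceptance rates stay manageable as the number of parameters d grows, which is the scalability argument of the abstract.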
The renormalization group via statistical inference
NASA Astrophysics Data System (ADS)
Bény, Cédric; Osborne, Tobias J.
2015-08-01
In physics, one attempts to infer the rules governing a system given only the results of imperfect measurements. Hence, microscopic theories may be effectively indistinguishable experimentally. We develop an operationally motivated procedure to identify the corresponding equivalence classes of states, and argue that the renormalization group (RG) arises from the inherent ambiguities associated with the classes: one encounters flow parameters as, e.g., a regulator, a scale, or a measure of precision, which specify representatives in a given equivalence class. This provides a unifying framework and reveals the role played by information in renormalization. We validate this idea by showing that it justifies the use of low-momenta n-point functions as statistically relevant observables around a Gaussian hypothesis. These results enable the calculation of distinguishability in quantum field theory. Our methods also provide a way to extend renormalization techniques to effective models which are not based on the usual quantum-field formalism, and elucidate the relationships between various types of RG.
Verbal framing of statistical evidence drives children's preference inferences.
Garvin, Laura E; Woodward, Amanda L
2015-05-01
Although research has shown that statistical information can support children's inferences about specific psychological causes of others' behavior, previous work leaves open the question of how children interpret statistical information in more ambiguous situations. The current studies investigated the effect of specific verbal framing information on children's ability to infer mental states from statistical regularities in behavior. We found that preschool children inferred others' preferences from their statistically non-random choices only when they were provided with verbal information placing the person's behavior in a specifically preference-related context, not when the behavior was presented in a non-mentalistic action context or an intentional choice context. Furthermore, verbal framing information showed some evidence of supporting children's mental state inferences even from more ambiguous statistical data. These results highlight the role that specific, relevant framing information can play in supporting children's ability to derive novel insights from statistical information. PMID:25704581
An argument for mechanism-based statistical inference in cancer
Ochs, Michael; Price, Nathan D.; Tomasetti, Cristian; Younes, Laurent
2015-01-01
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning bio-markers, metabolism, cell signaling, network inference and tumorigenesis. PMID:25381197
Nuclear Forensic Inferences Using Iterative Multidimensional Statistics
Robel, M; Kristo, M J; Heller, M A
2009-06-09
Nuclear forensics involves the analysis of interdicted nuclear material for specific material characteristics (referred to as 'signatures') that imply specific geographical locations, production processes, culprit intentions, etc. Predictive signatures rely on expert knowledge of physics, chemistry, and engineering to develop inferences from these material characteristics. Comparative signatures, on the other hand, rely on comparison of the material characteristics of the interdicted sample (the 'questioned sample' in FBI parlance) with those of a set of known samples. In the ideal case, the set of known samples would be a comprehensive nuclear forensics database, a database which does not currently exist. In fact, our ability to analyze interdicted samples and produce an extensive list of precise material characteristics far exceeds our ability to interpret the results. Therefore, as we seek to develop the extensive databases necessary for nuclear forensics, we must also develop methods for drawing inferences from comparison of our analytical results with these large, multidimensional sets of data. In the work reported here, we used a large, multidimensional dataset of results from quality control analyses of uranium ore concentrate (UOC, sometimes called 'yellowcake'). We have found that traditional multidimensional techniques, such as principal components analysis (PCA), are especially useful for understanding such datasets and drawing relevant conclusions. In particular, we have developed an iterative partial least squares-discriminant analysis (PLS-DA) procedure that has proven especially adept at identifying the production location of unknown UOC samples. By removing classes which fell far outside the initial decision boundary, and then rebuilding the PLS-DA model, we have consistently produced better and more definitive attributions than with a single pass classification approach. Performance of the iterative PLS-DA method
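As a minimal illustration of why PCA helps with such multidimensional comparisons, the sketch below projects mock measurement vectors from two hypothetical production sites onto their principal components (invented data; the iterative PLS-DA procedure itself is not reproduced here):

```python
import numpy as np

def pca_scores(X, k):
    """Principal-component scores via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(8)
# mock 12-dimensional measurement vectors for samples from two assumed sites
site_a = rng.normal(loc=0.0, size=(30, 12))
site_b = rng.normal(loc=1.0, size=(30, 12))
scores = pca_scores(np.vstack([site_a, site_b]), k=2)
print("site means on PC1:", scores[:30, 0].mean(), scores[30:, 0].mean())
```

The leading component concentrates the between-site variation that is spread thinly across all twelve raw variables, which is what makes the low-dimensional projection useful for attribution.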
Network topology inference from infection statistics
NASA Astrophysics Data System (ADS)
Tomovski, Igor; Kocarev, Ljupčo
2015-10-01
We introduce a mathematical framework for identification of network topology, based on data collected from an infectious SIS process occurring on a network. An exact expression for the weight of each network link (existing or not) as a function of infection statistics is obtained. An algorithm for proper implementation of the analyzed concept is suggested, and the validity of the obtained result is confirmed by numerical simulations performed on a number of synthetic (computer-generated) networks.
Simultaneous Statistical Inference for Epigenetic Data
Schildknecht, Konstantin; Olek, Sven; Dickhaus, Thorsten
2015-01-01
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology. PMID:25965389
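The max-statistic flavor of a multivariate permutation test, which controls the family-wise error rate without parametric assumptions, can be written compactly. This is a simplified generic sketch on synthetic data, not the authors' exact relative-effects procedure:

```python
import numpy as np

def maxT_permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sample multivariate permutation test with max-statistic FWER control:
    each variable's observed statistic is compared with the permutation
    distribution of the maximum statistic over all variables."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_x = len(x)
    obs = np.abs(x.mean(axis=0) - y.mean(axis=0))
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(len(pooled))
        px, py = pooled[perm[:n_x]], pooled[perm[n_x:]]
        max_null[b] = np.max(np.abs(px.mean(axis=0) - py.mean(axis=0)))
    return np.array([(max_null >= o).mean() for o in obs])  # adjusted p-values

rng = np.random.default_rng(5)
x = rng.normal(size=(25, 10))
x[:, 0] += 1.5                       # a group difference at locus 0 only
y = rng.normal(size=(25, 10))
p_adj = maxT_permutation_test(x, y)
print("adjusted p-values:", p_adj.round(3))
```

Because the null distribution of the maximum automatically reflects whatever dependence exists among loci, this construction handles the unknown dependency structure the abstract emphasizes.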
Unequal Division of Type I Risk in Statistical Inferences
ERIC Educational Resources Information Center
Meek, Gary E.; Ozgur, Ceyhun O.
2004-01-01
Introductory statistics texts give extensive coverage to two-sided inferences in hypothesis testing, interval estimation, and one-sided hypothesis tests. Very few discuss the possibility of one-sided interval estimation at all. Even fewer do so in any detail. Two of the business statistics texts we reviewed mentioned the possibility of dividing…
Introducing Statistical Inference to Biology Students through Bootstrapping and Randomization
ERIC Educational Resources Information Center
Lock, Robin H.; Lock, Patti Frazer
2008-01-01
Bootstrap methods and randomization tests are increasingly being used as alternatives to standard statistical procedures in biology. They also serve as an effective introduction to the key ideas of statistical inference in introductory courses for biology students. We discuss the use of such simulation based procedures in an integrated curriculum…
LOWER LEVEL INFERENCE CONTROL IN STATISTICAL DATABASE SYSTEMS
Lipton, D.L.; Wong, H.K.T.
1984-02-01
An inference is the process of transforming unclassified data values into confidential data values. Most previous research in inference control has studied the use of statistical aggregates to deduce individual records. However, several other types of inference are also possible. Unknown functional dependencies may be apparent to users who have 'expert' knowledge about the characteristics of a population. Some correlations between attributes may be concluded from 'commonly known' facts about the world. To counter these threats, security managers should use random sampling of databases of similar populations, as well as expert systems. 'Expert' users of the database system may form inferences from the variable performance of the user interface. Users may observe on-line turn-around time, accounting statistics, the error messages received, and the point at which an interactive protocol sequence fails. One may obtain information about the frequency distributions of attribute values, and the validity of data object names, from this information. At the back end of a database system, improved software engineering practices will reduce opportunities to bypass functional units of the database system. The term 'data object' should be expanded to incorporate those data object types which generate new classes of threats. The security of databases and database systems must be recognized as separate but related problems. Thus, by increased awareness of lower-level inferences, system security managers may effectively nullify the threat posed by lower-level inferences.
A Framework for Thinking about Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Rubin, Andee
2009-01-01
Informal inferential reasoning has shown some promise in developing students' deeper understanding of statistical processes. This paper presents a framework to think about three key principles of informal inference--generalizations "beyond the data," probabilistic language, and data as evidence. The authors use primary school classroom episodes…
Targeted estimation of nuisance parameters to obtain valid statistical inference.
van der Laan, Mark J
2014-01-01
In order to obtain concrete results, we focus on estimation of the treatment specific mean, controlling for all measured baseline covariates, based on observing independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data-adaptive estimators of these nuisance parameters such as ensemble learning, and specifically super-learning. Because such estimators involve optimal trade-off of bias and variance w.r.t. the infinite dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real-valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment specific mean under regularity conditions. These insights result in novel targeted minimum loss-based estimators (TMLEs) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative TMLEs (C-TMLEs) with known influence curve allowing for statistical inference, even though these C-TMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand. As a particular special
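The ingredients described above -- a propensity-score fit and an outcome regression combined into a doubly robust estimator of the treatment-specific mean -- can be sketched on synthetic data. This is a plain AIPW estimator with simple parametric working models, not the paper's C-TMLE with super-learning:

```python
import numpy as np

def treatment_mean_estimators(W, A, Y):
    """Estimate E[Y(1)] assuming no unmeasured confounding, combining a
    logistic propensity-score model and a linear outcome regression in an
    augmented inverse-probability-weighted (AIPW, doubly robust) estimator."""
    X = np.column_stack([np.ones(len(W)), W])
    beta = np.zeros(X.shape[1])
    for _ in range(25):                      # Newton steps for logistic regression
        p = 1 / (1 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(H, X.T @ (A - p))
    g = 1 / (1 + np.exp(-X @ beta))          # fitted propensity score P(A=1|W)
    coef = np.linalg.lstsq(X[A == 1], Y[A == 1], rcond=None)[0]
    Q1 = X @ coef                            # outcome regression Qbar(1, W)
    ipw = np.mean(A * Y / g)
    aipw = np.mean(A / g * (Y - Q1) + Q1)
    return ipw, aipw

rng = np.random.default_rng(6)
n = 5000
W = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-W)))    # treatment confounded by W
Y = 2.0 + W + 1.5 * A + rng.normal(size=n)   # true E[Y(1)] = 3.5
ipw, aipw = treatment_mean_estimators(W, A, Y)
print("IPW:", round(ipw, 2), " AIPW:", round(aipw, 2))
```

The augmentation term A/g·(Y − Q1) is what makes the bias second order in the nuisance-estimation errors, the property the abstract's additional targeting is designed to preserve with data-adaptive fits.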
Statistical Detection of EEG Synchrony Using Empirical Bayesian Inference
Singh, Archana K.; Asoh, Hideki; Takeda, Yuji; Phillips, Steven
2015-01-01
There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit the complex dependence structure between hypotheses that vary in the spectral, temporal and spatial dimensions. Previously, we showed that hierarchical FDR and optimal discovery procedures could be effectively applied to PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis, which computes the FDR as a posterior probability that an observed statistic belongs to the null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach to PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR, and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that locFDR can effectively control false positives without compromising the power of PLV synchrony inference. Applying locFDR to the experimental data detected more significant discoveries than our previously proposed methods, whereas the standard FDR method failed to detect any significant discoveries. PMID:25822617
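The two-groups idea behind locFDR can be sketched in a few lines: the local FDR at a statistic z is the posterior probability of the null, p0 * f0(z) / f(z), where f0 is the theoretical null density and f is the marginal density estimated from all observed statistics. A minimal sketch (the Gaussian-KDE marginal, the simulated z-scores, and the 0.2 threshold are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np
from scipy import stats

def local_fdr(z, p0=1.0):
    """Efron's two-groups local FDR: locfdr(z) = p0 * f0(z) / f(z),
    with a theoretical N(0,1) null f0 and the marginal f estimated
    from all observed statistics (here by a Gaussian KDE)."""
    f0 = stats.norm.pdf(z)            # theoretical null density
    f = stats.gaussian_kde(z)(z)      # estimated marginal density
    return np.clip(p0 * f0 / f, 0.0, 1.0)

rng = np.random.default_rng(0)
# 950 null statistics plus 50 shifted "synchrony" statistics
z = np.concatenate([rng.normal(0.0, 1.0, 950), rng.normal(4.0, 1.0, 50)])
lfdr = local_fdr(z)
discoveries = int(np.sum(lfdr < 0.2))  # declare locfdr < 0.2 significant
```

Most of the shifted statistics receive a small local FDR while the bulk of the nulls do not, which is the power-without-false-positives behaviour the abstract reports.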
Inference in high-dimensional parameter space.
O'Hare, Anthony
2015-11-01
Model parameter inference has become increasingly popular in recent years in the field of computational epidemiology, especially for models with a large number of parameters. Techniques such as Approximate Bayesian Computation (ABC) or maximum/partial likelihoods are commonly used to infer parameters in phenomenological models that best describe some set of data. These techniques rely on efficient exploration of the underlying parameter space, which is difficult in high dimensions, especially if there are correlations between the parameters in the model that may not be known a priori. The aim of this article is to demonstrate the use of the recently invented Adaptive Metropolis algorithm for exploring parameter space in a practical way through the use of a simple epidemiological model. PMID:26176624
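The Adaptive Metropolis algorithm referenced above (Haario et al., 2001) tunes the Gaussian proposal covariance from the chain's own history, which is exactly what helps when parameter correlations are unknown a priori. A minimal sketch; the correlated-Gaussian target below merely stands in for an epidemiological posterior and is purely illustrative:

```python
import numpy as np

def adaptive_metropolis(log_post, x0, n_iter=5000, adapt_start=500, seed=1):
    """Adaptive Metropolis (Haario et al., 2001): after a burn-in of
    adapt_start steps, the proposal covariance is re-estimated from the
    chain history, scaled by 2.4^2/d as recommended in the paper."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = len(x)
    sd = 2.4**2 / d                    # scaling factor from Haario et al.
    eps = 1e-6 * np.eye(d)             # regularisation keeps cov non-singular
    chain = np.empty((n_iter, d))
    lp = log_post(x)
    cov = np.eye(d)
    for i in range(n_iter):
        if i >= adapt_start:           # adapt covariance to chain history
            cov = sd * np.cov(chain[:i].T) + eps
        prop = rng.multivariate_normal(x, cov)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# Strongly correlated Gaussian target (correlation 0.9), unknown to the sampler
prec = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
log_post = lambda x: -0.5 * x @ prec @ x
chain = adaptive_metropolis(log_post, [3.0, -3.0])
```

After adaptation, the proposal aligns with the target's correlation structure, so the chain explores the narrow ridge efficiently even though the correlation was never specified.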
Breakdown of statistical inference from some random experiments
NASA Astrophysics Data System (ADS)
Kupczynski, Marian; De Raedt, Hans
2016-03-01
Many experiments can be interpreted in terms of random processes operating according to some internal protocols. When experiments are costly or cannot be repeated, only one or a few finite samples are available. In this paper we study data generated by pseudo-random computer experiments operating according to particular internal protocols. We show that the standard statistical analysis performed on a sample containing 10^5 data points or more may sometimes be highly misleading, with statistical errors largely underestimated. Our results confirm in a dramatic way the dangers of standard asymptotic statistical inference if a sample is not homogeneous. We demonstrate that analyzing various subdivisions of samples by multiple chi-square tests and chi-square frequency graphs is very effective in detecting sample inhomogeneity. Therefore, to assure the correctness of statistical inference, the above-mentioned chi-square tests and other non-parametric sample homogeneity tests should be incorporated in any statistical analysis of experimental data. If such tests are not performed, the reported conclusions and estimates of the errors cannot be trusted.
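The subdivision check described above can be sketched as follows: split the sample into consecutive blocks and chi-square test each block's bin counts against the pooled distribution. The block count, binning, and simulated mean shift below are illustrative assumptions, not the authors' exact protocol:

```python
import numpy as np
from scipy import stats

def subdivision_chi2(sample, n_blocks=10, bins=8):
    """Homogeneity check: split the sample into consecutive blocks and
    chi-square test each block's bin counts against the pooled histogram."""
    edges = np.quantile(sample, np.linspace(0.0, 1.0, bins + 1))
    edges[0] -= 1.0                      # widen end bins so nothing falls out
    edges[-1] += 1.0
    pooled, _ = np.histogram(sample, edges)
    pvals = []
    for block in np.array_split(sample, n_blocks):
        obs, _ = np.histogram(block, edges)
        exp = pooled / pooled.sum() * obs.sum()   # expected under homogeneity
        pvals.append(stats.chisquare(obs, exp).pvalue)
    return np.array(pvals)

rng = np.random.default_rng(0)
homogeneous = rng.normal(0.0, 1.0, 100_000)
inhomogeneous = np.concatenate([rng.normal(0.0, 1.0, 50_000),
                                rng.normal(0.3, 1.0, 50_000)])
pvals_ok = subdivision_chi2(homogeneous)     # no block rejects
pvals_bad = subdivision_chi2(inhomogeneous)  # every block rejects
```

A single pooled chi-square test would miss the drift entirely, while the per-block tests expose it, which is the point the abstract makes about subdivided samples.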
Indirect Fourier transform in the context of statistical inference.
Muthig, Michael; Prévost, Sylvain; Orglmeister, Reinhold; Gradzielski, Michael
2016-09-01
Inferring structural information from the intensity of a small-angle scattering (SAS) experiment is an ill-posed inverse problem, so the determination of a solution is in general non-trivial. In this work, the indirect Fourier transform (IFT), which determines the pair distance distribution function from the intensity and hence yields structural information, is discussed within two different statistical inference approaches, namely a frequentist one and a Bayesian one, in order to determine a solution objectively. From the frequentist approach, the cross-validation method is obtained as a good practical objective function for selecting an IFT solution. Moreover, modern machine learning methods are employed to suppress oscillatory behaviour of the solution, hence extracting only meaningful features of the solution. By comparing the results yielded by the different methods presented here, the reliability of the outcome can be improved, and thus the approach should enable more reliable information to be deduced from SAS experiments. PMID:27580204
Statistical Inference for Big Data Problems in Molecular Biophysics
Ramanathan, Arvind; Savol, Andrej; Burger, Virginia; Quinn, Shannon; Agarwal, Pratul K; Chennubhotla, Chakra
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, increasingly complex and presently reaching petabyte scales, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.
Two dimensional unstable scar statistics.
Warne, Larry Kevin; Jorgenson, Roy Eberhardt; Kotulski, Joseph Daniel; Lee, Kelvin S. H. (ITT Industries/AES Los Angeles, CA)
2006-12-01
This report examines the localization of time harmonic high frequency modal fields in two dimensional cavities along periodic paths between opposing sides of the cavity. The cases where these orbits lead to unstable localized modes are known as scars. This paper examines the enhancements for these unstable orbits when the opposing mirrors are both convex and concave. In the latter case the construction includes the treatment of interior foci.
Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference
Beggs, W.J.
1981-02-01
This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.
NASA Astrophysics Data System (ADS)
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and almost never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness, we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail for models of this kind because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient with a growing number of data points. Hamiltonian Monte Carlo algorithms allow us to translate this inference problem into the problem of simulating the dynamics of a statistical mechanics system and give us access to the most sophisticated methods.
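The toy model can be simulated with a simple Euler-Maruyama scheme. The SDE form dV = (inflow - k*V) dt + sigma*V dW and all parameter values below are our illustrative reading of the abstract (linear reservoir, noise standard deviation linear in the state), not the authors' exact specification:

```python
import numpy as np

def simulate_reservoir(v0=1.0, inflow=1.0, k=1.0, sigma=0.3,
                       dt=1e-3, n_steps=200_000, seed=0):
    """Euler-Maruyama simulation of dV = (inflow - k*V) dt + sigma*V dW:
    a linear reservoir whose noise standard deviation scales linearly
    with the stored water volume V."""
    rng = np.random.default_rng(seed)
    v = np.empty(n_steps)
    v[0] = v0
    sq_dt = np.sqrt(dt)
    for i in range(1, n_steps):
        dw = rng.normal(0.0, sq_dt)                 # Brownian increment
        v[i] = v[i - 1] + (inflow - k * v[i - 1]) * dt + sigma * v[i - 1] * dw
        v[i] = max(v[i], 0.0)                       # volume stays non-negative
    return v

v = simulate_reservoir()   # constant input, yet a skewed output distribution
```

Even with constant inflow, the multiplicative noise produces a skewed, heavy-tailed stationary distribution around the deterministic equilibrium inflow/k, illustrating the non-trivial output statistics the abstract describes.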
NASA Astrophysics Data System (ADS)
Vali Ahmadi, Mohammad; Doostparast, Mahdi; Ahmadi, Jafar
2015-04-01
In manufacturing industries, the lifetime of an item is usually characterised by a random variable X and considered to be satisfactory if X exceeds a given lower lifetime limit L. The probability of a satisfactory item is then ηL := P(X ≥ L), called conforming rate. In industrial companies, however, the lifetime performance index, proposed by Montgomery and denoted by CL, is widely used as a process capability index instead of the conforming rate. Assuming a parametric model for the random variable X, we show that there is a connection between the conforming rate and the lifetime performance index. Consequently, the statistical inferences about ηL and CL are equivalent. Hence, we restrict ourselves to statistical inference for CL based on generalised order statistics, which contains several ordered data models such as usual order statistics, progressively Type-II censored data and records. Various point and interval estimators for the parameter CL are obtained and optimal critical regions for the hypothesis testing problems concerning CL are proposed. Finally, two real data-sets on the lifetimes of insulating fluid and ball bearings, due to Nelson (1982) and Caroni (2002), respectively, and a simulated sample are analysed.
Multivariate Statistical Inference of Lightning Occurrence, and Using Lightning Observations
NASA Technical Reports Server (NTRS)
Boccippio, Dennis
2004-01-01
Two classes of multivariate statistical inference using TRMM Lightning Imaging Sensor, Precipitation Radar, and Microwave Imager observations are studied, using nonlinear classification neural networks as inferential tools. The very large and globally representative data sample provided by TRMM allows both training and validation (without overfitting) of neural networks with many degrees of freedom. In the first study, the flashing / non-flashing condition of storm complexes is diagnosed using radar, passive microwave and/or environmental observations as neural network inputs. The diagnostic skill of these simple lightning/no-lightning classifiers can be quite high over land (above 80% Probability of Detection; below 20% False Alarm Rate). In the second, passive microwave and lightning observations are used to diagnose radar reflectivity vertical structure. Because a priori diagnosis of hydrometeor vertical structure is highly important for improved rainfall retrieval from either orbital radars (e.g., the future Global Precipitation Mission "mothership") or radiometers (e.g., operational SSM/I and future Global Precipitation Mission passive microwave constellation platforms), we explore the incremental benefit to such diagnosis provided by lightning observations.
Statistical Inference for Point Process Models of Rainfall
NASA Astrophysics Data System (ADS)
Smith, James A.; Karr, Alan F.
1985-01-01
In this paper we develop maximum likelihood procedures for parameter estimation and model selection that apply to a large class of point process models that have been used to model rainfall occurrences, including Cox processes, Neyman-Scott processes, and renewal processes. The statistical inference procedures are based on the stochastic intensity λ(t) = lim_{s→0+} (1/s) E[N(t + s) − N(t) | N(u), u < t]. The likelihood function of a point process is shown to have a simple expression in terms of the stochastic intensity. The main result of this paper is a recursive procedure for computing stochastic intensities; the procedure is applicable to a broad class of point process models, including renewal processes, Cox processes with Markovian intensity processes, and an important class of Neyman-Scott processes. The model selection procedure we propose, which is based on likelihood ratios, allows direct comparison of two classes of point processes to determine which provides a better model for a given data set. The estimation and model selection procedures are applied to two data sets of simulated Cox process arrivals and a data set of daily rainfall occurrences in the Potomac River basin.
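The "simple expression in terms of the stochastic intensity" is the point-process log-likelihood log L = Σ_i log λ(t_i) − ∫_0^T λ(t) dt. A minimal sketch for the simplest special case, a constant-intensity Poisson process, where maximizing this expression recovers the familiar rate estimate N/T (the simulated data and grid search are illustrative, not the paper's rainfall analysis):

```python
import numpy as np

def poisson_loglik(rate, times, T):
    """Point-process log-likelihood log L = sum_i log(lambda(t_i)) - int_0^T
    lambda(t) dt, specialized to a constant intensity lambda(t) = rate."""
    return len(times) * np.log(rate) - rate * T

rng = np.random.default_rng(0)
T, true_rate = 1000.0, 2.0
# Simulate arrival times on [0, T] via i.i.d. exponential interarrivals
arrivals = np.cumsum(rng.exponential(1.0 / true_rate, 3000))
times = arrivals[arrivals < T]

# Maximum likelihood by grid search; the maximizer is N/T analytically
rates = np.linspace(0.5, 5.0, 1000)
mle = rates[np.argmax([poisson_loglik(r, times, T) for r in rates])]
```

For the Cox, Neyman-Scott, and renewal models of the paper, λ(t) is itself computed recursively from the history N(u), u < t, but it enters the likelihood through exactly this expression.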
Simple statistical inference algorithms for task-dependent wellness assessment.
Kailas, A; Chong, C-C; Watanabe, F
2012-07-01
Stress is a key indicator of wellness in human beings and a prime contributor to performance degradation and errors during various human tasks. The overriding purpose of this paper is to propose two algorithms (probabilistic and non-probabilistic) that iteratively track stress states to compute a wellness index in terms of the stress levels. This paper adopts the physiological view-point that high stress is accompanied with large deviations in biometrics such as body temperature, heart rate, etc., and the proposed algorithms iteratively track these fluctuations to compute a personalized wellness index that is correlated to the engagement levels of the tasks performed by the user. In essence, this paper presents a quantitative relationship between temperature, occupational stress, and wellness during different tasks. The simplicity of the statistical inference algorithms make them favorable candidates for implementation on mobile platforms such as smart phones in the future, thereby providing users an inexpensive application for self-wellness monitoring for a healthier lifestyle. PMID:22676998
Physics of epigenetic landscapes and statistical inference by cells
NASA Astrophysics Data System (ADS)
Lang, Alex H.
Biology is currently in the midst of a revolution. Great technological advances have led to unprecedented quantitative data at the whole genome level. However, new techniques are needed to deal with this deluge of high-dimensional data. Therefore, statistical physics has the potential to help develop systems biology level models that can incorporate complex data. Additionally, physicists have made great strides in understanding non-equilibrium thermodynamics. However, the consequences of these advances have yet to be fully incorporated into biology. There are three specific problems that I address in my dissertation. First, a common metaphor for describing development is a rugged "epigenetic landscape" where cell fates are represented as attracting valleys resulting from a complex regulatory network. I introduce a framework for explicitly constructing epigenetic landscapes that combines genomic data with techniques from spin-glass physics. The model reproduces known reprogramming protocols and identifies candidate transcription factors for reprogramming to novel cell fates, suggesting epigenetic landscapes are a powerful paradigm for understanding cellular identity. Second, I examine the dynamics of cellular reprogramming. By reanalyzing all available time-series data, I show that gene expression dynamics during reprogramming follow a simple one-dimensional reaction coordinate that is independent of both the time and details of experimental protocol used. I show that such a reaction coordinate emerges naturally from epigenetic landscape models of cell identity where cellular reprogramming is viewed as a "barrier-crossing" between the starting and ending cell fates. Overall, the analysis and model suggest that gene expression dynamics during reprogramming follow a canonical trajectory consistent with the idea of an "optimal path" in gene expression space for reprogramming. Third, an important task of cells is to perform complex computations in response to
Statistical challenges of high-dimensional data
Johnstone, Iain M.; Titterington, D. Michael
2009-01-01
Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this Theme Issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements. We describe other ways of identifying low-dimensional subspaces of the data space that contain all useful information. The topic of classification is then reviewed along with the problem of identifying, from within a very large set, the variables that help to classify observations. Brief mention is made of the visualization of high-dimensional data and ways to handle computational problems in Bayesian analysis are described. At appropriate points, reference is made to the other papers in the issue. PMID:19805443
Statistical Inference in the Learning of Novel Phonetic Categories
ERIC Educational Resources Information Center
Zhao, Yuan
2010-01-01
Learning a phonetic category (or any linguistic category) requires integrating different sources of information. A crucial unsolved problem for phonetic learning is how this integration occurs: how can we update our previous knowledge about a phonetic category as we hear new exemplars of the category? One model of learning is Bayesian Inference,…
Building Intuitions about Statistical Inference Based on Resampling
ERIC Educational Resources Information Center
Watson, Jane; Chance, Beth
2012-01-01
Formal inference, which makes theoretical assumptions about distributions and applies hypothesis testing procedures with null and alternative hypotheses, is notoriously difficult for tertiary students to master. The debate about whether this content should appear in Years 11 and 12 of the "Australian Curriculum: Mathematics" has gone on for…
ERIC Educational Resources Information Center
Larwin, Karen H.; Larwin, David A.
2011-01-01
Bootstrapping methods and random distribution methods are increasingly recommended as better approaches for teaching students about statistical inference in introductory-level statistics courses. The authors examined the effect of teaching undergraduate business statistics students using random distribution and bootstrapping simulations. It is the…
Statistical Inferences from Formaldehyde Dna-Protein Cross-Link Data
Physiologically-based pharmacokinetic (PBPK) modeling has reached considerable sophistication in its application in the pharmacological and environmental health areas. Yet, mature methodologies for making statistical inferences have not been routinely incorporated in these applic...
Statistical mechanics of complex neural systems and high dimensional data
NASA Astrophysics Data System (ADS)
Advani, Madhu; Lahiri, Subhaneil; Ganguli, Surya
2013-03-01
Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.
Statistical inference for exploratory data analysis and model diagnostics.
Buja, Andreas; Cook, Dianne; Hofmann, Heike; Lawrence, Michael; Lee, Eun-Kyung; Swayne, Deborah F; Wickham, Hadley
2009-11-13
We propose to furnish visual statistical methods with an inferential framework and protocol, modelled on confirmatory statistical testing. In this framework, plots take on the role of test statistics, and human cognition the role of statistical tests. Statistical significance of 'discoveries' is measured by having the human viewer compare the plot of the real dataset with collections of plots of simulated datasets. A simple but rigorous protocol that provides inferential validity is modelled after the 'lineup' popular from criminal legal procedures. Another protocol modelled after the 'Rorschach' inkblot test, well known from (pop-)psychology, will help analysts acclimatize to random variability before being exposed to the plot of the real data. The proposed protocols will be useful for exploratory data analysis, with reference datasets simulated by using a null assumption that structure is absent. The framework is also useful for model diagnostics in which case reference datasets are simulated from the model in question. This latter point follows up on previous proposals. Adopting the protocols will mean an adjustment in working procedures for data analysts, adding more rigour, and teachers might find that incorporating these protocols into the curriculum improves their students' statistical thinking. PMID:19805449
Technology Focus: Using Technology to Explore Statistical Inference
ERIC Educational Resources Information Center
Garofalo, Joe; Juersivich, Nicole
2007-01-01
There is much research that documents what many teachers know, that students struggle with many concepts in probability and statistics. This article presents two sample activities the authors use to help preservice teachers develop ideas about how they can use technology to promote their students' ability to understand mathematics and connect…
Statistical Inference and Sensitivity to Sampling in 11-Month-Old Infants
ERIC Educational Resources Information Center
Xu, Fei; Denison, Stephanie
2009-01-01
Research on initial conceptual knowledge and research on early statistical learning mechanisms have been, for the most part, two separate enterprises. We report a study with 11-month-old infants investigating whether they are sensitive to sampling conditions and whether they can integrate intentional information in a statistical inference task.…
Trans-dimensional Bayesian inference for large sequential data sets
NASA Astrophysics Data System (ADS)
Mandolesi, E.; Dettmer, J.; Dosso, S. E.; Holland, C. W.
2015-12-01
This work develops a sequential Monte Carlo method to infer seismic parameters of layered seabeds from large sequential reflection-coefficient data sets. The approach provides parameter estimates and uncertainties along survey tracks with the goal to aid in the detection of unexploded ordnance in shallow water. The sequential data are acquired by a moving platform with source and receiver array towed close to the seabed. This geometry requires consideration of spherical reflection coefficients, computed efficiently by massively parallel implementation of the Sommerfeld integral via Levin integration on a graphics processing unit. The seabed is parametrized with a trans-dimensional model to account for changes in the environment (i.e. changes in layering) along the track. The method combines advanced Markov chain Monte Carlo methods (annealing) with particle filtering (resampling). Since data from closely-spaced source transmissions (pings) often sample similar environments, the solution from one ping can be utilized to efficiently estimate the posterior for data from subsequent pings. Since reflection-coefficient data are highly informative, the likelihood function can be extremely peaked, resulting in little overlap between posteriors of adjacent pings. This is addressed by adding bridging distributions (via annealed importance sampling) between pings for more efficient transitions. The approach assumes the environment to be changing slowly enough to justify the local 1D parametrization. However, bridging allows rapid changes between pings to be addressed and we demonstrate the method to be stable in such situations. Results are in terms of trans-D parameter estimates and uncertainties along the track. The algorithm is examined for realistic simulated data along a track and applied to a dataset collected by an autonomous underwater vehicle on the Malta Plateau, Mediterranean Sea. [Work supported by the SERDP, DoD.]
Circumpulsar Asteroids: Inferences from Nulling Statistics and High Energy Correlations
NASA Astrophysics Data System (ADS)
Shannon, Ryan; Cordes, J. M.
2006-12-01
We have proposed that some classes of radio pulsar variability are associated with the entry of neutral asteroidal material into the pulsar magnetosphere. The region surrounding neutron stars is polluted with supernova fall-back material, which collapses and condenses into an asteroid-bearing disk that is stable for millions of years. Over time, collisional and radiative processes cause the asteroids to migrate inward until they are heated to the point of ionization. For older and cooler pulsars, asteroids ionize within the large magnetospheres and inject a sufficient amount of charged particles to alter the electrodynamics of the gap regions and modulate emission processes. This extrinsic model unifies many observed phenomena of variability that occur on time scales that are disparate with the much shorter time scales associated with pulsars and their magnetospheres. One such type of variability is nulling, in which certain pulsars exhibit episodes of quiescence that for some objects may be as short as a few pulse periods, but, for others, is longer than days. Here, in the context of this model, we examine the nulling phenomenon. We analyze the relationship between in-falling material and the statistics of nulling. In addition, as motivation for further high energy observations, we consider the relationship between the nulling and other magnetospheric processes.
PyClone: Statistical inference of clonal population structure in cancer
Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P.
2016-01-01
We introduce a novel statistical method, PyClone, for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy number changes and normal cell contamination. Single cell sequencing validation demonstrates that PyClone infers accurate clustering of mutations that co-occur in individual cells. PMID:24633410
Statistical inference from capture data on closed animal populations
Otis, David L.; Burnham, Kenneth P.; White, Gary C.; Anderson, David R.
1978-01-01
The estimation of animal abundance is an important problem in both the theoretical and applied biological sciences. Serious work to develop estimation methods began during the 1950s, with a few attempts before that time. The literature on estimation methods has increased tremendously during the past 25 years (Cormack 1968, Seber 1973). However, in large part, the problem remains unsolved. Past efforts toward comprehensive and systematic estimation of density (D) or population size (N) have been inadequate, in general. While more than 200 papers have been published on the subject, one is generally left without a unified approach to the estimation of abundance of an animal population. This situation is unfortunate because a number of pressing research problems require such information. In addition, a wide array of environmental assessment studies and biological inventory programs require the estimation of animal abundance. These needs have been further emphasized by the requirement for the preparation of Environmental Impact Statements imposed by the National Environmental Protection Act in 1970. This publication treats inference procedures for certain types of capture data on closed animal populations. This includes multiple capture-recapture studies (variously called capture-mark-recapture, mark-recapture, or tag-recapture studies) involving livetrapping techniques and removal studies involving kill traps or at least temporary removal of captured individuals during the study. Animals do not necessarily need to be physically trapped; visual sightings of marked animals and electrofishing studies also produce data suitable for the methods described in this monograph. To provide a frame of reference for what follows, we give an example of a capture-recapture experiment to estimate population size of small animals using live traps. The general field experiment is similar for all capture-recapture studies (a removal study is, of course, slightly different). A typical
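The simplest member of the closed-population family the monograph generalizes is the two-sample Lincoln-Petersen estimator, N ≈ n1*n2/m2: mark n1 animals, later catch n2, and count the m2 recaptures. A minimal sketch with made-up counts (Chapman's bias-corrected variant is the form usually recommended in practice):

```python
def lincoln_petersen(n1, n2, m2):
    """Naive Lincoln-Petersen estimate of closed-population size N:
    n1 animals marked in sample 1, n2 caught in sample 2, m2 of them marked."""
    return n1 * n2 / m2

def chapman(n1, n2, m2):
    """Chapman's bias-corrected variant of the same estimator."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# Hypothetical counts: 200 marked, 150 caught later, 30 of those marked
n_naive = lincoln_petersen(200, 150, 30)   # 1000.0
n_chapman = chapman(200, 150, 30)          # ≈ 978.1, slightly lower
```

Both estimators assume a closed population and equal catchability; relaxing those assumptions is precisely what the model families in this monograph address.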
Social Inferences from Faces: Ambient Images Generate a Three-Dimensional Model
ERIC Educational Resources Information Center
Sutherland, Clare A. M.; Oldmeadow, Julian A.; Santos, Isabel M.; Towler, John; Burt, D. Michael; Young, Andrew W.
2013-01-01
Three experiments are presented that investigate the two-dimensional valence/trustworthiness by dominance model of social inferences from faces (Oosterhof & Todorov, 2008). Experiment 1 used image averaging and morphing techniques to demonstrate that consistent facial cues subserve a range of social inferences, even in a highly variable sample of…
NASA Astrophysics Data System (ADS)
Lawrence, C.; Lin, L.; Lisiecki, L. E.; Khider, D.
2014-12-01
The broad goal of this presentation is to demonstrate the utility of probabilistic generative models to capture investigators' knowledge of geological processes and proxy data, and to draw statistical inferences about unobserved paleoclimatological events. We illustrate how this approach forces investigators to be explicit about their assumptions, and how probability theory yields results that are a mathematical consequence of these assumptions and the data. We illustrate these ideas with the HMM-Match model, which infers common times of sediment deposition in two records along with the uncertainty in these inferences in the form of confidence bands. HMM-Match models the sedimentation processes that led to proxy data measured in marine sediment cores. This Bayesian model has three components: 1) a generative probabilistic model that proceeds from the underlying geophysical and geochemical events (specifically, the sedimentation events) to the generation of the proxy data (Sedimentation ---> Proxy Data); 2) a recursive algorithm that reverses the logic of the model to yield inferences about the unobserved sedimentation events and the associated alignment of the records, based on the proxy data (Proxy Data ---> Sedimentation/Alignment); 3) an expectation-maximization algorithm for estimating two unknown parameters. We applied HMM-Match to align 35 Late Pleistocene records to a global benthic d18O stack and found that the mean width of the 95% confidence intervals varies between 3 and 23 kyr, depending on the resolution and noisiness of each core's d18O signal. Confidence bands within individual cores also vary greatly, ranging from ~0 to >40 kyr. Results from this algorithm will allow researchers to examine the robustness of their conclusions with respect to alignment uncertainty. Figure 1 shows the confidence bands for one low-resolution record.
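HMM-Match itself couples a probabilistic sedimentation model with expectation maximization, but the underlying record-alignment idea can be sketched with plain dynamic programming. The following is an illustrative simplification, not the authors' algorithm; the function name and the toy signals are invented for the example.

```python
import numpy as np

def align_records(x, y):
    """Minimal dynamic-programming alignment of two proxy series.

    A simplification of the HMM-Match idea: find the monotone mapping
    between samples of x and y that minimizes the summed squared mismatch.
    (The real model is probabilistic and also estimates parameters by EM.)
    """
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            # match, insertion, or deletion: allows variable sedimentation rate
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # backtrack to recover the alignment path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[n, m], path[::-1]

# Same underlying signal, recorded at two different "sedimentation rates":
record = np.sin(np.linspace(0, 4 * np.pi, 80))
stretched = np.sin(np.linspace(0, 4 * np.pi, 120))
score, path = align_records(record, stretched)
print(score)
```

A small alignment cost indicates that the two records can be warped onto a common timeline; the real model additionally quantifies the uncertainty of that warping.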
Young children's use of statistical sampling evidence to infer the subjectivity of preferences.
Ma, Lili; Xu, Fei
2011-09-01
A crucial task in social interaction involves understanding subjective mental states. Here we report two experiments with toddlers exploring whether they can use statistical evidence to infer the subjective nature of preferences. We found that 2-year-olds were likely to interpret another person's nonrandom sampling behavior as a cue for a preference different from their own. When there was no alternative in the population or if the sampling was random, 2-year-olds did not ascribe a preference and persisted in their initial beliefs that the person would share their own preference. We found similar but weaker patterns of responses in 16-month-olds. These results suggest that the ability to infer the subjectivity of preferences based on sampling information begins to emerge between 16 months and 2 years. Our findings provide some of the first evidence that from early in development, young children can use statistical evidence to make rational inferences about the social world. PMID:21353215
Bayesian Inference of High-Dimensional Dynamical Ocean Models
NASA Astrophysics Data System (ADS)
Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.
2015-12-01
This presentation addresses a holistic set of challenges in high-dimensional ocean Bayesian nonlinear estimation: i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); ii) assimilate data using Bayes' law with these pdfs; iii) predict the future data that optimally reduce uncertainties; and iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.
Young Children's Use of Statistical Sampling Evidence to Infer the Subjectivity of Preferences
ERIC Educational Resources Information Center
Ma, Lili; Xu, Fei
2011-01-01
A crucial task in social interaction involves understanding subjective mental states. Here we report two experiments with toddlers exploring whether they can use statistical evidence to infer the subjective nature of preferences. We found that 2-year-olds were likely to interpret another person's nonrandom sampling behavior as a cue for a…
Inferring the connectivity of coupled oscillators from time-series statistical similarity analysis
Tirabassi, Giulio; Sevilla-Escoboza, Ricardo; Buldú, Javier M.; Masoller, Cristina
2015-01-01
A system composed of interacting dynamical elements can be represented by a network, where the nodes represent the elements that constitute the system and the links account for their interactions, which arise due to a variety of mechanisms and which are often unknown. A popular method for inferring the system connectivity (i.e., the set of links among pairs of nodes) is to perform a statistical similarity analysis of the time series collected from the dynamics of the nodes. Here, by considering two systems of coupled oscillators (Kuramoto phase oscillators and Rössler chaotic electronic oscillators) with known and controllable coupling conditions, we aim at testing the performance of this inference method, using linear and nonlinear statistical similarity measures. We find that, under adequate conditions, the network links can be perfectly inferred, i.e., no mistakes are made regarding the presence or absence of links. These conditions for perfect inference require: i) an appropriate choice of the observed variable to be analysed, ii) an appropriate interaction strength, and iii) an adequate thresholding of the similarity matrix. For the dynamical units considered here we find that the linear statistical similarity measure performs, in general, better than the nonlinear ones. PMID:26042395
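The inference pipeline described here, computing a pairwise statistical similarity matrix from the nodes' time series and thresholding it, can be sketched in a few lines. This is an illustrative toy (three nodes, Pearson correlation as the linear similarity measure), not the paper's oscillator experiments; the coupling scheme and threshold value are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy system: nodes 0 and 1 are coupled (they share a common drive);
# node 2 evolves independently of the others.
T = 2000
drive = rng.normal(size=T)
series = np.stack([
    drive + 0.3 * rng.normal(size=T),   # node 0
    drive + 0.3 * rng.normal(size=T),   # node 1
    rng.normal(size=T),                 # node 2 (uncoupled)
])

# Linear statistical similarity: absolute Pearson correlation of node pairs.
sim = np.abs(np.corrcoef(series))
np.fill_diagonal(sim, 0.0)

# Threshold the similarity matrix to infer the adjacency (link) matrix.
threshold = 0.5
adjacency = (sim > threshold).astype(int)
print(adjacency)
```

With a well-chosen threshold the inferred adjacency reproduces the true links exactly, which is the "perfect inference" regime the abstract describes.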
ERIC Educational Resources Information Center
Sotos, Ana Elisa Castro; Vanhoof, Stijn; Van den Noortgate, Wim; Onghena, Patrick
2007-01-01
A solid understanding of "inferential statistics" is of major importance for designing and interpreting empirical results in any scientific discipline. However, students are prone to many misconceptions regarding this topic. This article structurally summarizes and describes these misconceptions by presenting a systematic review of publications…
Young children use statistical sampling to infer the preferences of other people.
Kushnir, Tamar; Xu, Fei; Wellman, Henry M
2010-08-01
Psychological scientists use statistical information to determine the workings of human behavior. We argue that young children do so as well. Over the course of a few years, children progress from viewing human actions as intentional and goal directed to reasoning about the psychological causes underlying such actions. Here, we show that preschoolers and 20-month-old infants can use statistical information, namely a violation of random sampling, to infer that an agent is expressing a preference for one type of toy instead of another type of toy. Children saw a person remove five toys of one type from a container of toys. Preschoolers and infants inferred that the person had a preference for that type of toy when there was a mismatch between the sampled toys and the population of toys in the box. Mere outcome consistency, time spent with the toys, and positive attention toward the toys did not lead children to infer a preference. These findings provide an important demonstration of how statistical learning could underpin the rapid acquisition of early psychological knowledge. PMID:20622142
Hupé, Jean-Michel
2015-01-01
Published studies using functional and structural MRI include many errors in the way data are analyzed and conclusions are reported. This was observed while working on a comprehensive review of the neural bases of synesthesia, but these errors are probably endemic to neuroimaging studies. All studies reviewed had based their conclusions on Null Hypothesis Significance Tests (NHST). NHST have been criticized ever since their inception because they are more appropriate for taking decisions related to a null hypothesis (as in manufacturing) than for making inferences about behavioral and neuronal processes. Here I focus on a few key problems of NHST related to brain imaging techniques, and explain why or when we should not rely on "significance" tests. I also observed that, often, the ill-posed logic of NHST was not even correctly applied, and I describe what I identified as common mistakes, or at least problematic practices, in published papers, in light of what could be considered the very basics of statistical inference. MRI statistics also involve much more complex issues than standard statistical inference. Analysis pipelines vary greatly between studies, even for those using the same software, and there is no consensus on which pipeline is best. I propose a synthetic view of the logic behind the possible methodological choices, and warn against the usage and interpretation of two statistical methods popular in brain imaging studies, the false discovery rate (FDR) procedure and permutation tests. I suggest that current models for the analysis of brain imaging data suffer from serious limitations and call for a revision taking into account the "new statistics" (confidence intervals) logic. PMID:25745383
High-Dimensional Statistical Learning: Roots, Justifications, and Potential Machineries
Zollanvari, Amin
2015-01-01
High-dimensional data generally refer to data in which the number of variables is larger than the sample size. Analyzing such datasets poses great challenges for classical statistical learning because the finite-sample performance of methods developed within classical statistical learning does not live up to classical asymptotic premises in which the sample size unboundedly grows for a fixed dimensionality of observations. Much work has been done in developing mathematical–statistical techniques for analyzing high-dimensional data. Despite remarkable progress in this field, many practitioners still utilize classical methods for analyzing such datasets. This state of affairs can be attributed, in part, to a lack of knowledge and, in part, to the ready-to-use computational and statistical software packages that are well developed for classical techniques. Moreover, many scientists working in a specific field of high-dimensional statistical learning are either not aware of other existing machineries in the field or are not willing to try them out. The primary goal in this work is to bring together various machineries of high-dimensional analysis, give an overview of the important results, and present the operating conditions upon which they are grounded. When appropriate, readers are referred to relevant review articles for more information on a specific subject. PMID:27081307
Inference in infinite-dimensional inverse problems - Discretization and duality
NASA Technical Reports Server (NTRS)
Stark, Philip B.
1992-01-01
Many techniques for solving inverse problems involve approximating the unknown model, a function, by a finite-dimensional 'discretization' or parametric representation. The uncertainty in the computed solution is sometimes taken to be the uncertainty within the parametrization; this can result in unwarranted confidence. The theory of conjugate duality can overcome the limitations of discretization within the 'strict bounds' formalism, a technique for constructing confidence intervals for functionals of the unknown model that incorporates certain types of prior information. The usual computational approach to strict bounds approximates the 'primal' problem in such a way that the resulting confidence intervals are at most long enough to have the nominal coverage probability. There is another approach, based on 'dual' optimization problems, that gives confidence intervals with at least the nominal coverage probability. The pair of intervals derived by the two approaches brackets a correct confidence interval. The theory is illustrated with gravimetric, seismic, geomagnetic, and helioseismic problems and a numerical example in seismology.
Local dependence in random graph models: characterization, properties and statistical inference
Schweinberger, Michael; Handcock, Mark S.
2015-01-01
Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with ‘ground truth’. PMID:26560142
Inferring biological tasks using Pareto analysis of high-dimensional data.
Hart, Yuval; Sheftel, Hila; Hausser, Jean; Szekely, Pablo; Ben-Moshe, Noa Bossel; Korem, Yael; Tendler, Avichai; Mayo, Avraham E; Alon, Uri
2015-03-01
We present the Pareto task inference method (ParTI; http://www.weizmann.ac.il/mcb/UriAlon/download/ParTI) for inferring biological tasks from high-dimensional biological data. Data are described as a polytope, and features maximally enriched closest to the vertices (or archetypes) allow identification of the tasks the vertices represent. We demonstrate that human breast tumors and mouse tissues are well described by tetrahedrons in gene expression space, with specific tumor types and biological functions enriched at each of the vertices, suggesting four key tasks. PMID:25622107
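As a geometric illustration of "data described as a polytope whose vertices represent tasks", the sketch below generates toy data as convex mixtures of three archetypes and recovers candidate archetypes as convex-hull vertices. This is a strong simplification of ParTI (which also performs feature-enrichment analysis and operates in high dimension); the archetype positions and noise level are invented.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)

# Toy "expression" data lying inside a triangle (3 archetypes = 3 tasks):
archetypes = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
weights = rng.dirichlet(np.ones(3), size=300)   # convex mixtures of the tasks
data = weights @ archetypes + 0.01 * rng.normal(size=(300, 2))

# Vertices of the enclosing polytope approximate the archetypes; in ParTI,
# features maximally enriched near these vertices identify the tasks.
hull = ConvexHull(data)
print(len(hull.vertices))
```

Only a handful of the 300 points are extreme points of the cloud, and those extreme points cluster near the three true archetype corners.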
A statistical method for lung tumor segmentation uncertainty in PET images based on user inference.
Zheng, Chaojie; Wang, Xiuying; Feng, Dagan
2015-01-01
PET has been widely accepted as an effective imaging modality for lung tumor diagnosis and treatment. However, standard criteria for delineating tumor boundaries from PET have yet to be developed, largely due to the relatively low quality of PET images, uncertain tumor boundary definition, and the variety of tumor characteristics. In this paper, we propose a statistical solution to segmentation uncertainty on the basis of user inference. We first define the uncertainty segmentation band on the basis of a segmentation probability map constructed from the Random Walks (RW) algorithm; then, based on the extracted features of the user inference, we use Principal Component Analysis (PCA) to formulate the statistical model for labeling the uncertainty band. We validated our method on 10 lung PET-CT phantom studies from the public RIDER collections [1] and 16 clinical PET studies where tumors were manually delineated by two experienced radiologists. The methods were validated using the Dice similarity coefficient (DSC) to measure spatial volume overlap. Our method achieved an average DSC of 0.878 ± 0.078 on phantom studies and 0.835 ± 0.039 on clinical studies. PMID:26736741
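The Dice similarity coefficient used above for validation is simple to compute: DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks. A minimal sketch on invented toy masks (not the paper's data):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks:
    DSC = 2 * |A intersect B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Two 6x6 square masks offset by one voxel in each direction:
seg = np.zeros((10, 10), dtype=int); seg[2:8, 2:8] = 1
ref = np.zeros((10, 10), dtype=int); ref[3:9, 3:9] = 1
print(dice(seg, ref))  # overlap is 5x5 = 25 voxels -> 50/72 ≈ 0.694
```

DSC is 1 for identical masks and 0 for disjoint ones, which is why average values near 0.85 to 0.88, as reported above, indicate substantial spatial agreement.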
Statistical entropy of charged two-dimensional black holes
NASA Astrophysics Data System (ADS)
Teo, Edward
1998-06-01
The statistical entropy of a five-dimensional black hole in Type II string theory was recently derived by showing that it is U-dual to the three-dimensional Bañados-Teitelboim-Zanelli black hole, and using Carlip's method to count the microstates of the latter. This is valid even for the non-extremal case, unlike the derivation which relies on D-brane techniques. In this letter, I shall exploit the U-duality that exists between the five-dimensional black hole and the two-dimensional charged black hole of McGuigan, Nappi and Yost, to microscopically compute the entropy of the latter. It is shown that this result agrees with previous calculations using thermodynamic arguments.
Inferences on weather extremes and weather-related disasters: a review of statistical methods
NASA Astrophysics Data System (ADS)
Visser, H.; Petersen, A. C.
2012-02-01
The study of weather extremes and their impacts, such as weather-related disasters, plays an important role in research of climate change. Due to the great societal consequences of extremes - historically, now and in the future - the peer-reviewed literature on this theme has been growing enormously since the 1980s. Data sources have a wide origin, from century-long climate reconstructions from tree rings to relatively short (30 to 60 yr) databases with disaster statistics and human impacts. When scanning the peer-reviewed literature on weather extremes and their impacts, it is noticeable that many different methods are used to make inferences. However, discussions of these methods are rare. Such discussions are important since a particular methodological choice might substantially influence the inferences made. A calculated return period of once in 500 yr based on a normal distribution will deviate from one based on a Gumbel distribution. And the particular choice between a linear or a flexible trend model might influence inferences as well. In this article, a concise overview of statistical methods applied in the field of weather extremes and weather-related disasters is given. Methods have been evaluated as to stationarity assumptions, the choice of specific probability density functions (PDFs) and the availability of uncertainty information. As for stationarity assumptions, the outcome was that good testing is essential. Inferences on extremes may be wrong if data are assumed stationary while they are not. The same holds for the block-stationarity assumption. As for PDF choices, it was found that often more than one PDF shape fits the same data. From a simulation study the conclusion can be drawn that both the generalized extreme value (GEV) distribution and the log-normal PDF fit very well to a variety of indicators. The application of the normal and Gumbel distributions is more limited. As for uncertainty, it is advisable to test conclusions on extremes
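The point that a 500-yr return level depends on the chosen PDF can be made concrete with moment-matched normal and Gumbel fits to the same maxima series. This is an illustrative sketch on synthetic annual maxima; all numbers are invented, and the scipy distribution objects are assumed available.

```python
import numpy as np
from scipy import stats

# Illustrative annual-maximum series (synthetic; a real series works the same).
maxima = stats.gumbel_r.rvs(loc=50.0, scale=8.0, size=60, random_state=42)
mean, std = maxima.mean(), maxima.std(ddof=1)

# Moment-matched fits: same mean and variance, different tail shapes.
p = 1.0 - 1.0 / 500.0                 # non-exceedance prob. of the 500-yr event
normal_level = stats.norm.ppf(p, loc=mean, scale=std)

beta = std * np.sqrt(6.0) / np.pi     # Gumbel scale from the variance
mu = mean - np.euler_gamma * beta     # Gumbel location from the mean
gumbel_level = stats.gumbel_r.ppf(p, loc=mu, scale=beta)

print(normal_level, gumbel_level)     # the heavier Gumbel tail gives the larger level
```

Even with identical first two moments, the two distributions disagree substantially at the 1-in-500 level (roughly 2.9 versus 4.4 standard deviations above the mean), which is exactly why the methodological choice matters.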
On statistical inference in time series analysis of the evolution of road safety.
Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora
2013-11-01
Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include the annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among the disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations on the applicability of standard methods of statistical inference, which leads to under- or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches, are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. PMID:23260716
NASA Technical Reports Server (NTRS)
Lerner, Jeffrey A.; Jedlovec, Gary J.; Atkinson, Robert J.
1998-01-01
Ever since the first satellite image loops from the 6.3 micron water vapor channel on METEOSAT-1 in 1978, there have been numerous efforts (many with a great degree of success) to relate the water vapor radiance patterns to familiar atmospheric dynamic quantities. The realization of these efforts is becoming evident with the merging of satellite-derived winds into predictive models (Velden et al., 1997; Swadley and Goerss, 1989). Another parameter that has been quantified from satellite water vapor channel measurements is upper tropospheric relative humidity (UTH) (e.g., Soden and Bretherton, 1996; Schmetz and Turpeinen, 1988). These humidity measurements, in turn, can be used to quantify upper tropospheric water vapor and its transport to more accurately diagnose climate changes (Lerner et al., 1998; Schmetz et al., 1995a) and to quantify radiative processes in the upper troposphere. Also apparent in water vapor imagery animations are regions of subsiding and ascending air flow. Indeed, a component of the translated motions we observe is due to vertical velocities. The few attempts at exploiting this information have been met with a fair degree of success. Picon and Desbois (1990) statistically related Meteosat monthly mean water vapor radiances to six standard pressure levels of European Centre for Medium-Range Weather Forecasts (ECMWF) model vertical velocities and found correlation coefficients of about 0.50 or less. This paper presents some preliminary results of viewing climatological satellite water vapor data in a different fashion. Specifically, we attempt to infer the three-dimensional flow characteristics of the mid- to upper troposphere as portrayed by GOES VAS during the warm ENSO event (1987) and a subsequent cold period in 1998.
Statistical mechanics of two-dimensional and geophysical flows
NASA Astrophysics Data System (ADS)
Bouchet, Freddy; Venaille, Antoine
2012-06-01
The theoretical study of the self-organization of two-dimensional and geophysical turbulent flows is addressed based on statistical mechanics methods. This review is a self-contained presentation of classical and recent works on this subject, from the statistical mechanics basis of the theory up to applications to Jupiter’s troposphere and ocean vortices and jets. Emphasis has been placed on examples with available analytical treatment in order to favor better understanding of the physics and dynamics. After a brief presentation of the 2D Euler and quasi-geostrophic equations, the specificity of two-dimensional and geophysical turbulence is emphasized. The equilibrium microcanonical measure is built from the Liouville theorem. Important statistical mechanics concepts (large deviations and mean field approach) and thermodynamic concepts (ensemble inequivalence and negative heat capacity) are briefly explained and described. On this theoretical basis, we predict the output of the long-time evolution of complex turbulent flows as statistical equilibria. This is applied to make quantitative models of two-dimensional turbulence, the Great Red Spot and other Jovian vortices, ocean jets like the Gulf Stream, and ocean vortices. A detailed comparison between these statistical equilibria and real flow observations is provided. We also present recent results for non-equilibrium situations, for the studies of either the relaxation towards equilibrium or non-equilibrium steady states. In this last case, forces and dissipation are in a statistical balance; fluxes of conserved quantities characterize the system, and microcanonical or other equilibrium measures no longer describe the system.
Social inferences from faces: ambient images generate a three-dimensional model.
Sutherland, Clare A M; Oldmeadow, Julian A; Santos, Isabel M; Towler, John; Michael Burt, D; Young, Andrew W
2013-04-01
Three experiments are presented that investigate the two-dimensional valence/trustworthiness by dominance model of social inferences from faces (Oosterhof & Todorov, 2008). Experiment 1 used image averaging and morphing techniques to demonstrate that consistent facial cues subserve a range of social inferences, even in a highly variable sample of 1000 ambient images (images that are intended to be representative of those encountered in everyday life, see Jenkins, White, Van Montfort, & Burton, 2011). Experiment 2 then tested Oosterhof and Todorov's two-dimensional model on this extensive sample of face images. The original two dimensions were replicated and a novel 'youthful-attractiveness' factor also emerged. Experiment 3 successfully cross-validated the three-dimensional model using face averages directly constructed from the factor scores. These findings highlight the utility of the original trustworthiness and dominance dimensions, but also underscore the need to utilise varied face stimuli: with a more realistically diverse set of face images, social inferences from faces show a more elaborate underlying structure than hitherto suggested. PMID:23376296
Statistical Properties of Decaying Two-Dimensional Turbulence
NASA Astrophysics Data System (ADS)
Nakamura, Kenshi; Takahashi, Takehiro; Nakano, Tohru
1993-04-01
We investigate the temporal development of the statistical properties of two-dimensional incompressible turbulence simulated for a long time. First, we obtain information on the evolving microscopic vortical structure by inspecting the time variation of the q-th order fractal dimensions of the enstrophy dissipation rate. The conclusion drawn from such an inspection is consistent with a picture given by Kida (J. Phys. Soc. Jpn. 54 (1985) 2840); in the first stage the
n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator
2012-09-12
nSIGHTS (n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator) is a comprehensive well test analysis software package. It provides a user-interface, a well test analysis model and many tools to analyze both field and simulated data. The well test analysis model simulates a single-phase, one-dimensional, radial/non-radial flow regime, with a borehole at the center of the modeled flow system. nSIGHTS solves the radially symmetric n-dimensional forward flow problem using a solver based on a graph-theoretic approach. The results of the forward simulation are pressure, and flow rate, given all the input parameters. The parameter estimation portion of nSIGHTS uses a perturbation-based approach to interpret the best-fit well and reservoir parameters, given an observed dataset of pressure and flow rate.
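As a rough illustration of the kind of forward problem nSIGHTS solves (not its graph-theoretic solver), the sketch below time-steps the radially symmetric single-phase diffusivity equation with an explicit finite-difference scheme, a fixed-pressure wellbore at the inner boundary and a no-flow outer boundary. Geometry, diffusivity and step sizes are invented for the example.

```python
import numpy as np

# Explicit finite differences for single-phase radial flow (diffusivity eq.):
#   dp/dt = D * (1/r) d/dr ( r dp/dr )
D = 1.0e-2            # hydraulic diffusivity (arbitrary units)
rw, R = 0.1, 10.0     # wellbore radius and outer radius
nr, dt, steps = 200, 0.05, 2000
r = np.linspace(rw, R, nr)
dr = r[1] - r[0]      # note: D * dt / dr**2 ≈ 0.2 < 0.5, so the scheme is stable

p = np.full(nr, 100.0)   # initial reservoir pressure
p[0] = 80.0              # constant drawdown held at the wellbore

for _ in range(steps):
    # flux-conservative update on the interior nodes
    r_plus = 0.5 * (r[1:-1] + r[2:])
    r_minus = 0.5 * (r[1:-1] + r[:-2])
    flux = (r_plus * (p[2:] - p[1:-1]) - r_minus * (p[1:-1] - p[:-2])) / dr**2
    p[1:-1] += dt * D * flux / r[1:-1]
    p[-1] = p[-2]        # no-flow outer boundary

print(p[:5])             # pressure profile near the well after the transient
```

The forward run yields pressure versus radius and time given the parameters; a parameter-estimation wrapper like the one in nSIGHTS would perturb D (and the boundary description) to best match an observed pressure and flow-rate dataset.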
n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator
Energy Science and Technology Software Center (ESTSC)
2012-09-12
nSIGHTS (n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator) is a comprehensive well test analysis software package. It provides a user-interface, a well test analysis model and many tools to analyze both field and simulated data. The well test analysis model simulates a single-phase, one-dimensional, radial/non-radial flow regime, with a borehole at the center of the modeled flow system. nSIGHTS solves the radially symmetric n-dimensional forward flow problem using a solver based on a graph-theoretic approach.more » The results of the forward simulation are pressure, and flow rate, given all the input parameters. The parameter estimation portion of nSIGHTS uses a perturbation-based approach to interpret the best-fit well and reservoir parameters, given an observed dataset of pressure and flow rate.« less
Staude, Benjamin; Grün, Sonja; Rotter, Stefan
2009-01-01
The extent to which groups of neurons exhibit higher-order correlations in their spiking activity is a controversial issue in current brain research. A major difficulty is that currently available tools for the analysis of massively parallel spike trains (N > 10) for higher-order correlations typically require vast sample sizes. While multiple single-cell recordings become increasingly available, experimental approaches to investigate the role of higher-order correlations suffer from the limitations of available analysis techniques. We have recently presented a novel method for cumulant-based inference of higher-order correlations (CuBIC) that detects correlations of higher order even from relatively short data stretches of length T = 10–100 s. CuBIC employs the compound Poisson process (CPP) as a statistical model for the population spike counts, and assumes spike trains to be stationary in the analyzed data stretch. In the present study, we describe a non-stationary version of the CPP by decoupling the correlation structure from the spiking intensity of the population. This allows us to adapt CuBIC to time-varying firing rates. Numerical simulations reveal that the adaptation corrects for false positive inference of correlations in data with pure rate co-variation, while allowing for temporal variations of the firing rates has a surprisingly small effect on CuBIC's sensitivity for correlations. PMID:20725510
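The compound Poisson process underlying CuBIC is easy to simulate: a hidden "carrier" Poisson process generates events, and each event fires a random number of neurons (its amplitude), so amplitudes greater than one induce higher-order correlations. A minimal sketch with hypothetical rates (not the paper's actual parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def cpp_counts(nu, amp_probs, T, dt):
    """Population spike counts of a compound Poisson process (CPP).

    nu        : rate of the hidden 'carrier' Poisson process (events/s)
    amp_probs : amp_probs[k] = P(an event involves k+1 neurons)
    T, dt     : data stretch length and bin width (s)
    """
    n_bins = int(T / dt)
    events = rng.poisson(nu * dt, size=n_bins)      # carrier events per bin
    amps = np.arange(1, len(amp_probs) + 1)
    return np.array([
        rng.choice(amps, size=k, p=amp_probs).sum() if k else 0
        for k in events
    ])

# Independent population: every event has amplitude 1 -> Fano factor ~1.
z_ind = cpp_counts(nu=100.0, amp_probs=[1.0], T=100.0, dt=0.005)
# Correlated population: 20% of events are pairwise synchronous.
z_cor = cpp_counts(nu=100.0, amp_probs=[0.8, 0.2], T=100.0, dt=0.005)

fano = lambda z: z.var() / z.mean()
print(fano(z_ind), fano(z_cor))   # higher-order structure inflates the count variance
```

The inflated variance-to-mean ratio of the correlated population is exactly the kind of cumulant signature that CuBIC tests for.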
Feldman, Naomi H.; Griffiths, Thomas L.; Morgan, James L.
2009-01-01
A variety of studies have demonstrated that organizing stimuli into categories can affect the way the stimuli are perceived. We explore the influence of categories on perception through one such phenomenon, the perceptual magnet effect, in which discriminability between vowels is reduced near prototypical vowel sounds. We present a Bayesian model to explain why this reduced discriminability might occur: it arises as a consequence of optimally solving the statistical problem of perception in noise. In the optimal solution to this problem, listeners’ perception is biased toward phonetic category means because they use knowledge of these categories to guide their inferences about speakers’ target productions. Simulations show that model predictions closely correspond to previously published human data, and novel experimental results provide evidence for the predicted link between perceptual warping and noise. The model unifies several previous accounts of the perceptual magnet effect and provides a framework for exploring categorical effects in other domains. PMID:19839683
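For a single Gaussian category, the optimal solution described here reduces to a standard posterior-mean (shrinkage) computation: perception is pulled toward the category mean, more strongly as noise grows. A minimal sketch with hypothetical category and noise parameters:

```python
import numpy as np

def perceived(S, mu_c, var_c, var_noise):
    """Optimal estimate of the speaker's target T given noisy percept S,
    assuming one Gaussian phonetic category N(mu_c, var_c) and Gaussian
    speech noise with variance var_noise (all values hypothetical)."""
    w = var_c / (var_c + var_noise)       # shrinkage weight
    return w * S + (1 - w) * mu_c         # biased toward the category mean

# Stimuli equally spaced in acoustic space...
S = np.linspace(-3, 3, 7)
low_noise = perceived(S, mu_c=0.0, var_c=1.0, var_noise=0.25)
high_noise = perceived(S, mu_c=0.0, var_c=1.0, var_noise=1.0)

# ...are perceptually compressed near the prototype, more so with more noise:
print(np.diff(low_noise), np.diff(high_noise))
```

The shrinking of perceptual spacing near the prototype is the magnet effect; the noise dependence is the model's testable prediction.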
McDonald, L.L.; Erickson, W.P.; Strickland, M.D.
1995-12-31
The objective of the Coastal Habitat Injury Assessment study was to document and quantify injury to biota of the shallow subtidal, intertidal, and supratidal zones throughout the shoreline affected by oil or cleanup activity associated with the Exxon Valdez oil spill. The results of these studies were to be used to support the Trustee's Type B Natural Resource Damage Assessment under the Comprehensive Environmental Response, Compensation, and Liability Act of 1980 (CERCLA). A probability based stratified random sample of shoreline segments was selected with probability proportional to size from each of 15 strata (5 habitat types crossed with 3 levels of potential oil impact) based on those data available in July, 1989. Three study regions were used: Prince William Sound, Cook Inlet/Kenai Peninsula, and Kodiak/Alaska Peninsula. A Geographic Information System was utilized to combine oiling and habitat data and to select the probability sample of study sites. Quasi-experiments were conducted where randomly selected oiled sites were compared to matched reference sites. Two levels of statistical inferences, philosophical bases, and limitations are discussed and illustrated with example data from the resulting studies. 25 refs., 4 figs., 1 tab.
Gaggiotti, Oscar E
2010-11-01
Ever since the introduction of allozymes in the 1960s, evolutionary biologists and ecologists have continued to search for more powerful molecular markers to estimate important parameters such as effective population size and migration rates and to make inferences about the demographic history of populations, the relationships between individuals and the genetic architecture of phenotypic variation (Bensch & Akesson 2005; Bonin et al. 2007). Choosing a marker requires a thorough consideration of the trade-offs associated with the different techniques and the type of data obtained from them. Some markers can be very informative but require substantial amounts of start-up time (e.g. microsatellites), while others require very little time but are much less polymorphic. Amplified fragment length polymorphism (AFLP) is a firmly established molecular marker technique that falls in this latter category. AFLPs are widely distributed throughout the genome and can be used on organisms for which there is no a priori sequence information (Meudt & Clarke 2007). These properties together with their moderate cost and short start-up time have made them the method of choice for many molecular ecology studies of wild species (Bensch & Akesson 2005). However, they have a major disadvantage: they are dominant. This represents a very important limitation because many statistical genetics methods appropriate for molecular ecology studies require the use of codominant markers. In this issue, Foll et al. (2010) present an innovative hierarchical Bayesian method that overcomes this limitation. The proposed approach represents a comprehensive statistical treatment of the fluorescence of AFLP bands and leads to accurate inferences about the genetic structure of natural populations. Besides allowing a quasi-codominant treatment of AFLPs, this new method also solves the difficult problems posed by subjectivity in the scoring of AFLP bands. PMID:20958811
Quantum Statistical Entropy of Five-Dimensional Black Hole
NASA Astrophysics Data System (ADS)
Zhao, Ren; Wu, Yue-Qin; Zhang, Sheng-Li
2006-05-01
The generalized uncertainty relation is introduced to calculate the quantum statistical entropy of a black hole. Using the new equation of state density motivated by the generalized uncertainty relation, we discuss the entropies of the Bose and Fermi fields on the background of five-dimensional spacetime. In our calculation there is no need to introduce a cutoff, and the divergent logarithmic term of the original brick-wall method does not appear. We obtain that the quantum statistical entropy corresponding to the black hole horizon is proportional to the area of the horizon. It is further shown that the entropy of the black hole is the entropy of the quantum states on the surface of the horizon: the black hole's entropy is an intrinsic property of the black hole, and the entropy is a quantum effect. This deepens our understanding of quantum statistical entropy.
Statistical mechanics of shell models for two-dimensional turbulence
NASA Astrophysics Data System (ADS)
Aurell, E.; Boffetta, G.; Crisanti, A.; Frick, P.; Paladin, G.; Vulpiani, A.
1994-12-01
We study shell models that conserve the analogs of energy and enstrophy and hence are designed to mimic fluid turbulence in two-dimensions (2D). The main result is that the observed state is well described as a formal statistical equilibrium, closely analogous to the approach to two-dimensional ideal hydrodynamics of Onsager [Nuovo Cimento Suppl. 6, 279 (1949)], Hopf [J. Rat. Mech. Anal. 1, 87 (1952)], and Lee [Q. Appl. Math. 10, 69 (1952)]. In the presence of forcing and dissipation we observe a forward flux of enstrophy and a backward flux of energy. These fluxes can be understood as mean diffusive drifts from a source to two sinks in a system which is close to local equilibrium with Lagrange multipliers (``shell temperatures'') changing slowly with scale. This is clear evidence that the simplest shell models are not adequate to reproduce the main features of two-dimensional turbulence. The dimensional predictions on the power spectra from a supposed forward cascade of enstrophy and from one branch of the formal statistical equilibrium coincide in these shell models in contrast to the corresponding predictions for the Navier-Stokes and Euler equations in 2D. This coincidence has previously led to the mistaken conclusion that shell models exhibit a forward cascade of enstrophy. We also study the dynamical properties of the models and the growth of perturbations.
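The statistical-equilibrium picture invoked here can be stated compactly. In generic shell-model notation (a sketch, not necessarily the paper's exact conventions; $u_n$ is the complex shell velocity at wavenumber $k_n = k_0 \lambda^n$), the conserved analogs of energy and enstrophy and the resulting formal equilibrium spectrum with Lagrange multipliers $\alpha, \beta$ are:

```latex
E = \tfrac{1}{2}\sum_n |u_n|^2 ,\qquad
Z = \tfrac{1}{2}\sum_n k_n^2\,|u_n|^2 ,\qquad
\langle |u_n|^2 \rangle_{\mathrm{eq}} = \frac{1}{\alpha + \beta k_n^2}.
```

The "shell temperatures" mentioned in the abstract correspond to letting $\alpha$ and $\beta$ vary slowly with scale in this Gibbs-type equilibrium.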
Univariate description and bivariate statistical inference: the first step delving into data
2016-01-01
In observational studies, the first step is usually to explore data distribution and the baseline differences between groups. Data description includes their central tendency (e.g., mean, median, and mode) and dispersion (e.g., standard deviation, range, interquartile range). There are a variety of bivariate statistical inference methods, such as Student's t-test, the Mann-Whitney U test and the Chi-square test, for normal, skewed and categorical data, respectively. The article shows how to perform these analyses with R code. Furthermore, I believe that the automation of the whole workflow is of paramount importance in that (I) it allows others to repeat your results; (II) you can easily find out how you performed the analysis during revision; (III) it spares data input by hand and is less error-prone; and (IV) when you correct your original dataset, the final result can be automatically corrected by executing the codes. Therefore, the process of making a publication-quality table incorporating all the abovementioned statistics and P values is provided, allowing readers to customize these codes to their own needs. PMID:27047950
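The article itself performs these analyses with R code; purely as an illustration, the same three bivariate tests can be sketched in Python with SciPy on fabricated data (all values hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical baseline data for two groups of 80 subjects each.
g1 = rng.normal(120, 15, 80)              # e.g. systolic BP, group 1
g2 = rng.normal(128, 15, 80)              # group 2
smoker = rng.integers(0, 2, (2, 80))      # fake binary categorical variable

# Normal data: Student's t-test on means.
t, p_t = stats.ttest_ind(g1, g2)

# Skewed data: Mann-Whitney U test on distributions.
u, p_u = stats.mannwhitneyu(g1, g2)

# Categorical data: chi-square test on a 2x2 contingency table.
table = np.array([[smoker[0].sum(), 80 - smoker[0].sum()],
                  [smoker[1].sum(), 80 - smoker[1].sum()]])
chi2, p_c, dof, _ = stats.chi2_contingency(table)

print(f"t-test p={p_t:.3g}, Mann-Whitney p={p_u:.3g}, chi-square p={p_c:.3g}")
```

In a scripted workflow these p-values would then be assembled programmatically into the baseline table, which is what makes the analysis repeatable and self-correcting.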
Sex, lies, and statistics: inferences from the child sexual abuse accommodation syndrome.
Weiss, Kenneth J; Curcio Alexander, Julia
2013-01-01
Victims of child sexual abuse often recant their complaints or do not report incidents, making prosecution of offenders difficult. The child sexual abuse accommodation syndrome (CSAAS) has been used to explain this phenomenon by identifying common behavioral responses. Unlike PTSD but like rape trauma syndrome, CSAAS is not an official diagnostic term and should not be used as evidence of a defendant's guilt or to imply probative value in prosecutions. Courts have grappled with the ideal use of CSAAS in the evaluation of child witness testimony. Expert testimony should be helpful to the jurors without prejudicing them. The New Jersey Supreme Court ruled recently that statistical evidence about CSAAS implying the probability that a child is truthful runs the risk of confusing jury members and biasing them against the defendant. We review the parameters of expert testimony and its admissibility in this area, concluding that statistics about CSAAS should not be used to draw inferences about the victim's credibility or the defendant's guilt. PMID:24051595
Validi, AbdoulAhad
2014-03-01
This study introduces a non-intrusive approach in the context of low-rank separated representation to construct a surrogate of high-dimensional stochastic functions, e.g., PDEs/ODEs, in order to decrease the computational cost of Markov Chain Monte Carlo simulations in Bayesian inference. The surrogate model is constructed via a regularized alternating least-squares regression with Tikhonov regularization, using a roughening matrix that computes the gradient of the solution, in conjunction with a perturbation-based error indicator to detect optimal model complexities. The model approximates a vector of a continuous solution at discrete values of a physical variable. The required number of random realizations to achieve a successful approximation linearly depends on the function dimensionality. The computational cost of the model construction is quadratic in the number of random inputs, which potentially tackles the curse of dimensionality in high-dimensional stochastic functions. Furthermore, this vector-valued separated representation-based model, in comparison to the available scalar-valued case, leads to a significant reduction in the cost of approximation by an order of magnitude equal to the vector size. The performance of the method is studied through its application to three numerical examples including a 41-dimensional elliptic PDE and a 21-dimensional cavity flow.
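The core subproblem of such a regularized alternating least-squares sweep is a Tikhonov-regularized linear solve with a roughening (first-difference) matrix penalizing the gradient of the solution. A minimal sketch with illustrative sizes, not the paper's actual solver:

```python
import numpy as np

rng = np.random.default_rng(1)

def tikhonov_lstsq(A, b, lam, L):
    """Solve min ||A x - b||^2 + lam * ||L x||^2 in closed (normal-equation) form."""
    return np.linalg.solve(A.T @ A + lam * L.T @ L, A.T @ b)

n, m = 50, 30                                   # hypothetical problem sizes
A = rng.normal(size=(n, m))
x_true = np.sin(np.linspace(0, np.pi, m))       # a smooth "solution"
b = A @ x_true + 0.1 * rng.normal(size=n)

# First-difference 'roughening' matrix: row i is (..., -1, +1, ...),
# so ||L x|| measures the discrete gradient of x.
L = np.eye(m, k=1)[:-1] - np.eye(m)[:-1]
x_hat = tikhonov_lstsq(A, b, lam=1.0, L=L)

print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # relative error
```

In the full alternating scheme this solve would be repeated for each factor of the separated representation while the other factors are held fixed.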
NASA Astrophysics Data System (ADS)
Morozov, Alexandre
2009-03-01
Formation of nucleosome core particles is a first step towards packaging genomic DNA into chromosomes in living cells. Nucleosomes are formed by wrapping 147 base pairs of DNA around a spool of eight histone proteins. It is reasonable to assume that formation of single nucleosomes in vitro is determined by DNA sequence alone: it costs less elastic energy to wrap a flexible DNA polymer around the histone octamer, and more if the polymer is rigid. However, it is unclear to which extent this effect is important in living cells. Cells have evolved chromatin remodeling enzymes that expend ATP to actively reposition nucleosomes. In addition, nucleosome positioning on long DNA sequences is affected by steric exclusion - many nucleosomes have to form simultaneously without overlap. Currently available bioinformatics methods for predicting nucleosome positions are trained on in vivo data sets and are thus unable to distinguish between extrinsic and intrinsic nucleosome positioning signals. In order to see the relative importance of such signals for nucleosome positioning in vivo, we have developed a model based on a large collection of DNA sequences from nucleosomes reconstituted in vitro by salt dialysis. We have used these data to infer the free energy of nucleosome formation at each position along the genome. The method uses an exact result from the statistical mechanics of classical 1D fluids to infer the free energy landscape from nucleosome occupancy. We will discuss the degree to which in vitro nucleosome occupancy profiles are predictive of in vivo nucleosome positions, and will estimate how many nucleosomes are sequence-specific and how many are positioned purely by steric exclusion. Our approach to nucleosome energetics should be applicable across multiple organisms and genomic regions.
Statistical Downscaling in Multi-dimensional Wave Climate Forecast
NASA Astrophysics Data System (ADS)
Camus, P.; Méndez, F. J.; Medina, R.; Losada, I. J.; Cofiño, A. S.; Gutiérrez, J. M.
2009-04-01
Wave climate at a particular site is defined by the statistical distribution of sea state parameters, such as significant wave height, mean wave period, mean wave direction, wind velocity, wind direction and storm surge. Nowadays, long-term time series of these parameters are available from reanalysis databases obtained by numerical models. The Self-Organizing Map (SOM) technique is applied to characterize the multi-dimensional wave climate, obtaining the relevant "wave types" spanning the historical variability. This technique summarizes the multiple dimensions of wave climate in terms of a set of clusters projected onto a low-dimensional lattice with a spatial organization, providing Probability Density Functions (PDFs) on the lattice. On the other hand, wind and storm surge depend on instantaneous local large-scale sea level pressure (SLP) fields while waves depend on the recent history of these fields (say, 1 to 5 days). Thus, these variables are associated with large-scale atmospheric circulation patterns. In this work, a nearest-neighbors analog method is used to predict monthly multi-dimensional wave climate. This method establishes relationships between the large-scale atmospheric circulation patterns from numerical models (SLP fields as predictors) with local wave databases of observations (monthly wave climate SOM PDFs as predictand) to set up statistical models. A wave reanalysis database, developed by Puertos del Estado (Ministerio de Fomento), is considered as historical time series of local variables. The simultaneous SLP fields calculated by NCEP atmospheric reanalysis are used as predictors. Several applications with different size of sea level pressure grid and with different temporal domain resolution are compared to obtain the optimal statistical model that better represents the monthly wave climate at a particular site. In this work we examine the potential skill of this downscaling approach considering perfect-model conditions, but we will also analyze the
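The nearest-neighbors analog step described above can be sketched as follows; all array sizes and data are hypothetical placeholders for the SLP predictor fields and the wave-climate predictands:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training archive: flattened SLP predictor fields and the
# corresponding monthly wave-climate descriptors (illustrative sizes only).
n_months, n_grid, n_wave = 240, 100, 3
slp_train = rng.normal(size=(n_months, n_grid))
wave_train = rng.normal(size=(n_months, n_wave))

def analog_predict(slp_new, slp_train, wave_train, k=5):
    """Nearest-neighbors analog method: average the wave climate of the
    k historical months whose SLP field is closest to the new field."""
    dist = np.linalg.norm(slp_train - slp_new, axis=1)
    nearest = np.argsort(dist)[:k]
    return wave_train[nearest].mean(axis=0)

pred = analog_predict(slp_train[0], slp_train, wave_train)
print(pred)   # here month 0 is trivially among its own analogs
```

In the actual downscaling the predictand would be the SOM-based PDF of wave types rather than raw descriptors, but the analog search is the same.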
Emmert-Streib, Frank; Glazko, Galina V.; Altay, Gökmen; de Matos Simoes, Ricardo
2012-01-01
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms. PMID:22408642
Lagrangian statistics in forced two-dimensional turbulence
NASA Astrophysics Data System (ADS)
Kamps, Oliver; Friedrich, Rudolf
2007-11-01
In recent years the Lagrangian description of turbulent flows has attracted much interest from the experimental point of view and is also the focus of numerical and analytical investigations. We present detailed numerical investigations of Lagrangian tracer particles in the inverse energy cascade of two-dimensional turbulence. In the first part we focus on the shape and scaling properties of the probability distribution functions for the velocity increments and compare them to the Eulerian case and the increment statistics in three dimensions. Motivated by our observations we address the important question of translating increment statistics from one frame of reference to the other [1]. To reveal the underlying physical mechanism we determine numerically the involved transition probabilities. In this way we shed light on the source of Lagrangian intermittency. [1] R. Friedrich, R. Grauer, H. Hohmann, O. Kamps, A Corrsin type approximation for Lagrangian fluid turbulence, arXiv:0705.3132
Inference and Decoding of Motor Cortex Low-Dimensional Dynamics via Latent State-Space Models.
Aghagolzadeh, Mehdi; Truccolo, Wilson
2016-02-01
Motor cortex neuronal ensemble spiking activity exhibits strong low-dimensional collective dynamics (i.e., coordinated modes of activity) during behavior. Here, we demonstrate that these low-dimensional dynamics, revealed by unsupervised latent state-space models, can provide as accurate or better reconstruction of movement kinematics as direct decoding from the entire recorded ensemble. Ensembles of single neurons were recorded with triple microelectrode arrays (MEAs) implanted in ventral and dorsal premotor (PMv, PMd) and primary motor (M1) cortices while nonhuman primates performed 3-D reach-to-grasp actions. Low-dimensional dynamics were estimated via various types of latent state-space models including, for example, Poisson linear dynamic system (PLDS) models. Decoding from low-dimensional dynamics was implemented via point process and Kalman filters coupled in series. We also examined decoding based on a predictive subsampling of the recorded population. In this case, a supervised greedy procedure selected neuronal subsets that optimized decoding performance. When comparing decoding based on predictive subsampling and latent state-space models, the size of the neuronal subset was set to the same number of latent state dimensions. Overall, our findings suggest that information about naturalistic reach kinematics present in the recorded population is preserved in the inferred low-dimensional motor cortex dynamics. Furthermore, decoding based on unsupervised PLDS models may also outperform previous approaches based on direct decoding from the recorded population or on predictive subsampling. PMID:26336135
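As a linear-Gaussian stand-in for the PLDS models used in the paper, a minimal latent linear dynamical system plus Kalman filter illustrates how a low-dimensional trajectory is inferred from a higher-dimensional recording; all parameters here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical 2-D latent dynamics driving a 20-channel observation (an LDS,
# the linear-Gaussian cousin of the Poisson models used in the paper).
d, n, T = 2, 20, 200
A = np.array([[0.99, -0.05], [0.05, 0.99]])    # slow latent rotation/decay
C = rng.normal(size=(n, d))                    # loading matrix
Q, R = 0.01 * np.eye(d), 0.5 * np.eye(n)

x = np.zeros((T, d)); y = np.zeros((T, n))
for t in range(1, T):
    x[t] = A @ x[t-1] + rng.multivariate_normal(np.zeros(d), Q)
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(n), R)

def kalman_filter(y, A, C, Q, R):
    """Standard Kalman filter: posterior mean of the latent state given y[:t]."""
    d = A.shape[0]
    m, P = np.zeros(d), np.eye(d)
    means = []
    for yt in y:
        m, P = A @ m, A @ P @ A.T + Q                        # predict
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)         # gain
        m, P = m + K @ (yt - C @ m), (np.eye(d) - K @ C) @ P # update
        means.append(m)
    return np.array(means)

x_hat = kalman_filter(y, A, C, Q, R)
print(np.corrcoef(x_hat[:, 0], x[:, 0])[0, 1])  # inferred trajectory tracks the latent
```

Decoding kinematics from `x_hat` instead of from all channels is the "latent state-space" strategy the abstract compares against direct population decoding.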
Brannigan, V.M.; Bier, V.M.; Berg, C.
1992-09-01
Toxic torts are product liability cases dealing with alleged injuries due to chemical or biological hazards such as radiation, thalidomide, or Agent Orange. Toxic tort cases typically rely more heavily than other product liability cases on indirect or statistical proof of injury. However, there have been only a handful of actual legal decisions regarding the use of such statistical evidence, and most of those decisions have been inconclusive. Recently, a major case from the Fifth Circuit, involving allegations that Bendectin (a morning sickness drug) caused birth defects, was decided entirely on the basis of statistical inference. This paper examines both the conceptual basis of that decision and the relationships among statistical inference, scientific evidence, and the rules of product liability in general. 23 refs.
Conn, Paul B.; Johnson, Devin S.; Ver Hoef, Jay M.; Hooten, Mevin B.; London, Joshua M.; Boveng, Peter L.
2015-01-01
Ecologists often fit models to survey data to estimate and explain variation in animal abundance. Such models typically require that animal density remains constant across the landscape where sampling is being conducted, a potentially problematic assumption for animals inhabiting dynamic landscapes or otherwise exhibiting considerable spatiotemporal variation in density. We review several concepts from the burgeoning literature on spatiotemporal statistical models, including the nature of the temporal structure (i.e., descriptive or dynamical) and strategies for dimension reduction to promote computational tractability. We also review several features as they specifically relate to abundance estimation, including boundary conditions, population closure, choice of link function, and extrapolation of predicted relationships to unsampled areas. We then compare a suite of novel and existing spatiotemporal hierarchical models for animal count data that permit animal density to vary over space and time, including formulations motivated by resource selection and allowing for closed populations. We gauge the relative performance (bias, precision, computational demands) of alternative spatiotemporal models when confronted with simulated and real data sets from dynamic animal populations. For the latter, we analyze spotted seal (Phoca largha) counts from an aerial survey of the Bering Sea where the quantity and quality of suitable habitat (sea ice) changed dramatically while surveys were being conducted. Simulation analyses suggested that multiple types of spatiotemporal models provide reasonable inference (low positive bias, high precision) about animal abundance, but have potential for overestimating precision. Analysis of spotted seal data indicated that several model formulations, including those based on a log-Gaussian Cox process, had a tendency to overestimate abundance. By contrast, a model that included a population closure assumption and a scale prior on total
Racing to learn: statistical inference and learning in a single spiking neuron with adaptive kernels
Afshar, Saeed; George, Libin; Tapson, Jonathan; van Schaik, André; Hamilton, Tara J.
2014-01-01
This paper describes the Synapto-dendritic Kernel Adapting Neuron (SKAN), a simple spiking neuron model that performs statistical inference and unsupervised learning of spatiotemporal spike patterns. SKAN is the first proposed neuron model to investigate the effects of dynamic synapto-dendritic kernels and demonstrate their computational power even at the single neuron scale. The rule-set defining the neuron is simple: there are no complex mathematical operations such as normalization, exponentiation or even multiplication. The functionalities of SKAN emerge from the real-time interaction of simple additive and binary processes. Like a biological neuron, SKAN is robust to signal and parameter noise, and can utilize both in its operations. At the network scale neurons are locked in a race with each other with the fastest neuron to spike effectively “hiding” its learnt pattern from its neighbors. The robustness to noise, high speed, and simple building blocks not only make SKAN an interesting neuron model in computational neuroscience, but also make it ideal for implementation in digital and analog neuromorphic systems which is demonstrated through an implementation in a Field Programmable Gate Array (FPGA). Matlab, Python, and Verilog implementations of SKAN are available at: http://www.uws.edu.au/bioelectronics_neuroscience/bens/reproducible_research. PMID:25505378
Hung, H M James; Wang, Sue-Jane; O'Neill, Robert
2005-02-01
Without a placebo arm, any non-inferiority inference involving assessment of the placebo effect under the active control trial setting is difficult. The statistical risk of falsely concluding non-inferiority cannot be evaluated unless the constancy assumption approximately holds: that the effect of the active control in the historical trial setting, where the control effect can be assessed, carries over to the non-inferiority trial setting. The constancy assumption cannot be checked because the placebo arm is missing from the non-inferiority trial. Depending on how serious the violation of the assumption is thought to be, one may need to seek an alternative design strategy that includes a cushion for a very conservative non-inferiority analysis or shows superiority of the experimental treatment over the control. Determination of the non-inferiority margin depends on the objective the non-inferiority analysis is intended to achieve. The margin can be a fixed margin or a functionally defined margin. Between-trial differences always exist and need to be properly considered. PMID:16395994
You, Jinhong; Zhou, Haibo
2009-01-01
We consider statistical inference on a regression model in which some covariables are measured with errors together with an auxiliary variable. The proposed estimation for the regression coefficients is based on some estimating equations. This new method alleviates some drawbacks of previously proposed estimations, including the requirement of undersmoothing the regressor functions over the auxiliary variable and the restriction on other covariables, which can be observed exactly, among others. The large sample properties of the proposed estimator are established. We further propose a jackknife estimation, which consists of deleting one estimating equation (instead of one observation) at a time. We show that the jackknife estimator of the regression coefficients and the estimating-equations-based estimator are asymptotically equivalent. Simulations show that the jackknife estimator has smaller biases when the sample size is small or moderate. In addition, the jackknife estimation can also provide a consistent estimator of the asymptotic covariance matrix, which is robust to heteroscedasticity. We illustrate these methods by applying them to a real data set from marketing science. PMID:22199460
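For ordinary least squares the estimating equations correspond one-to-one with the observations, so the delete-one jackknife and its heteroscedasticity-robust covariance estimate can be sketched on simulated data (an illustrative model, not the paper's measurement-error setting):

```python
import numpy as np

rng = np.random.default_rng(3)

n = 200
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=np.abs(x) + 0.5)   # heteroscedastic noise

X = np.column_stack([np.ones(n), x])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Delete-one jackknife: refit with the i-th estimating equation (here, the
# i-th observation) removed, then combine the leave-one-out estimates.
beta_loo = np.array([
    np.linalg.lstsq(np.delete(X, i, axis=0), np.delete(y, i), rcond=None)[0]
    for i in range(n)
])
pseudo = n * beta_full - (n - 1) * beta_loo
beta_jack = pseudo.mean(axis=0)
cov_jack = np.cov(pseudo.T) / n     # covariance estimate, robust to heteroscedasticity

print(beta_jack, np.sqrt(np.diag(cov_jack)))   # coefficients and standard errors
```

The jackknife standard errors remain valid here despite the variance of the noise depending on x, which is the robustness property the abstract highlights.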
Statistical inference of the time-varying structure of gene-regulation networks
2010-01-01
Background Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models that stay constant over time to describe biological systems and their underlying molecular interactions. Methods To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.). Results We demonstrate that the ARTIVA approach generates detailed insights into the function and dynamics of complex biological systems and efficiently exploits time-course data in systems biology. In particular, two biological scenarios are analyzed: the developmental stages of Drosophila melanogaster and the response of Saccharomyces cerevisiae to benomyl poisoning. Conclusions ARTIVA recovers essential temporal dependencies in biological systems from transcriptional data and provides a natural starting point to learn and investigate their dynamics in greater detail. PMID:20860793
Lagrangian statistics in weakly forced two-dimensional turbulence.
Rivera, Michael K; Ecke, Robert E
2016-01-01
Measurements of Lagrangian single-point and multiple-point statistics in a quasi-two-dimensional stratified layer system are reported. The system consists of a layer of salt water over an immiscible layer of Fluorinert and is forced electromagnetically so that mean-squared vorticity is injected at a well-defined spatial scale r_i. Simultaneous cascades develop in which enstrophy flows predominately to small scales whereas energy cascades, on average, to larger scales. Lagrangian correlations and one- and two-point displacements are measured for random initial conditions and for initial positions within topological centers and saddles. Some of the behavior of these quantities can be understood in terms of the trapping characteristics of long-lived centers, the slow motion near strong saddles, and the rapid fluctuations outside of either centers or saddles. We also present statistics of Lagrangian velocity fluctuations using energy spectra in frequency space and structure functions in real space. We compare with complementary Eulerian velocity statistics. We find that simultaneous inverse energy and enstrophy ranges present in spectra are not directly echoed in real-space moments of velocity difference. Nevertheless, the spectral ranges line up well with features of moment ratios, indicating that although the moments are not exhibiting unambiguous scaling, the behavior of the probability distribution functions is changing over short ranges of length scales. Implications for understanding weakly forced 2D turbulence with simultaneous inverse and direct cascades are discussed. PMID:26826855
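Lagrangian structure functions of the kind compared here are computed directly from velocity time series along particle tracks. A sketch using an Ornstein-Uhlenbeck surrogate in place of real tracer data (all parameters hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical Lagrangian velocity record: an Ornstein-Uhlenbeck surrogate
# for a tracer velocity component (real data would come from particle tracking).
T, dt, tau = 4096, 0.01, 0.5
v = np.zeros(T)
for t in range(1, T):
    v[t] = v[t-1] * (1 - dt / tau) + np.sqrt(2 * dt / tau) * rng.normal()

def structure_function(v, p, lags):
    """S_p(tau) = <|v(t+tau) - v(t)|^p>, the Lagrangian structure function of order p."""
    return np.array([np.mean(np.abs(v[l:] - v[:-l]) ** p) for l in lags])

lags = np.array([1, 2, 4, 8, 16, 32])
S2 = structure_function(v, 2, lags)
flatness = structure_function(v, 4, lags) / S2**2

print(S2)        # grows with lag, saturating at ~2<v^2>
print(flatness)  # ~3 for this Gaussian surrogate; values above 3 signal intermittency
```

Moment ratios like the flatness computed here are exactly the quantities the abstract uses to track changes in the probability distribution functions across scales.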
NASA Astrophysics Data System (ADS)
Riezler, Stefan
2000-08-01
In this thesis, we present two approaches to a rigorous mathematical and algorithmic foundation of quantitative and statistical inference in constraint-based natural language processing. The first approach, called quantitative constraint logic programming, is conceptualized in a clear logical framework, and presents a sound and complete system of quantitative inference for definite clauses annotated with subjective weights. This approach combines a rigorous formal semantics for quantitative inference based on subjective weights with efficient weight-based pruning for constraint-based systems. The second approach, called probabilistic constraint logic programming, introduces a log-linear probability distribution on the proof trees of a constraint logic program and an algorithm for statistical inference of the parameters and properties of such probability models from incomplete, i.e., unparsed data. The possibility of defining arbitrary properties of proof trees as properties of the log-linear probability model and efficiently estimating appropriate parameter values for them permits the probabilistic modeling of arbitrary context-dependencies in constraint logic programs. The usefulness of these ideas is evaluated empirically in a small-scale experiment on finding the correct parses of a constraint-based grammar. In addition, we address the problem of computational intractability of the calculation of expectations in the inference task and present various techniques to approximately solve this task. Moreover, we present an approximate heuristic technique for searching for the most probable analysis in probabilistic constraint logic programs.
Convertino, Matteo; Mangoubi, Rami S.; Linkov, Igor; Lowry, Nathan C.; Desai, Mukund
2012-01-01
Shannon entropy of pixel intensity. To test our approach, we specifically use the green band of Landsat images for a water conservation area in the Florida Everglades. We validate our predictions against data of species occurrences for a twenty-eight-year period for both wet and dry seasons. Our method correctly predicts 73% of species richness. For species turnover, the newly proposed KL divergence prediction performance is nearly 100% accurate. This represents a significant improvement over the more conventional Shannon entropy difference, which provides 85% accuracy. Furthermore, we find that changes in soil and water patterns, as measured by fluctuations of the Shannon entropy for the red and blue bands respectively, are positively correlated with changes in vegetation. The fluctuations are smaller in the wet season than in the dry season. Conclusions/Significance: Texture-based statistical multiresolution image analysis is a promising method for quantifying interseasonal differences and, consequently, the degree to which vegetation, soil, and water patterns vary. The proposed automated method for quantifying species richness and turnover can also provide analysis at higher spatial and temporal resolution than is currently obtainable from expensive monitoring campaigns, thus enabling more prompt, more cost-effective inference and decision-making support regarding anomalous variations in biodiversity. Additionally, a matrix-based visualization of the statistical multiresolution analysis is presented to facilitate both insight and quick recognition of anomalous data. PMID:23115629
Wallace, D L; Perlman, M D
1980-06-01
This report describes the research activities of the Department of Statistics, University of Chicago, during the period June 15, 1975 to July 30, 1979. Nine research projects are briefly described on the following subjects: statistical computing and approximation techniques in statistics; numerical computation of first passage distributions; probabilities of large deviations; combining independent tests of significance; small-sample efficiencies of tests and estimates; improved procedures for simultaneous estimation and testing of many correlations; statistical computing and improved regression methods; comparison of several populations; and unbiasedness in multivariate statistics. A description of the statistical consultation activities of the Department that are of interest to DOE, in particular, the scientific interactions between the Department and the scientists at Argonne National Laboratories, is given. A list of publications issued during the term of the contract is included.
Happ, Martin; Harrar, Solomon W; Bathke, Arne C
2016-07-01
We propose tests for main and simple treatment effects, time effects, as well as treatment by time interactions in possibly high-dimensional multigroup repeated measures designs. The proposed inference procedures extend the work by Brunner et al. (2012) from two to several treatment groups and remain valid for unbalanced data and under unequal covariance matrices. In addition to showing consistency when sample size and dimension tend to infinity at the same rate, we provide finite sample approximations and evaluate their performance in a simulation study, demonstrating better maintenance of the nominal α-level than the popular Box-Greenhouse-Geisser and Huynh-Feldt methods, and a gain in power for informatively increasing dimension. Application is illustrated using electroencephalography (EEG) data from a neurological study involving patients with Alzheimer's disease and other cognitive impairments. PMID:26700536
Simpson, Helen Blair; Petkova, Eva; Cheng, Jianfeng; Huppert, Jonathan; Foa, Edna; Liebowitz, Michael R
2008-07-01
Longitudinal clinical trials in psychiatry have used various statistical methods to examine treatment effects. The validity of the inferences depends upon the different method's assumptions and whether a given study violates those assumptions. The objective of this paper was to elucidate these complex issues by comparing various methods for handling missing data (e.g., last observation carried forward [LOCF], completer analysis, propensity-adjusted multiple imputation) and for analyzing outcome (e.g., end-point analysis, repeated-measures analysis of variance [RM-ANOVA], mixed-effects models [MEMs]) using data from a multi-site randomized controlled trial in obsessive-compulsive disorder (OCD). The trial compared the effects of 12 weeks of exposure and ritual prevention (EX/RP), clomipramine (CMI), their combination (EX/RP&CMI) or pill placebo in 122 adults with OCD. The primary outcome measure was the Yale-Brown Obsessive Compulsive Scale. For most comparisons, inferences about the relative efficacy of the different treatments were impervious to different methods for handling missing data and analyzing outcome. However, when EX/RP was compared to CMI and when CMI was compared to placebo, traditional methods (e.g., LOCF, RM-ANOVA) led to different inferences than currently recommended alternatives (e.g., multiple imputation based on estimation-maximization algorithm, MEMs). Thus, inferences about treatment efficacy can be affected by statistical choices. This is most likely when there are small but potentially clinically meaningful treatment differences and when sample sizes are modest. The use of appropriate statistical methods in psychiatric trials can advance public health by ensuring that valid inferences are made about treatment efficacy. PMID:17892885
Vorticity statistics of bounded two-dimensional turbulence
NASA Astrophysics Data System (ADS)
Clercx, H. J. H.; Molenaar, D.; van Heijst, Gertjan
2004-11-01
Vorticity statistics play an important role in the determination of small-scale dynamics in forced two-dimensional turbulence. On the basis of the Hölder continuity of the vorticity field ω(t,x), the scaling behavior of vorticity structure functions S_p(ω(ℓ)) of order p provides clues about small-scale intermittency. Confirming earlier ideas of Sulem and Frisch (JFM 72, 1975), Eyink (Phys. D 91, 1996) proved the following scaling of the second-order structure function: S_2(ω(ℓ)) ≡ ⟨|ω(t,x+r) - ω(t,x)|^2⟩ ~ ℓ^ζ_2, with ζ_2 ≤ 2/3 and ℓ ≤ ℓ_f. Here ℓ = |r|, ℓ_f is the typical energy-injection scale associated with the external forcing, and the brackets ⟨·⟩ denote combined space and time averaging. The only assumption used to derive this scaling was a constant enstrophy flux to small scales in the so-called enstrophy cascade range. In contrast, using the classical Batchelor argument for the advection of a passive scalar, Falkovich and Lebedev (PRE 50, 1994) argued that one must have ζ_p = 0 for all p. With new direct numerical simulations we address these issues for a bounded square domain, using the no-slip boundary condition for the velocity. Our results are compared with the earlier experimental results of Paret and Tabeling (PRL 83, 1999).
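The second-order structure function defined above can be estimated directly from gridded samples of a field. The following is a minimal pure-Python sketch (our own construction, not the authors' code) on a smooth toy field ω(x) = sin(x), for which the local scaling exponent ζ_2 must approach 2 at small ℓ:

```python
import math

def structure_function_2(omega, ell, dx):
    """Estimate S_2(ell) = <|omega(x + ell) - omega(x)|^2> from evenly
    spaced samples of a 1-D field with grid spacing dx."""
    shift = int(round(ell / dx))
    diffs = [(omega[i + shift] - omega[i]) ** 2 for i in range(len(omega) - shift)]
    return sum(diffs) / len(diffs)

# Toy smooth field: for a differentiable field the local scaling exponent
# zeta_2 = d log S_2 / d log ell approaches 2 as ell -> 0.
dx = 0.01
omega = [math.sin(i * dx) for i in range(10000)]
zeta_2 = math.log(structure_function_2(omega, 0.04, dx) /
                  structure_function_2(omega, 0.02, dx)) / math.log(2.0)
print(round(zeta_2, 1))  # → 2.0
```

A rough (Hölder-continuous but rougher) field would give a smaller exponent, which is exactly what the structure-function diagnostics above probe.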
Bayesian Statistical Inference in Ion-Channel Models with Exact Missed Event Correction.
Epstein, Michael; Calderhead, Ben; Girolami, Mark A; Sivilotti, Lucia G
2016-07-26
The stochastic behavior of single ion channels is most often described as an aggregated continuous-time Markov process with discrete states. For ligand-gated channels each state can represent a different conformation of the channel protein or a different number of bound ligands. Single-channel recordings show only whether the channel is open or shut: states of equal conductance are aggregated, so transitions between them have to be inferred indirectly. The requirement to filter noise from the raw signal further complicates the modeling process, as it limits the time resolution of the data. The consequence of the reduced bandwidth is that openings or shuttings that are shorter than the resolution cannot be observed; these are known as missed events. Postulated models fitted using filtered data must therefore explicitly account for missed events, both to avoid bias in the estimation of rate parameters and to allow parameter identifiability to be assessed accurately. In this article, we present the first, to our knowledge, Bayesian modeling of ion channels with exact missed event correction. Bayesian analysis represents uncertain knowledge of the true value of model parameters by considering these parameters as random variables. This allows us to gain a full appreciation of parameter identifiability and uncertainty when estimating values for model parameters. However, Bayesian inference is particularly challenging in this context as the correction for missed events increases the computational complexity of the model likelihood. Nonetheless, we successfully implemented a two-step Markov chain Monte Carlo method that we call "BICME", which performs Bayesian inference in models of realistic complexity. The method is demonstrated on synthetic and real single-channel data from muscle nicotinic acetylcholine channels. We show that parameter uncertainty can be characterized more accurately than with maximum-likelihood methods. Our code for performing inference in these ion channel
Using Action Research to Develop a Course in Statistical Inference for Workplace-Based Adults
ERIC Educational Resources Information Center
Forbes, Sharleen
2014-01-01
Many adults who need an understanding of statistical concepts have limited mathematical skills. They need a teaching approach that includes as little mathematical context as possible. Iterative participatory qualitative research (action research) was used to develop a statistical literacy course for adult learners informed by teaching in…
ERIC Educational Resources Information Center
Thompson, Bruce
Web-based statistical instruction, like all statistical instruction, ought to focus on teaching the essence of the research endeavor: the exercise of reflective judgment. Using the framework of the recent report of the American Psychological Association (APA) Task Force on Statistical Inference (Wilkinson and the APA Task Force on Statistical…
Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition.
Tang, Qin; Song, Yulong; Shi, Mijuan; Cheng, Yingyin; Zhang, Wanting; Xia, Xiao-Qin
2015-01-01
Many coronaviruses are capable of interspecies transmission. Some of them have caused worldwide panic as emerging human pathogens in recent years, e.g., severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). In order to assess their threat to humans, we sought to infer the potential hosts of coronaviruses using a dual-model approach based on nineteen parameters computed from the spike genes of coronaviruses. Both the support vector machine (SVM) model and the Mahalanobis distance (MD) discriminant model achieved high accuracies in leave-one-out cross-validation of training data consisting of 730 representative coronaviruses (99.86% and 98.08% respectively). Predictions on 47 additional coronaviruses precisely conformed to conclusions or speculations by other researchers. Our approach is implemented as a web server that can be accessed at http://bioinfo.ihb.ac.cn/seq2hosts. PMID:26607834
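The Mahalanobis-distance discriminant with leave-one-out cross-validation can be sketched in a few lines. The toy below is illustrative only: it uses two invented features in place of the nineteen spike-gene parameters, and the class labels are hypothetical stand-ins for host groups:

```python
def mean_cov2(rows):
    """Sample mean and (population) covariance of 2-D feature rows."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / n
    syy = sum((r[1] - my) ** 2 for r in rows) / n
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / n
    return (mx, my), (sxx, sxy, syy)

def mahalanobis2(x, mean, cov):
    """Squared Mahalanobis distance of 2-D point x from a class."""
    (mx, my), (a, b, d) = mean, cov
    det = a * d - b * b
    u, v = x[0] - mx, x[1] - my
    return (d * u * u - 2 * b * u * v + a * v * v) / det

def classify(x, groups):
    # Assign x to the class with the smallest Mahalanobis distance.
    scores = {label: mahalanobis2(x, *mean_cov2(rows)) for label, rows in groups.items()}
    return min(scores, key=scores.get)

# Toy "host" classes with two invented features standing in for the
# nineteen spike-gene parameters used in the paper.
groups = {
    "avian":  [(0.1, 0.2), (0.3, 0.1), (0.0, 0.0), (0.2, 0.4), (0.15, 0.05)],
    "mammal": [(1.0, 1.1), (1.3, 0.9), (0.9, 1.0), (1.1, 1.3), (1.2, 1.0)],
}

# Leave-one-out cross-validation: hold each sample out, refit its class, predict.
correct, total = 0, 0
for label, rows in groups.items():
    for i in range(len(rows)):
        trimmed = {l: (r[:i] + r[i + 1:]) if l == label else r for l, r in groups.items()}
        correct += classify(rows[i], trimmed) == label
        total += 1
accuracy = correct / total
```

With well-separated clusters, every held-out point is assigned to its own class; the paper's SVM half of the dual model would be run alongside this and the two predictions compared.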
NASA Astrophysics Data System (ADS)
Fu, Ji-Meng; Winchester, John W.
1994-03-01
Nitrogen in fresh waters of three rivers in northern Florida - the Apalachicola-Chattahoochee-Flint (ACF) River system, Ochlockonee (Och), and Sopchoppy (Sop) - is inferred to be derived mostly from atmospheric deposition. Because the N:P mole ratios in the rivers are nearly three times higher than the Redfield ratio for aquatic photosynthesis, N is saturated in the ecosystems, not a limiting nutrient, although it may be chemically transformed. Absolute principal component analysis (APCA), a receptor model, was applied to many years of monitoring data for Apalachicola River water and rainfall over its basin in order to better understand aquatic chemistry of nitrogen in the watershed. The APCA model describes the river water as mainly a mixture of components with compositions resembling fresh rain, aged rain, and groundwater. In the fresh rain component, the ratio of atmospheric nitrate to sulfate is close to that in rainwater, as if some samples had been collected following very recent rainfall. The aged rain component of the river water is distinguished by a low NO3(-)/SO4(2-) ratio, signifying an atmospheric source but with most of its nitrate having been lost or transformed. The groundwater component, inferred from its concentration to contribute on average about one fourth of the river water, contains abundant Ca(2+) but no detectable nitrogen. Results similar to ACF were obtained for Sop and Och, though Och exhibits some association of NO3(-) with the Ca(2+)-rich component. Similar APCA of wet precipitation resolves mainly components that represent acid rain (with NO3(-), SO4(2-) and NH4(+)) and sea salt (with Na(+), Cl(-) and Mg(2+)). Inland, the acid rain component is relatively more prominent and Cl(-) is depleted, while at atmospheric monitoring sites nearer the coastal region sea salt tends to be more prominent.
Statistical Inference for Valued-Edge Networks: The Generalized Exponential Random Graph Model
Desmarais, Bruce A.; Cranmer, Skyler J.
2012-01-01
Across the sciences, the statistical analysis of networks is central to the production of knowledge on relational phenomena. Because of their ability to model the structural generation of networks based on both endogenous and exogenous factors, exponential random graph models are a ubiquitous means of analysis. However, they are limited by an inability to model networks with valued edges. We address this problem by introducing a class of generalized exponential random graph models capable of modeling networks whose edges have continuous values (bounded or unbounded), thus greatly expanding the scope of networks applied researchers can subject to statistical analysis. PMID:22276151
Statistical inference of selection and divergence of rice blast resistance gene Pi-ta
Technology Transfer Automated Retrieval System (TEKTRAN)
The resistance gene Pi-ta has been effectively used to control rice blast disease worldwide. A few recent studies have described the possible evolution of Pi-ta in cultivated and weedy rice. However, evolutionary statistics used for the studies are too limited to precisely understand selection and d...
ERIC Educational Resources Information Center
Finch, Sue; Cumming, Geoff; Thomason, Neil
2001-01-01
Analyzed 150 articles from the "Journal of Applied Psychology" (JAP) from 1940 to 1999 to determine statistical reporting practices related to null hypothesis significance testing, American Psychological Association guidelines, and reform recommendations. Findings show little evidence that decades of cogent criticisms by reformers have resulted in…
Independence and statistical inference in clinical trial designs: a tutorial review.
Bolton, S
1998-05-01
The requirements for statistical approaches to the design, analysis, and interpretation of experimental data are now accepted by the scientific community. This is of particular importance in medical studies where public health consequences are of concern. Investigators in the clinical sciences should be cognizant of statistical principles in general, but should always be wary of pursuing their own analyses and should engage statisticians for data analysis whenever possible. Examples of circumstances that require statistical evaluation, not found in textbooks and not always obvious to the lay person, are pervasive. Incorrect statistical evaluation and analysis in such situations will result in erroneous and potentially seriously misleading interpretations of clinical data. Although a statistician may not be responsible for any misinterpretations in such unfortunate circumstances, the quote often cited about statisticians and "damned liars" may appear to be more truth than fable. This article is a tutorial review and describes a common misuse of clinical data resulting in an apparently large sample size derived from a small number of patients. This mistake is a consequence of ignoring the dependency of results, treating multiple observations from a single patient as independent observations. PMID:9602951
Using Stimulus Equivalence Technology to Teach Statistical Inference in a Group Setting
ERIC Educational Resources Information Center
Critchfield, Thomas S.; Fienup, Daniel M.
2010-01-01
Computerized lessons employing stimulus equivalence technology, used previously under laboratory conditions to teach inferential statistics concepts to college students, were employed in a group setting for the first time. Students showed the same directly taught and emergent learning gains as in laboratory studies. A brief paper-and-pencil…
NASA Technical Reports Server (NTRS)
Abbey, Craig K.; Eckstein, Miguel P.
2002-01-01
We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.
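To make the estimation procedure concrete, here is a small simulation (our own construction, not the paper's code) in which a linear-template observer performs a noise-only 2AFC task; the classification image, computed as mean chosen-interval noise minus mean rejected-interval noise, recovers the direction of the observer's filter. The template, dimensionality and trial count are invented for illustration:

```python
import random

random.seed(1)
DIM = 8
template = [1.0 if i < 4 else -1.0 for i in range(DIM)]  # assumed observer filter

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Simulate 2AFC trials: the observer applies its linear template to the noise
# in each interval (the signal contribution is omitted in this noise-only sketch)
# and chooses the interval with the larger response.
chosen, rejected = [0.0] * DIM, [0.0] * DIM
n_trials = 4000
for _ in range(n_trials):
    noise1 = [random.gauss(0, 1) for _ in range(DIM)]
    noise2 = [random.gauss(0, 1) for _ in range(DIM)]
    if dot(template, noise1) > dot(template, noise2):
        pick, other = noise1, noise2
    else:
        pick, other = noise2, noise1
    chosen = [c + p for c, p in zip(chosen, pick)]
    rejected = [r + o for r, o in zip(rejected, other)]

# Classification image: mean chosen-interval noise minus mean rejected-interval noise.
ci = [(c - r) / n_trials for c, r in zip(chosen, rejected)]
# The estimate should align with the true template (cosine similarity near 1).
cos = dot(ci, template) / (dot(ci, ci) ** 0.5 * dot(template, template) ** 0.5)
```

Hypothesis tests on such an estimate, as the paper describes, would then be carried out on compact features derived from `ci` rather than on the raw pixel weights.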
Statistical inference for classification of RRIM clone series using near IR reflectance properties
NASA Astrophysics Data System (ADS)
Ismail, Faridatul Aima; Madzhi, Nina Korlina; Hashim, Hadzli; Abdullah, Noor Ezan; Khairuzzaman, Noor Aishah; Azmi, Azrie Faris Mohd; Sampian, Ahmad Faiz Mohd; Harun, Muhammad Hafiz
2015-08-01
RRIM clones are rubber breeding series produced by RRIM (Rubber Research Institute of Malaysia) through a "rubber breeding program" to improve latex yield and to produce clones attractive to farmers. The objective of this work is to analyse measurements from an optical sensing device applied to latex of selected clone series. The device transmits NIR light and converts the measured reflectance into a voltage. The reflectance index obtained via this voltage was analysed using statistical techniques in order to find discrimination among the clones. From the statistical results, using error plots and a one-way ANOVA test, there is overwhelming evidence of discrimination among the RRIM 2002, RRIM 2007 and RRIM 3001 clone series (p = 0.000). RRIM 2008 cannot be discriminated from RRIM 2014; however, both of these groups are distinct from the other clones.
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
NASA Technical Reports Server (NTRS)
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
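As an illustration of the parallel tempering scheme mentioned above (a sketch of the general technique, not any specific implementation from this work), here is a pure-Python toy on an invented bimodal density: a cold Metropolis walker alone would stay trapped in one mode, while occasional Metropolis-accepted swaps with hotter replicas let it visit both.

```python
import math
import random

random.seed(0)

def log_target(x):
    # Toy bimodal density with well-separated modes at x = -3 and x = +3.
    a = -(x - 3.0) ** 2 / 0.1
    b = -(x + 3.0) ** 2 / 0.1
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # stable log-sum-exp

temps = [1.0, 4.0, 16.0, 64.0]       # temperature ladder (invented)
chains = [0.0 for _ in temps]         # one walker per temperature
samples = []                          # cold-chain samples
for step in range(20000):
    # Metropolis update within each chain at its own temperature.
    for k, T in enumerate(temps):
        prop = chains[k] + random.gauss(0, 1.0)
        if math.log(random.random()) < (log_target(prop) - log_target(chains[k])) / T:
            chains[k] = prop
    # Occasionally attempt a swap between neighbouring temperatures.
    if step % 10 == 0:
        k = random.randrange(len(temps) - 1)
        d = (log_target(chains[k]) - log_target(chains[k + 1])) \
            * (1.0 / temps[k] - 1.0 / temps[k + 1])
        if math.log(random.random()) < -d:
            chains[k], chains[k + 1] = chains[k + 1], chains[k]
    samples.append(chains[0])
```

The swap acceptance ratio above is the standard Metropolis criterion for exchanging configurations between two tempered copies of the same target.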
Statistical inference for the additive hazards model under outcome-dependent sampling
Yu, Jichang; Liu, Yanyan; Sandler, Dale P.; Zhou, Haibo
2015-01-01
Cost-effective study designs and proper inference procedures for data from such designs are always of particular interest to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design, for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the cancer risk associated with radon exposure. PMID:26379363
Lekone, Phenyo E; Finkenstädt, Bärbel F
2006-12-01
A stochastic discrete-time susceptible-exposed-infectious-recovered (SEIR) model for infectious diseases is developed with the aim of estimating parameters from daily incidence and mortality time series for an outbreak of Ebola in the Democratic Republic of Congo in 1995. The incidence time series exhibit many low integers as well as zero counts requiring an intrinsically stochastic modeling approach. In order to capture the stochastic nature of the transitions between the compartmental populations in such a model we specify appropriate conditional binomial distributions. In addition, a relatively simple temporally varying transmission rate function is introduced that allows for the effect of control interventions. We develop Markov chain Monte Carlo methods for inference that are used to explore the posterior distribution of the parameters. The algorithm is further extended to integrate numerically over state variables of the model, which are unobserved. This provides a realistic stochastic model that can be used by epidemiologists to study the dynamics of the disease and the effect of control interventions. PMID:17156292
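The conditional-binomial transition structure described above can be sketched as a forward simulator. This is our own minimal illustration, not the authors' code: the MCMC inference step is omitted, the parameter values are invented, and the exposure probability 1 - exp(-beta*I/N) is one common chain-binomial choice.

```python
import math
import random

random.seed(42)

def binom(n, p):
    # Conditional binomial draw for a compartment transition.
    return sum(1 for _ in range(n) if random.random() < p)

def seir_chain(beta, sigma, gamma, s0, e0, i0, days):
    """Discrete-time stochastic SEIR: daily exposures, onsets and removals
    are binomial draws, with the exposure probability depending on current
    infectious prevalence."""
    s, e, i, r = s0, e0, i0, 0
    n = s0 + e0 + i0
    incidence = []
    for _ in range(days):
        p_exp = 1.0 - math.exp(-beta * i / n)       # P(a susceptible is exposed today)
        new_e = binom(s, p_exp)                     # S -> E
        new_i = binom(e, 1.0 - math.exp(-sigma))    # E -> I
        new_r = binom(i, 1.0 - math.exp(-gamma))    # I -> R
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
        incidence.append(new_i)
    return incidence, (s, e, i, r)

incidence, final = seir_chain(beta=0.4, sigma=0.2, gamma=0.15,
                              s0=500, e0=0, i0=5, days=120)
```

Because every transition is a draw between integer compartments, low daily counts and zero counts arise naturally, which is exactly why the authors argue for an intrinsically stochastic model over a deterministic one. A time-varying beta(t) would model control interventions.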
NASA Astrophysics Data System (ADS)
Jha, Sanjeev Kumar; Comunian, Alessandro; Mariethoz, Gregoire; Kelly, Bryce F. J.
2014-10-01
We develop a stochastic approach to construct channelized 3-D geological models constrained to borehole measurements as well as geological interpretation. The methodology is based on simple 2-D geologist-provided sketches of fluvial depositional elements, which are extruded in the third dimension. Multiple-point geostatistics (MPS) is used to impart horizontal variability to the structures by introducing geometrical transformation parameters. The sketches provided by the geologist are used as elementary training images, whose statistical information is expanded through randomized transformations. We demonstrate the applicability of the approach by applying it to modeling a fluvial valley filling sequence in the Maules Creek catchment, Australia. The facies models are constrained to borehole logs, spatial information borrowed from an analogue and local orientations derived from the present-day stream networks. The connectivity in the 3-D facies models is evaluated using statistical measures and transport simulations. Comparison with a statistically equivalent variogram-based model shows that our approach is more suited for building 3-D facies models that contain structures specific to the channelized environment and which have a significant influence on the transport processes.
Nonequilibrium statistical mechanics in one-dimensional bose gases
NASA Astrophysics Data System (ADS)
Baldovin, F.; Cappellaro, A.; Orlandini, E.; Salasnich, L.
2016-06-01
We study cold dilute gases made of bosonic atoms, showing that in the mean-field one-dimensional regime they support stable out-of-equilibrium states. Starting from the 3D Boltzmann–Vlasov equation with contact interaction, we derive an effective 1D Landau–Vlasov equation under the condition of a strong transverse harmonic confinement. We investigate the existence of out-of-equilibrium states, obtaining stability criteria similar to those of classical plasmas.
ERIC Educational Resources Information Center
Cui, Ying; Roberts, Mary Roduta
2013-01-01
The goal of this study was to investigate the usefulness of person-fit analysis in validating student score inferences in a cognitive diagnostic assessment. In this study, a two-stage procedure was used to evaluate person fit for a diagnostic test in the domain of statistical hypothesis testing. In the first stage, the person-fit statistic, the…
Anderson, Eric C
2012-01-01
Advances in genotyping that allow tens of thousands of individuals to be genotyped at a moderate number of single nucleotide polymorphisms (SNPs) permit parentage inference to be pursued on a very large scale. The intergenerational tagging this capacity allows is revolutionizing the management of cultured organisms (cows, salmon, etc.) and is poised to do the same for scientific studies of natural populations. Currently, however, there are no likelihood-based methods of parentage inference which are implemented in a manner that allows them to quickly handle a very large number of potential parents or parent pairs. Here we introduce an efficient likelihood-based method applicable to the specialized case of cultured organisms in which both parents can be reliably sampled. We develop a Markov chain representation for the cumulative number of Mendelian incompatibilities between an offspring and its putative parents and we exploit it to develop a fast algorithm for simulation-based estimates of statistical confidence in SNP-based assignments of offspring to pairs of parents. The method is implemented in the freely available software SNPPIT. We describe the method in detail, then assess its performance in a large simulation study using known allele frequencies at 96 SNPs from ten hatchery salmon populations. The simulations verify that the method is fast and accurate and that 96 well-chosen SNPs can provide sufficient power to identify the correct pair of parents from amongst millions of candidate pairs. PMID:23152426
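The cumulative Mendelian-incompatibility count at the heart of this method can be sketched as follows. This is our own toy illustration of the counting rule, not SNPPIT itself (which additionally builds a Markov chain over these counts for confidence estimation); genotypes are coded as 0/1/2 copies of the reference allele and the trio data are invented:

```python
def offspring_genotypes(g1, g2):
    """Possible offspring genotypes (0/1/2 copies of the reference allele)
    given two parental genotypes at a biallelic SNP."""
    # Alleles a parent can transmit: genotype 0 -> {0}, 1 -> {0, 1}, 2 -> {1}.
    halves1 = {g1 // 2, (g1 + 1) // 2}
    halves2 = {g2 // 2, (g2 + 1) // 2}
    return {a + b for a in halves1 for b in halves2}

def n_incompatibilities(offspring, parent1, parent2):
    """Count loci at which the trio's genotypes violate Mendelian transmission."""
    return sum(1 for o, p1, p2 in zip(offspring, parent1, parent2)
               if o not in offspring_genotypes(p1, p2))

# Toy trio over 6 SNPs: only the third locus (offspring genotype 2 from
# parents with genotypes 0 and 1) is Mendelian-impossible.
off = [1, 0, 2, 1, 2, 0]
p1  = [0, 0, 0, 1, 2, 1]
p2  = [2, 1, 1, 2, 1, 0]
print(n_incompatibilities(off, p1, p2))  # → 1
```

In a real application, genotyping error makes a small nonzero count plausible for true trios, so assignment decisions rest on the distribution of this count under competing hypotheses rather than on requiring zero incompatibilities.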
A statistical formulation of one-dimensional electron fluid turbulence
NASA Technical Reports Server (NTRS)
Fyfe, D.; Montgomery, D.
1977-01-01
A one-dimensional electron fluid model is investigated using the mathematical methods of modern fluid turbulence theory. Non-dissipative equilibrium canonical distributions are determined in a phase space whose co-ordinates are the real and imaginary parts of the Fourier coefficients for the field variables. Spectral densities are calculated, yielding a wavenumber electric field energy spectrum proportional to k to the negative second power for large wavenumbers. The equations of motion are numerically integrated and the resulting spectra are found to compare well with the theoretical predictions.
Statistical Inference in Hidden Markov Models Using k-Segment Constraints
Titsias, Michalis K.; Holmes, Christopher C.; Yau, Christopher
2016-01-01
Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward–backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths. We collectively call these recursions k-segment algorithms and illustrate their utility using simulated and real examples. We also highlight the prospective and retrospective use of k-segment constraints for fitting HMMs or exploring existing model fits. Supplementary materials for this article are available online. PMID:27226674
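For background, the MAP hidden-state sequence mentioned above comes from the standard Viterbi recursion, which the paper's k-segment algorithms generalize by conditioning on the number of segments. A minimal Viterbi sketch on a toy two-state HMM (all names and numbers are our own):

```python
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """MAP hidden-state sequence for an HMM via dynamic programming."""
    V = [{s: log_init[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: V[-1][p] + log_trans[p][s])
            col[s] = V[-1][best_prev] + log_trans[best_prev][s] + log_emit[s][o]
            ptr[s] = best_prev
        V.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Two-state toy HMM ("N"ormal / "A"mplified) with sticky transitions.
ln = math.log
states = ["N", "A"]
log_init = {"N": ln(0.5), "A": ln(0.5)}
log_trans = {"N": {"N": ln(0.9), "A": ln(0.1)}, "A": {"N": ln(0.1), "A": ln(0.9)}}
log_emit = {"N": {"lo": ln(0.8), "hi": ln(0.2)}, "A": {"lo": ln(0.2), "hi": ln(0.8)}}
obs = ["lo", "lo", "hi", "hi", "hi", "lo"]
print(viterbi(obs, states, log_init, log_trans, log_emit))  # → ['N', 'N', 'A', 'A', 'A', 'A']
```

Note the final "lo" observation is still decoded as "A": the sticky transition prior outweighs the single discordant emission, exactly the kind of segment-level behavior that k-segment constraints let the user control explicitly.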
NASA Astrophysics Data System (ADS)
Laloy, Eric; Linde, Niklas; Jacques, Diederik; Vrugt, Jasper A.
2015-06-01
We present a Bayesian inversion method for the joint inference of high-dimensional multi-Gaussian hydraulic conductivity fields and associated geostatistical parameters from indirect hydrological data. We combine Gaussian process generation via circulant embedding to decouple the variogram from grid cell specific values, with dimensionality reduction by interpolation to enable Markov chain Monte Carlo (MCMC) simulation. Using the Matérn variogram model, this formulation allows inferring the conductivity values simultaneously with the field smoothness (also called Matérn shape parameter) and other geostatistical parameters such as the mean, sill, integral scales and anisotropy direction(s) and ratio(s). The proposed dimensionality reduction method systematically honors the underlying variogram and is demonstrated to achieve better performance than the Karhunen-Loève expansion. We illustrate our inversion approach using synthetic (error corrupted) data from a tracer experiment in a fairly heterogeneous 10,000-dimensional 2-D conductivity field. A 40-times reduction of the size of the parameter space did not prevent the posterior simulations from appropriately fitting the measurement data, or the posterior parameter distributions from including the true geostatistical parameter values. Overall, the posterior field realizations covered a wide range of geostatistical models, questioning the common practice of assuming a fixed variogram prior to inference of the hydraulic conductivity values. Our method is shown to be more efficient than sequential Gibbs sampling (SGS) for the considered case study, particularly when implemented on a distributed computing cluster. It is also found to outperform the method of anchored distributions (MAD) for the same computational budget.
Statistical inference of co-movements of stocks during a financial crisis
NASA Astrophysics Data System (ADS)
Ibuki, Takero; Higano, Shunsuke; Suzuki, Sei; Inoue, Jun-ichi; Chakraborti, Anirban
2013-12-01
In order to understand and forecast emergent phenomena in social systems, we propose several probabilistic models for the analysis of financial markets, especially around a crisis. We first attempt to visualize the collective behaviour of markets during a financial crisis through cross-correlations between typical Japanese daily stocks by making use of multidimensional scaling. We find that all the two-dimensional points (stocks) shrink into a single small region when an economic crisis takes place. By using the properties of cross-correlations in financial markets, especially during a crisis, we next propose a theoretical framework to predict several time series simultaneously. Our model system is basically described by a variant of the multi-layered Ising model with random fields as non-stationary time series. Hyper-parameters appearing in the probabilistic model are estimated by minimizing the 'cumulative error' in the past market history. The justification and validity of our approaches are numerically examined for several empirical data sets.
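The multidimensional scaling step can be sketched with classical (Torgerson) MDS applied to the usual correlation distance. The three-stock correlation matrix below is hypothetical, not the article's data.

```python
import numpy as np

def mds_from_correlations(R, dim=2):
    """Classical (Torgerson) MDS of a correlation matrix, using the
    standard distance d_ij = sqrt(2 * (1 - rho_ij))."""
    n = R.shape[0]
    D2 = 2.0 * (1.0 - R)                      # squared distances
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ D2 @ J                     # Gram matrix of the embedding
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]        # keep the largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Hypothetical correlations: stocks 0 and 1 co-move strongly
R = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
coords = mds_from_correlations(R)
print(coords.shape)   # → (3, 2)
```

Rising cross-correlations during a crisis shrink all pairwise distances, which is exactly the contraction of the 2-D point cloud the abstract describes.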
Statistical inference methods for recurrent event processes with shape and size parameters
Wang, Mei-Cheng; Huang, Chiung-Yu
2015-01-01
This paper proposes a unified framework to characterize the rate function of a recurrent event process through shape and size parameters. In contrast to the intensity function, which is the event occurrence rate conditional on the event history, the rate function is the occurrence rate unconditional on the event history, and thus it can be interpreted as a population-averaged count of events in unit time. In this paper, shape and size parameters are introduced and used to characterize the association between the rate function λ(·) and a random variable X. Measures of association between X and λ(·) are defined via shape- and size-based coefficients. Rate-independence of X and λ(·) is studied through tests of shape-independence and size-independence, where the shape- and size-based test statistics can be used separately or in combination. These tests can be applied when X is a covariable possibly correlated with the recurrent event process through λ(·) or, in the one-sample setting, when X is the censoring time at which the observation of N(·) is terminated. The proposed tests are shape- and size-based, so when a null hypothesis is rejected, the test results can serve to distinguish the source of violation. PMID:26412863
Statistical validation of high-dimensional models of growing networks
NASA Astrophysics Data System (ADS)
Medo, Matúš
2014-03-01
The abundance of models of complex networks and the current insufficient validation standards make it difficult to judge which models are strongly supported by data and which are not. We focus here on likelihood maximization methods for models of growing networks with many parameters and compare their performance on artificial and real datasets. While high dimensionality of the parameter space harms the performance of direct likelihood maximization on artificial data, this can be improved by introducing a suitable penalization term. Likelihood maximization on real data shows that the presented approach is able to discriminate among available network models. To make large-scale datasets accessible to this kind of analysis, we propose a subset sampling technique and show that it yields substantial model evidence in a fraction of the time necessary for the analysis of the complete data.
Kimura, S; Araki, D; Matsumura, K; Okada-Hatakeyama, M
2012-02-01
Voit and Almeida have proposed the decoupling approach as a method for inferring the S-system models of genetic networks. The decoupling approach defines the inference of a genetic network as a problem requiring the solutions of sets of algebraic equations. The computation can be accomplished in a very short time, as the approach estimates S-system parameters without solving any of the differential equations. Yet the defined algebraic equations are non-linear, which sometimes prevents us from finding reasonable S-system parameters. In this study, we propose a new technique to overcome this drawback of the decoupling approach. This technique transforms the problem of solving each set of algebraic equations into a one-dimensional function optimization problem. The computation can still be accomplished in a relatively short time, as the problem is transformed by solving a linear programming problem. We confirm the effectiveness of the proposed approach through numerical experiments. PMID:22155075
NASA Astrophysics Data System (ADS)
Hashim, Mohammad Firdaus Abu; Ramakrishnan, Sivakumar; Mohamad, Ahmad Azmin
2014-06-01
Due to its low environmental impact and rechargeable capability, the nickel metal hydride (NiMH) battery is considered one of the most promising candidate batteries for electric vehicles. The energy delivered by the NiMH battery depends heavily on its discharge profile, and it is generally intangible to track the trend of the energy dissipation stored in the battery for informative analysis. Thermal models were developed in one and two dimensions using Matlab, and these models are capable of predicting the temperature distributions inside a cell. The simulated results were validated and verified against experimental data using Minitab software. For the 1-dimensional model, the correlations between experimental and predicted results for time intervals of 60, 90, and 114 minutes, in the positive-to-negative electrode thermal dissipation direction, are 34%, 83%, and 94% respectively, while for the 2-dimensional model the corresponding correlations are 44%, 93%, and 95%. These correlations between experiment and prediction clearly indicate that the thermal behavior under natural convection can be well fitted after around 90 minutes, and that the 2-dimensional model predicts the results more accurately than the 1-dimensional model. Based on the simulation results, it can be concluded that both the 1-dimensional and 2-dimensional models predict nearly similar thermal behavior under natural convection, while the 2-dimensional model was used to predict thermal behavior under forced convection for better accuracy.
Constrained statistical inference: sample-size tables for ANOVA and regression
Vanbrabant, Leonard; Van De Schoot, Rens; Rosseel, Yves
2015-01-01
Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient β1 is larger than β2 and β3. The corresponding hypothesis is H: β1 > {β2, β3} and this is known as an (order) constrained hypothesis. A major advantage of testing such a hypothesis is that power can be gained and inherently a smaller sample size is needed. This article discusses this gain in sample size reduction when an increasing number of constraints is included in the hypothesis. The main goal is to present sample-size tables for constrained hypotheses. A sample-size table contains the necessary sample size at a pre-specified power (say, 0.80) for an increasing number of constraints. To obtain sample-size tables, two Monte Carlo simulations were performed, one for ANOVA and one for multiple regression. Three results are salient. First, in an ANOVA the needed sample size decreases by 30–50% when complete ordering of the parameters is taken into account. Second, small deviations from the imposed order have only a minor impact on the power. Third, at the maximum number of constraints, the linear regression results are comparable with the ANOVA results. However, in the case of fewer constraints, ordering the parameters (e.g., β1 > β2) results in a higher power than assigning a positive or a negative sign to the parameters (e.g., β1 > 0). PMID:25628587
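The power gain from incorporating an order constraint can be illustrated with a minimal Monte Carlo in the spirit of (but far simpler than) the article's simulations: a one-sided two-sample t-test is the simplest order-constrained hypothesis. Sample size, effect size, and replication count below are hypothetical.

```python
import numpy as np
from scipy import stats

def power(n, effect, one_sided, reps=2000, alpha=0.05, seed=1):
    """Monte Carlo power of a two-sample t-test; the one-sided test is the
    simplest instance of an order-constrained hypothesis (mu1 > mu2)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n) + effect    # group with the larger mean
        y = rng.standard_normal(n)
        alt = 'greater' if one_sided else 'two-sided'
        hits += stats.ttest_ind(x, y, alternative=alt).pvalue < alpha
    return hits / reps

p_constrained = power(40, 0.5, one_sided=True)
p_unconstrained = power(40, 0.5, one_sided=False)
print(p_constrained > p_unconstrained)   # → True
```

Equivalently, the constrained test reaches the same power at a smaller n, which is what the article's sample-size tables quantify for ANOVA and regression with multiple constraints.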
Can we infer the effect of river works on streamflow statistics?
NASA Astrophysics Data System (ADS)
Ganora, Daniele
2016-04-01
Most of our river network system is affected by anthropic pressure of different types. While climate and land use change are widely recognized as important factors, the effect of "in-line" water infrastructures on the global behavior of the river system is often overlooked. This is due to the difficulty in including local "physical" knowledge (e.g., the hydraulic behavior of a river reach with levees during a flood) into large-scale models that provide a statistical description of the streamflow, and which are the basis for the implementation of resources/risk management plans (e.g., regional models for prediction of the flood frequency curve). This work presents some preliminary applications regarding two widely used hydrological signatures, the flow duration curve and the flood frequency curve. We adopt a pragmatic (i.e., reliable and implementable at large scales) and parsimonious (i.e., requiring few data) framework of analysis, considering that we operate in a complex system (many river works already exist, and many others could be built in the future). In the first case, a method is proposed to correct observations of streamflow affected by the presence of upstream run-of-the-river power plants in order to provide the "natural" flow duration curve, using only simple information about the plant (i.e., the maximum intake flow). The second case regards the effects of flood-protection works on the downstream sections, to support the application of along-stream cost-benefit analysis in the flood risk management context. Current applications and possible future developments are discussed.
NASA Astrophysics Data System (ADS)
Calderon, Christopher P.; Weiss, Lucien E.; Moerner, W. E.
2014-05-01
Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x ,y (or x ,y,z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium).
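A crude stand-in for the coordinate-independence test described above is to test the per-step displacements of a measured 2-D trajectory for correlation; the article's test is more robust, but the idea is the same. The trajectory below is simulated with correlated driving noise, and all parameters are hypothetical.

```python
import numpy as np
from scipy import stats

def increment_dependence_test(xy):
    """Pearson test on per-step displacements of a 2-D trajectory: a crude
    proxy for a hypothesis test of independence between coordinates."""
    dx, dy = np.diff(xy[:, 0]), np.diff(xy[:, 1])
    return stats.pearsonr(dx, dy)

# Simulated trajectory driven by correlated x/y noise (hypothetical data)
rng = np.random.default_rng(3)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
steps = rng.multivariate_normal([0.0, 0.0], cov, size=2000)
traj = np.cumsum(steps, axis=0)
r, p = increment_dependence_test(traj)
print(p < 0.01)   # → True: independence of coordinates is rejected
```

When such a test rejects, the abstract's recommendation applies: the model should be replaced by one that allows 2-D (3-D) interactions between coordinates.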
Soap film flows: Statistics of two-dimensional turbulence
Vorobieff, P.; Rivera, M.; Ecke, R.E.
1999-08-01
Soap film flows provide a very convenient laboratory model for studies of two-dimensional (2-D) hydrodynamics including turbulence. For a gravity-driven soap film channel with a grid of equally spaced cylinders inserted in the flow, we have measured the simultaneous velocity and thickness fields in the irregular flow downstream from the cylinders. The velocity field is determined by a modified digital particle image velocimetry method and the thickness from the light scattered by the particles in the film. From these measurements, we compute the decay of mean energy, enstrophy, and thickness fluctuations with downstream distance, and the structure functions of velocity, vorticity, thickness fluctuation, and vorticity flux. From these quantities we determine the microscale Reynolds number of the flow, R_λ ≈ 100, and the integral and dissipation scales of 2-D turbulence. We also obtain quantitative measures of the degree to which our flow can be considered incompressible and isotropic as a function of downstream distance. We find coarsening of characteristic spatial scales, qualitative correspondence of the decay of energy and enstrophy with the Batchelor model, scaling of energy in k space consistent with the k^(-3) spectrum of the Kraichnan–Batchelor enstrophy-scaling picture, and power-law scalings of the structure functions of velocity, vorticity, vorticity flux, and thickness. These results are compared with models of 2-D turbulence and with numerical simulations. © 1999 American Institute of Physics.
NASA Astrophysics Data System (ADS)
Ortega-Minakata, R. A.; Torres-Papaqui, J. P.; Andernach, H.; Islas-Islas, J. M.
2014-05-01
We quantify the statistical evidence of the relation between the inferred morphology and the emission-line activity type of galaxies for a large sample of galaxies. We compare the distribution of the inferred morphologies of galaxies of different dominant activity types, showing that the difference in the median morphological type between the samples of different activity types is significant. We also test the significance of the difference in the mean morphological type between all the activity-type samples using an ANOVA model with a modified Tukey test that takes into account heteroscedasticity and the unequal sample sizes. We show this test in the form of simultaneous confidence intervals for all pairwise comparisons of the mean morphological types of the samples. Using this test, scarcely applied in astronomy, we conclude that there are statistically significant differences in the inferred morphologies of galaxies of different dominant activity types.
NASA Astrophysics Data System (ADS)
von Nessi, G. T.; Hole, M. J.; The MAST Team
2014-11-01
We present recent results and technical breakthroughs for the Bayesian inference of tokamak equilibria using force-balance as a prior constraint. Issues surrounding model parameter representation and posterior analysis are discussed and addressed. These points motivate the recent advancements embodied in the Bayesian Equilibrium Analysis and Simulation Tool (BEAST) software being presently utilized to study equilibria on the Mega-Ampere Spherical Tokamak (MAST) experiment in the UK (von Nessi et al 2012 J. Phys. A 46 185501). State-of-the-art results of using BEAST to study MAST equilibria are reviewed, with recent code advancements being systematically presented throughout the manuscript.
NASA Astrophysics Data System (ADS)
Speegle, Darrin; Steward, Robert
2015-08-01
We propose a semiparametric approach to infer the existence of, and estimate the location of, a statistical change-point in a nonlinear high-dimensional time series contaminated with an additive noise component. In particular, we consider a p-dimensional stochastic process of independent multivariate normal observations where the mean function varies smoothly except at a single change-point. Our approach first involves a dimension reduction of the original time series through a random matrix multiplication. Next, we conduct a Bayesian analysis on the empirical detail coefficients of this dimensionally reduced time series after a wavelet transform. We also present a means to associate confidence bounds with the conclusions of our results. Aside from being computationally efficient and straightforward to implement, the primary advantage of our methods is seen in how these methods apply to a much larger class of time series whose mean functions are subject to only general smoothness conditions.
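The random-matrix dimension-reduction step can be sketched as follows; the wavelet transform and Bayesian analysis of the detail coefficients are omitted, and the series length, dimension p, and shift size are hypothetical.

```python
import numpy as np

def random_project(X, k, seed=0):
    """Reduce a (T x p) series to k dimensions with a Gaussian random
    matrix: a Johnson-Lindenstrauss-style pre-processing step before
    change-point analysis of the reduced series."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R

# Hypothetical series: a mean shift at t = 100 in every coordinate
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 500))
X[100:] += 0.5
Y = random_project(X, 5)
# The mean shift survives the 100-fold dimension reduction
shift = np.linalg.norm(Y[100:].mean(axis=0) - Y[:100].mean(axis=0))
print(Y.shape)   # → (200, 5)
```

Because random projections approximately preserve distances, a change-point in the mean of the original p-dimensional series remains detectable in the k-dimensional series at a far lower computational cost.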
NASA Astrophysics Data System (ADS)
Pandarinath, Kailasa
2014-12-01
Several new multi-dimensional tectonomagmatic discrimination diagrams employing log-ratio variables of chemical elements and a probability-based procedure have been developed during the last 10 years for basic-ultrabasic, intermediate and acid igneous rocks. There are numerous extensive evaluations of these newly developed diagrams which have indicated their successful application in determining the original tectonic setting of younger and older, as well as sea-water and hydrothermally altered, volcanic rocks. In the present study, these diagrams were applied to Precambrian rocks of Mexico (southern and north-eastern) and Argentina. The study indicated the original tectonic setting of Precambrian rocks from the Oaxaca Complex of southern Mexico as follows: (1) dominant rift (within-plate) setting for rocks of 1117-988 Ma age; (2) dominant rift and less-dominant arc setting for rocks of 1157-1130 Ma age; and (3) a combined tectonic setting of collision and rift for the Etla Granitoid Pluton (917 Ma age). The diagrams have indicated the original tectonic setting of the Precambrian rocks from north-eastern Mexico as: (1) a dominant arc tectonic setting for the rocks of 988 Ma age; and (2) an arc and collision setting for the rocks of 1200-1157 Ma age. Similarly, the diagrams have indicated the dominant original tectonic setting for the Precambrian rocks from Argentina as: (1) within-plate (continental rift-ocean island) and continental rift (CR) setting for the rocks of 800 Ma and 845 Ma age, respectively; and (2) an arc setting for the rocks of 1174-1169 Ma and of 1212-1188 Ma age. The inferred tectonic settings for these Precambrian rocks are, in general, in accordance with the tectonic settings reported in the literature, though there are some inconsistent inferences of tectonic settings from some of the diagrams. The present study confirms the importance of these newly developed discriminant-function based diagrams in inferring the original tectonic setting of these Precambrian rocks.
A three-dimensional statistical mechanical model of folding double-stranded chain molecules
NASA Astrophysics Data System (ADS)
Zhang, Wenbing; Chen, Shi-Jie
2001-05-01
Based on a graphical representation of intrachain contacts, we have developed a new three-dimensional model for the statistical mechanics of double-stranded chain molecules. The theory has been tested and validated for the cubic lattice chain conformations. The statistical mechanical model can be applied to the equilibrium folding thermodynamics of a large class of chain molecules, including protein β-hairpin conformations and RNA secondary structures. The application of a previously developed two-dimensional model to RNA secondary structure folding thermodynamics generally overestimates the breadth of the melting curves [S-J. Chen and K. A. Dill, Proc. Natl. Acad. Sci. U.S.A. 97, 646 (2000)], suggesting an underestimation for the sharpness of the conformational transitions. In this work, we show that the new three-dimensional model gives much sharper melting curves than the two-dimensional model. We believe that the new three-dimensional model may give much improved predictions for the thermodynamic properties of RNA conformational changes than the previous two-dimensional model.
Measurement of two-dimensional optical system MTF by computation of second order speckle statistics
NASA Astrophysics Data System (ADS)
Lund, G.; Azouit, M.
1980-04-01
An interferometric approach to the calculation of the two-dimensional MTF of an optical system is proposed. The technique, in some ways analogous to that of speckle interferometry used in astronomical situations, is based on the computation of the second-order spatio-temporal statistics of a fluctuating speckle pattern. The van Cittert–Zernike theorem is invoked to relate the speckle, due to the illumination of a perfect diffuser by the point spread function of an optical system, to the two-dimensional MTF of the system. The computed MTF is displayed in the form of a contour map and can also be represented in the conventional form of a one-dimensional vertical cut. Preliminary measurements have yielded qualitatively useful results and clearly illustrate the suitability of two-dimensional maps for the detection of transfer function anisotropies.
Stadler, R.; Hellmann, J.; Schirle, M.; Beckmann, J.
1993-12-31
Previous work showed that 4-urazoyl benzoic acid groups (U4A), statistically attached to polybutadiene, form ordered supramolecular arrays in the polymer matrix. The present work describes the synthesis of a new molecular building block capable of self-assembly in the nonpolar matrix. 5-urazoylisophthalic acid groups (U35A) attached to 1,4-polybutadiene chains show an endothermic transition, characteristic of supramolecular self-assembly. The melting temperature increases for low levels of modification from 130{degrees}C up to 190{degrees}C. The IR data indicate that the 5-urazoylisophthalic acid groups are 4-functional with respect to supramolecular self-addressing. Based on the detailed knowledge of the structure of the self-assembled domains formed by 4-urazoyl benzoic acid groups, a model is developed which qualitatively describes the observed material properties.
Schwermann, Achim H; dos Santos Rolo, Tomy; Caterino, Michael S; Bechly, Günter; Schmied, Heiko; Baumbach, Tilo; van de Kamp, Thomas
2016-01-01
External and internal morphological characters of extant and fossil organisms are crucial to establishing their systematic position, ecological role and evolutionary trends. The lack of internal characters and soft-tissue preservation in many arthropod fossils, however, impedes comprehensive phylogenetic analyses and species descriptions according to taxonomic standards for Recent organisms. We found well-preserved three-dimensional anatomy in mineralized arthropods from Paleogene fissure fillings and demonstrate the value of these fossils by utilizing digitally reconstructed anatomical structure of a hister beetle. The new anatomical data facilitate a refinement of the species diagnosis and allowed us to reject a previous hypothesis of close phylogenetic relationship to an extant congeneric species. Our findings suggest that mineralized fossils, even those of macroscopically poor preservation, constitute a rich but yet largely unexploited source of anatomical data for fossil arthropods. DOI: http://dx.doi.org/10.7554/eLife.12129.001 PMID:26854367
Maneuvering target tracking algorithm based on current statistical model in three dimensional space
NASA Astrophysics Data System (ADS)
Huang, Ligang; Yan, Kang; Wang, Xiangdong
2015-07-01
This paper addresses the problems associated with maneuvering target tracking based on the current statistical model in three-dimensional space. First, a three-dimensional model with nine state variables is presented. An adaptive Kalman filtering algorithm is then designed using the mean and variance of the target's acceleration. Finally, a simulation comparing the proposed adaptive Kalman filter with a direct calculation method is given for a maneuvering target in three dimensions. The simulation results show that the proposed approach yields better estimates of target position, velocity, and acceleration.
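One axis of the nine-state tracker can be sketched as a [position, velocity, acceleration] Kalman filter; the full 3-D model stacks three such axes. This is a plain constant-acceleration filter, not the adaptive current-statistical-model variant, and the noise parameters and target motion below are hypothetical.

```python
import numpy as np

def ca_kalman(zs, dt=1.0, q=1.0, r=4.0):
    """Per-axis [pos, vel, acc] Kalman filter with position-only
    measurements; q and r are simplified, hypothetical noise levels."""
    F = np.array([[1.0, dt, dt ** 2 / 2],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]])            # constant-acceleration dynamics
    H = np.array([[1.0, 0.0, 0.0]])            # measure position only
    Q = q * np.eye(3)                          # simplified process noise
    R = np.array([[r]])                        # measurement noise
    x, P = np.zeros(3), 10.0 * np.eye(3)
    out = []
    for z in zs:
        x, P = F @ x, F @ P @ F.T + Q          # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        x = x + K @ (np.array([z]) - H @ x)    # update
        P = (np.eye(3) - K @ H) @ P
        out.append(x.copy())
    return np.array(out)

# Target with constant acceleration 0.2; position readings with noise std 2
rng = np.random.default_rng(2)
t = np.arange(50, dtype=float)
true_pos = 0.5 * 0.2 * t ** 2
est = ca_kalman(true_pos + 2.0 * rng.standard_normal(50))
print(est.shape)   # → (50, 3)
```

The adaptive filter of the paper replaces the fixed Q with one driven by the current estimate of the acceleration mean and variance, which is what improves tracking during maneuvers.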
Full counting statistics of laser excited Rydberg aggregates in a one-dimensional geometry.
Schempp, H; Günter, G; Robert-de-Saint-Vincent, M; Hofmann, C S; Breyel, D; Komnik, A; Schönleber, D W; Gärttner, M; Evers, J; Whitlock, S; Weidemüller, M
2014-01-10
We experimentally study the full counting statistics of few-body Rydberg aggregates excited from a quasi-one-dimensional atomic gas. We measure asymmetric excitation spectra and increased second and third order statistical moments of the Rydberg number distribution, from which we determine the average aggregate size. Estimating rates for different excitation processes we conclude that the aggregates grow sequentially around an initial grain. Direct comparison with numerical simulations confirms this conclusion and reveals the presence of liquidlike spatial correlations. Our findings demonstrate the importance of dephasing in strongly correlated Rydberg gases and introduce a way to study spatial correlations in interacting many-body quantum systems without imaging. PMID:24483893
NASA Astrophysics Data System (ADS)
Yoshimitsu, Nana; Furumura, Takashi; Maeda, Takuto
2016-09-01
The coda part of a waveform transmitted through a laboratory sample should be examined for the high-resolution monitoring of the sample characteristics in detail. However, the origin and propagation process of the later phases in a finite-sized small sample are very complicated with the overlap of multiple unknown reflections and conversions. In this study, we investigated the three-dimensional (3D) geometric effect of a finite-sized cylindrical sample to understand the development of these later phases. This study used 3D finite difference method simulation employing a free-surface boundary condition over a curved model surface and a realistic circular shape of the source model. The simulated waveforms and the visualized 3D wavefield in a stainless steel sample clearly demonstrated the process of multiple reflections and the conversions of the P and S waves at the side surface as well as at the top and bottom of the sample. Rayleigh wave propagation along the curved side boundary was also confirmed, and these waves dominate in the later portion of the simulated waveform with much larger amplitudes than the P and S wave reflections. The feature of the simulated waveforms showed good agreement with laboratory observed waveforms. For the simulation, an introduction of an absorbing boundary condition at the top and bottom of the sample made it possible to efficiently separate the contribution of the vertical and horizontal boundary effects in the simulated wavefield. This procedure helped to confirm the additional finding of vertically propagating multiple surface waves and their conversion at the corner of the sample. This new laboratory-scale 3D simulation enabled the appearance of a variety of geometric effects that constitute the later phases of the transmitted waves.
Halpin, Peter F; Stam, Henderikus J
2006-01-01
The application of statistical testing in psychological research over the period of 1940-1960 is examined in order to address psychologists' reconciliation of the extant controversy between the Fisher and Neyman-Pearson approaches. Textbooks of psychological statistics and the psychological journal literature are reviewed to examine the presence of what Gigerenzer (1993) called a hybrid model of statistical testing. Such a model is present in the textbooks, although the mathematically incomplete character of this model precludes the appearance of a similarly hybridized approach to statistical testing in the research literature. The implications of this hybrid model for psychological research and the statistical testing controversy are discussed. PMID:17286092
Carlsen, Michelle; Fu, Guifang; Bushman, Shaun; Corcoran, Christopher
2016-02-01
Genome-wide data with millions of single-nucleotide polymorphisms (SNPs) can be highly correlated due to linkage disequilibrium (LD). The ultrahigh dimensionality of big data brings unprecedented challenges to statistical modeling such as noise accumulation, the curse of dimensionality, computational burden, spurious correlations, and a processing and storing bottleneck. The traditional statistical approaches lose their power due to p ≫ n (n is the number of observations and p is the number of SNPs) and the complex correlation structure among SNPs. In this article, we propose an integrated distance correlation ridge regression (DCRR) approach to accommodate the ultrahigh dimensionality, joint polygenic effects of multiple loci, and the complex LD structures. Initially, a distance correlation (DC) screening approach is used to extensively remove noise, after which LD structure is addressed using a ridge penalized multiple logistic regression (LRR) model. The false discovery rate, true positive discovery rate, and computational cost were simultaneously assessed through a large number of simulations. A binary trait of Arabidopsis thaliana, the hypersensitive response to the bacterial elicitor AvrRpm1, was analyzed in 84 inbred lines (28 susceptibilities and 56 resistances) with 216,130 SNPs. Compared to previous SNP discovery methods implemented on the same data set, the DCRR approach successfully detected the causative SNP while dramatically reducing spurious associations and computational time. PMID:26661113
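The distance correlation statistic used in the screening step can be computed directly. Below is a minimal V-statistic implementation on hypothetical toy data, not the authors' code; in the DCRR pipeline this score would be computed per SNP against the trait and used to rank and filter predictors.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample (V-statistic) distance correlation between two 1-D samples,
    which detects nonlinear as well as linear dependence."""
    def centered(a):
        D = np.abs(a[:, None] - a[None, :])           # pairwise distances
        return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()
    A = centered(np.asarray(x, float))
    B = centered(np.asarray(y, float))
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
d_dep = distance_correlation(x, x ** 2)               # nonlinear dependence
d_indep = distance_correlation(x, rng.standard_normal(500))
print(d_dep > d_indep)   # → True
```

Note that x and x² are uncorrelated in the Pearson sense, yet the distance correlation cleanly separates the dependent from the independent pair, which is why DC screening loses less signal than marginal linear screening.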
Application of Edwards' statistical mechanics to high-dimensional jammed sphere packings.
Jin, Yuliang; Charbonneau, Patrick; Meyer, Sam; Song, Chaoming; Zamponi, Francesco
2010-11-01
The isostatic jamming limit of frictionless spherical particles from Edwards' statistical mechanics [Song et al., Nature (London) 453, 629 (2008)] is generalized to arbitrary dimension d using a liquid-state description. The asymptotic high-dimensional behavior of the self-consistent relation is obtained by saddle-point evaluation and checked numerically. The resulting random close packing density scaling ϕ ∼ d 2^(−d) is consistent with that of other approaches, such as replica theory and density-functional theory. The validity of various structural approximations is assessed by comparing with three- to six-dimensional isostatic packings obtained from simulations. These numerical results support a growing accuracy of the theoretical approach with dimension. The approach could thus serve as a starting point to obtain a geometrical understanding of the higher-order correlations present in jammed packings. PMID:21230456
NASA Astrophysics Data System (ADS)
Rotter, Stefan; Aigner, Florian; Burgdörfer, Joachim
2007-03-01
We investigate the statistical distribution of transmission eigenvalues in phase-coherent transport through quantum dots. In two-dimensional ab initio simulations for both clean and disordered two-dimensional cavities, we find markedly different quantum-to-classical crossover scenarios for these two cases. In particular, we observe the emergence of “noiseless scattering states” in clean cavities, irrespective of sharp-edged entrance and exit lead mouths. We find the onset of these “classical” states to be largely independent of the cavity’s classical chaoticity, but very sensitive with respect to bulk disorder. Our results suggest that for weakly disordered cavities, the transmission eigenvalue distribution is determined both by scattering at the disorder potential and the cavity walls. To properly account for this intermediate parameter regime, we introduce a hybrid crossover scheme, which combines previous models that are valid in the ballistic and the stochastic limit, respectively.
NASA Astrophysics Data System (ADS)
Yeom, Seokwon; Lee, Dongsu; Son, Jung-Young; Kim, Shin-Hwan
2009-09-01
In this paper, we discuss computational reconstruction and statistical pattern classification using integral imaging. Three-dimensional object information is numerically reconstructed at arbitrary depth-levels by averaging the corresponding pixels. The longitudinal distance and object boundary are estimated where the standard deviation of the intensity is minimized. Fisher linear discriminant analysis combined with principal component analysis is adopted for the classification of out-of-plane rotated objects. The Fisher linear discriminant analysis maximizes the class-discrimination while the principal component analysis minimizes the error between the original and the restored images. The presented method provides promising results for the distortion-tolerant pattern classification.
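The depth-estimation step described above lends itself to a short sketch. The following is a minimal illustration (the data and function names are hypothetical, not from the paper): for each candidate depth, the pixels that correspond across the elemental images are stacked, and the depth whose stack has the smallest intensity standard deviation is taken as the object's longitudinal distance.

```python
import statistics

# Hypothetical data: for each candidate depth, the intensities of the
# corresponding pixels gathered from all elemental images.
def estimate_depth(pixel_stacks_by_depth):
    """Return the depth whose pixel stack has minimal standard deviation."""
    return min(pixel_stacks_by_depth,
               key=lambda d: statistics.pstdev(pixel_stacks_by_depth[d]))

stacks = {
    10: [100, 140, 90, 130],   # misaligned views disagree -> high spread
    20: [120, 121, 119, 120],  # correct depth: views agree -> low spread
    30: [80, 150, 100, 60],
}
print(estimate_depth(stacks))  # -> 20
```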
Nonextensive statistics, entropic gravity and gravitational force in a non-integer dimensional space
NASA Astrophysics Data System (ADS)
Abreu, Everton M. C.; Neto, Jorge Ananias; Godinho, Cresus F. L.
2014-10-01
Based on the connection between Tsallis nonextensive statistics and fractional dimensional space, in this work we introduce, with the aid of Verlinde's formalism, the Newton constant in a fractal space as a function of the nonextensive constant. With this result we construct a curve that shows the direct relation between the Tsallis nonextensive parameter and the dimension of this fractal space. We demonstrate that there are ambiguities between the results obtained via Verlinde's approach and those obtained via the fractional calculus formalism, and we show that these ambiguities appear only for spaces with dimension different from three. A possible resolution of this ambiguity is proposed here.
Three-Dimensional Statistical Gas Distribution Mapping in an Uncontrolled Indoor Environment
Reggente, Matteo; Lilienthal, Achim J.
2009-05-23
In this paper we present a statistical method to build three-dimensional gas distribution maps (3D-DM). The proposed mapping technique uses kernel extrapolation with a tri-variate Gaussian kernel that models the likelihood that a reading represents the concentration distribution at a distant location in the three dimensions. The method is evaluated using a mobile robot equipped with three 'e-noses' mounted at different heights. Initial experiments in an uncontrolled indoor environment are presented and evaluated with respect to the ability of the 3D map, computed from the lower and upper nose, to predict the map from the middle nose.
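As a rough sketch of the kernel-extrapolation idea (an isotropic Gaussian stands in for the paper's tri-variate kernel; sigma and all data below are illustrative assumptions), each e-nose reading contributes to a queried 3D location with a weight that decays with distance:

```python
import math

# Illustrative kernel-extrapolation estimate (isotropic Gaussian kernel).
def gas_estimate(readings, query, sigma=0.5):
    """readings: list of ((x, y, z), concentration); query: (x, y, z)."""
    num = den = 0.0
    for (x, y, z), c in readings:
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2 + (z - query[2]) ** 2
        w = math.exp(-d2 / (2.0 * sigma ** 2))  # weight decays with distance
        num += w * c
        den += w
    return num / den if den else 0.0

readings = [((0.0, 0.0, 0.0), 1.0), ((0.0, 0.0, 2.0), 3.0)]
print(gas_estimate(readings, (0.0, 0.0, 0.0)))  # dominated by the near reading
```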
Aggelopoulos, Nikolaos C
2015-08-01
Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the response invariance in the responses of some neurons to variations in the stimulus, as well as from situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience. PMID:25976632
Blanc, Guillermo A.; Kewley, Lisa; Vogt, Frédéric P. A.; Dopita, Michael A.
2015-01-10
We present a new method for inferring the metallicity (Z) and ionization parameter (q) of H II regions and star-forming galaxies using strong nebular emission lines (SELs). We use Bayesian inference to derive the joint and marginalized posterior probability density functions for Z and q given a set of observed line fluxes and an input photoionization model. Our approach allows the use of arbitrary sets of SELs and the inclusion of flux upper limits. The method provides a self-consistent way of determining the physical conditions of ionized nebulae that is not tied to the arbitrary choice of a particular SEL diagnostic and uses all the available information. Unlike theoretically calibrated SEL diagnostics, the method is flexible and not tied to a particular photoionization model. We describe our algorithm, validate it against other methods, and present a tool that implements it called IZI. Using a sample of nearby extragalactic H II regions, we assess the performance of commonly used SEL abundance diagnostics. We also use a sample of 22 local H II regions having both direct and recombination line (RL) oxygen abundance measurements in the literature to study discrepancies in the abundance scale between different methods. We find that oxygen abundances derived through Bayesian inference using currently available photoionization models in the literature can be in good (∼30%) agreement with RL abundances, although some models perform significantly better than others. We also confirm that abundances measured using the direct method are typically ∼0.2 dex lower than both RL and photoionization-model-based abundances.
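The grid-based Bayesian step can be sketched as follows (the model grid and Gaussian flux errors here are illustrative assumptions, not the IZI implementation): evaluate the likelihood of the observed line fluxes at each (Z, q) grid point against the photoionization-model predictions, then normalize to obtain the posterior.

```python
import math

# Illustrative grid: predicted fluxes of two emission lines at a few
# (Z, q) points -- placeholders, not a real photoionization model.
MODEL_GRID = {
    (8.0, 7.0): [1.0, 2.0],
    (8.5, 7.2): [1.5, 1.5],
    (9.0, 7.5): [2.0, 1.0],
}

def log_likelihood(observed, errors, predicted):
    # Gaussian flux errors assumed.
    return sum(-0.5 * ((o - p) / e) ** 2
               for o, p, e in zip(observed, predicted, errors))

def posterior_on_grid(observed, errors, model_grid):
    """Normalized posterior over the (Z, q) grid (flat prior assumed)."""
    logp = {zq: log_likelihood(observed, errors, pred)
            for zq, pred in model_grid.items()}
    m = max(logp.values())
    p = {zq: math.exp(v - m) for zq, v in logp.items()}
    s = sum(p.values())
    return {zq: v / s for zq, v in p.items()}

post = posterior_on_grid([1.0, 2.0], [0.1, 0.1], MODEL_GRID)
print(max(post, key=post.get))  # -> (8.0, 7.0)
```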
Loop braiding statistics in exactly soluble three-dimensional lattice models
NASA Astrophysics Data System (ADS)
Lin, Chien-Hung; Levin, Michael
2015-07-01
We construct two exactly soluble lattice spin models that demonstrate the importance of three-loop braiding statistics for the classification of three-dimensional gapped quantum phases. The two models are superficially similar: both are gapped and both support particlelike and looplike excitations similar to those of charges and vortex lines in a Z2×Z2 gauge theory. Furthermore, in both models the particle excitations are bosons, and in both models the particle and loop excitations have the same mutual braiding statistics. The difference between the two models is only apparent when one considers the recently proposed three-loop braiding process in which one loop is braided around another while both are linked to a third loop. We find that the statistical phase associated with this process is different in the two models, thus proving that they belong to two distinct phases. An important feature of this work is that we derive our results using a concrete approach: we construct string and membrane operators that create and move the particle and loop excitations and then we extract the braiding statistics from the commutation algebra of these operators.
Statistical Projections for Multi-resolution, Multi-dimensional Visual Data Exploration and Analysis
Hoa T. Nguyen; Stone, Daithi; E. Wes Bethel
2016-01-01
An ongoing challenge in visual exploration and analysis of large, multi-dimensional datasets is how to present useful, concise information to a user for specific visualization tasks. Typical approaches to this problem have proposed reduced-resolution versions of data, projections of data, or both. These approaches still have limitations, such as high computational cost or susceptibility to error. In this work, we explore the use of a statistical metric as the basis for both projections and reduced-resolution versions of data, with a particular focus on preserving one key trait in the data, namely variation. We use two different case studies to explore this idea: one that uses a synthetic dataset, and another that uses a large ensemble collection produced by an atmospheric modeling code to study long-term changes in global precipitation. The primary finding of our work is that a statistical measure preserves the variation signal inherent in data more faithfully, across both multi-dimensional projections and multi-resolution representations, than a methodology based upon averaging.
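A minimal sketch of the variation-preserving reduction idea (one-dimensional, purely illustrative): reducing resolution by storing the per-block standard deviation keeps the variation signal that plain block averaging would wash out.

```python
import statistics

# Toy 1-D reduction: keep per-block standard deviation instead of the mean.
def reduce_keep_variation(series, block):
    return [statistics.pstdev(series[i:i + block])
            for i in range(0, len(series), block)]

# An oscillating block and a flat block: averaging would map both to 5,
# but the variation-preserving reduction tells them apart.
series = [0, 10] * 4 + [5, 5] * 4
print(reduce_keep_variation(series, 8))  # -> [5.0, 0.0]
```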
ERIC Educational Resources Information Center
Schochet, Peter Z.
2015-01-01
This report presents the statistical theory underlying the "RCT-YES" software that estimates and reports impacts for RCTs for a wide range of designs used in social policy research. The report discusses a unified, non-parametric design-based approach for impact estimation using the building blocks of the Neyman-Rubin-Holland causal…
A Study of the Statistical Inference Criteria: Can We Agree on When to Use Z versus "t"?
ERIC Educational Resources Information Center
Ozgur, Ceyhun; Strasser, Sandra E.
2004-01-01
Authors who write introductory business statistics texts do not agree on when to use a t distribution and when to use a Z distribution in both the construction of confidence intervals and the use of hypothesis testing. In a survey of textbooks written in the last 15 years, we found the decision rules to be contradictory and, at times, the…
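The practical difference at stake can be shown in a few lines (illustrative sample; the two-sided 95% t critical value for 9 degrees of freedom, 2.262, and the z value, 1.960, are standard table entries): with the population sigma unknown and n small, the t interval is wider than the z interval built from the same data.

```python
import math
import statistics

def ci_halfwidth(sample, crit):
    s = statistics.stdev(sample)  # sample standard deviation (sigma unknown)
    return crit * s / math.sqrt(len(sample))

sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
hw_t = ci_halfwidth(sample, 2.262)  # t-based half-width, df = 9
hw_z = ci_halfwidth(sample, 1.960)  # z-based half-width
print(hw_t > hw_z)  # -> True: the t interval is wider for small n
```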
ERIC Educational Resources Information Center
Davis, Philip M.; Solla, Leah R.
2003-01-01
Reports an analysis of American Chemical Society electronic journal downloads at Cornell University (Ithaca, New York) by individual IP (Internet Protocol) addresses. Highlights include usage statistics to evaluate library journal subscriptions; understanding scientists' reading behavior; individual use of articles and of journals; and the…
Kravtsov, V.E.; Yudson, V.I.
2011-07-15
Highlights: > Statistics of normalized eigenfunctions in one-dimensional Anderson localization at E = 0 is studied. > Moments of the inverse participation ratio are calculated. > An equation for the generating function is derived at E = 0. > An exact solution for the generating function at E = 0 is obtained. > The relation of the generating function to the phase distribution function is established. - Abstract: The one-dimensional (1d) Anderson model (AM), i.e. a tight-binding chain with random uncorrelated on-site energies, has statistical anomalies at any rational point f = 2a/λ_E, where a is the lattice constant and λ_E is the de Broglie wavelength. We develop a regular approach to the anomalous statistics of normalized eigenfunctions ψ(r) at such commensurability points. The approach is based on an exact integral transfer-matrix equation for a generating function Φ_r(u, φ) (u and φ have the meaning of the squared amplitude and phase of eigenfunctions; r is the position of the observation point). This generating function can be used to compute local statistics of eigenfunctions of the 1d AM at any disorder and to address the problem of higher-order anomalies at f = p/q with q > 2. The descender of the generating function, P_r(φ) ≡ Φ_r(u = 0, φ), is shown to be the distribution function of the phase, which determines the Lyapunov exponent and the local density of states. In the leading order in small disorder we derived a second-order partial differential equation for the r-independent ('zero-mode') component Φ(u, φ) at the E = 0 (f = 1/2) anomaly. This equation is nonseparable in the variables u and φ. Yet, we show that due to a hidden symmetry it is integrable, and we construct an exact solution for Φ(u, φ) explicitly in quadratures. Using this solution we computed the moments I_m = N⟨|ψ|^{2m}⟩ (m ≥ 1) for a chain of length N → ∞ and found an
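A hedged numerical sketch of the underlying transfer-matrix idea (not the authors' generating-function machinery): iterate ψ_{n+1} = (E − ε_n)ψ_n − ψ_{n−1} with random uncorrelated on-site energies and estimate the Lyapunov exponent, i.e. the inverse localization length, from the mean log growth rate of the (ψ_{n+1}, ψ_n) vector.

```python
import math
import random

# Transfer-matrix iteration for the 1d Anderson model with on-site
# energies eps_n uniform in [-w/2, w/2] (parameters are illustrative).
def lyapunov_exponent(n_sites=200_000, w=1.0, e=0.0, seed=1):
    random.seed(seed)
    psi_prev, psi = 1.0, 1.0
    log_growth = 0.0
    for _ in range(n_sites):
        eps = random.uniform(-w / 2, w / 2)
        psi_prev, psi = psi, (e - eps) * psi - psi_prev
        norm = math.hypot(psi, psi_prev)  # renormalize to avoid overflow
        log_growth += math.log(norm)
        psi /= norm
        psi_prev /= norm
    return log_growth / n_sites

gamma = lyapunov_exponent()
print(gamma > 0)  # localized: positive Lyapunov exponent
```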
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Statistics of particle transport in a two-dimensional dusty plasma cluster
Ratynskaia, S.; Knapek, C.; Rypdal, K.; Khrapak, S.; Morfill, G.
2005-02-01
Statistical analysis is performed on long time series of dust particle trajectories in a two-dimensional dusty plasma cluster. Particle transport is found to be superdiffusive on all time scales until the range of particle displacements approaches the size of the cluster. Analysis of probability distribution functions and rescaled range analysis of the position increments show that the signal is non-Gaussian self-similar with Hurst exponent H=0.6, indicating that the superdiffusion is caused by long-range dependencies in the system. Investigation of temporal and spatial characteristics of persistent particle slips demonstrates that they are associated with collective events present on all time scales and responsible for the non-Gaussianity and long-memory effects.
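Rescaled range (R/S) analysis of the kind used above can be sketched as follows (a simplified estimator on synthetic data, not the authors' pipeline): uncorrelated noise should give a Hurst exponent near 0.5, so the reported H = 0.6 signals long-range dependence.

```python
import math
import random
import statistics

def rescaled_range(x):
    """R/S of one window: range of the mean-adjusted cumulative sum over std."""
    m = statistics.fmean(x)
    cum = mn = mx = 0.0
    for v in x:
        cum += v - m
        mn, mx = min(mn, cum), max(mx, cum)
    return (mx - mn) / statistics.pstdev(x)

def hurst_exponent(x, sizes=(16, 32, 64, 128, 256)):
    # Slope of log<R/S> versus log(window size) via least squares.
    pts = []
    for n in sizes:
        rs = statistics.fmean(rescaled_range(x[i:i + n])
                              for i in range(0, len(x) - n + 1, n))
        pts.append((math.log(n), math.log(rs)))
    xb = statistics.fmean(a for a, _ in pts)
    yb = statistics.fmean(b for _, b in pts)
    num = sum((a - xb) * (b - yb) for a, b in pts)
    den = sum((a - xb) ** 2 for a, _ in pts)
    return num / den

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
h = hurst_exponent(noise)
print(h)  # near 0.5 for uncorrelated noise
```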
Collisional statistics and dynamics of two-dimensional hard-disk systems: From fluid to solid.
Taloni, Alessandro; Meroz, Yasmine; Huerta, Adrián
2015-08-01
We perform extensive MD simulations of two-dimensional systems of hard disks, focusing on the collisional statistical properties. We analyze the distribution functions of velocity, free flight time, and free path length for packing fractions ranging from the fluid to the solid phase. The behaviors of the mean free flight time and path length between subsequent collisions are found to drastically change in the coexistence phase. We show that single-particle dynamical properties behave analogously in collisional and continuous-time representations, exhibiting apparent crossovers between the fluid and the solid phases. We find that, both in collisional and continuous-time representation, the mean-squared displacement, velocity autocorrelation functions, intermediate scattering functions, and self-part of the van Hove function (propagator) closely reproduce the same behavior exhibited by the corresponding quantities in granular media, colloids, and supercooled liquids close to the glass or jamming transition. PMID:26382368
Statistical mechanics of two-dimensional foams: Physical foundations of the model.
Durand, Marc
2015-12-01
In a recent series of papers, a statistical model that accounts for correlations between topological and geometrical properties of a two-dimensional shuffled foam has been proposed and compared with experimental and numerical data. Here, the various assumptions on which the model is based are exposed and justified: the equiprobability hypothesis of the foam configurations is argued. The range of correlations between bubbles is discussed, and the mean-field approximation that is used in the model is detailed. The two self-consistency equations associated with this mean-field description can be interpreted as the conservation laws of number of sides and bubble curvature, respectively. Finally, the use of a "Grand-Canonical" description, in which the foam constitutes a reservoir of sides and curvature, is justified. PMID:26701712
NASA Astrophysics Data System (ADS)
Villeta, M.; Sanz-Lobera, A.; González, C.; Sebastián, M. A.
2009-11-01
The implementation of Statistical Process Control (SPC) requires the use of measurement systems. The inherent variability of these systems affects the reliability of the measurement results obtained and, as a consequence, the SPC results. This paper investigates the influence of measurement uncertainty on the analysis of process capability. It seeks to reduce the effect of measurement uncertainty, in order to approach the capability that the productive process really has. Both processes centered at a nominal value and off-center processes are considered, and a criterion is proposed that validates the adequacy of the dimensional measurement systems used in an SPC implementation.
NASA Astrophysics Data System (ADS)
Das Sarma, S.; Nag, Amit; Sau, Jay D.
2016-07-01
We consider a simple conceptual question with respect to Majorana zero modes in semiconductor nanowires: can the measured nonideal values of the zero-bias-conductance-peak in the tunneling experiments be used as a characteristic to predict the underlying topological nature of the proximity induced nanowire superconductivity? In particular, we define and calculate the topological visibility, which is a variation of the topological invariant associated with the scattering matrix of the system as well as the zero-bias-conductance-peak heights in the tunneling measurements, in the presence of dissipative broadening, using precisely the same realistic nanowire parameters to connect the topological invariants with the zero-bias tunneling conductance values. This dissipative broadening is present in both (the existing) tunneling measurements and also (any future) braiding experiments as an inevitable consequence of a finite braiding time. The connection between the topological visibility and the conductance allows us to obtain the visibility of realistic braiding experiments in nanowires, and to conclude that the current experimentally accessible systems with nonideal zero-bias conductance peaks may indeed manifest (with rather low visibility) non-Abelian statistics for the Majorana zero modes. In general, we find that a large (small) superconducting gap (Majorana peak splitting) is essential for the manifestation of the non-Abelian braiding statistics, and in particular, a zero-bias conductance value of around half the ideal quantized Majorana value should be sufficient for the manifestation of non-Abelian statistics in experimental nanowires. Our work also establishes that as a matter of principle the topological transition associated with the emergence of Majorana zero modes in finite nanowires is always a crossover (akin to a quantum phase transition at finite temperature) requiring the presence of dissipative broadening (which must be larger than the Majorana energy
Statistical conservation law in two- and three-dimensional turbulent flows
NASA Astrophysics Data System (ADS)
Frishman, Anna; Boffetta, Guido; De Lillo, Filippo; Liberzon, Alex
2015-03-01
Particles in turbulence live complicated lives. It is nonetheless sometimes possible to find order in this complexity. It was proposed in Falkovich et al. [Phys. Rev. Lett. 110, 214502 (2013), 10.1103/PhysRevLett.110.214502] that pairs of Lagrangian tracers at small scales, in an incompressible isotropic turbulent flow, have a statistical conservation law. More specifically, in a d -dimensional flow the distance R (t ) between two neutrally buoyant particles, raised to the power -d and averaged over velocity realizations, remains at all times equal to the initial, fixed, separation raised to the same power. In this work we present evidence from direct numerical simulations of two- and three-dimensional turbulence for this conservation. In both cases the conservation is lost when particles exit the linear flow regime. In two dimensions we show that, as an extension of the conservation law, an Evans-Cohen-Morriss or Gallavotti-Cohen type fluctuation relation exists. We also analyze data from a 3D laboratory experiment [Liberzon et al., Physica D 241, 208 (2012), 10.1016/j.physd.2011.07.008], finding that although it probes small scales they are not in the smooth regime. Thus instead of
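The conservation law has a purely geometric core that is easy to check numerically (a self-contained illustration, not the paper's simulations): for any volume-preserving linear deformation M of an isotropic ensemble of unit separations in d = 2, the mean of R^{-d} is unchanged, i.e. ⟨|M n|^{-2}⟩ = 1 over uniformly distributed directions n.

```python
import math
import random

# Monte Carlo check: with det(M) = 1, <|M n|^{-2}> over uniform unit
# directions n equals 1, mirroring the conservation of <R^{-d}> for
# particle pairs in an incompressible linear flow.
random.seed(3)
M = ((2.0, 0.3), (0.1, 0.515))  # det = 2.0*0.515 - 0.3*0.1 = 1.0
total, n_samples = 0.0, 100_000
for _ in range(n_samples):
    t = random.uniform(0.0, 2.0 * math.pi)
    nx, ny = math.cos(t), math.sin(t)
    rx = M[0][0] * nx + M[0][1] * ny
    ry = M[1][0] * nx + M[1][1] * ny
    total += 1.0 / (rx * rx + ry * ry)  # |M n|^{-2}
mean_inv_r2 = total / n_samples
print(mean_inv_r2)  # close to 1
```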
Current Sheet Statistics in Three-Dimensional Simulations of Coronal Heating
NASA Astrophysics Data System (ADS)
Lin, L.; Ng, C. S.; Bhattacharjee, A.
2013-04-01
In a recent numerical study [Ng et al., Astrophys. J. 747, 109, 2012], with a three-dimensional model of coronal heating using reduced magnetohydrodynamics (RMHD), we obtained scaling results for heating rate versus Lundquist number based on a series of runs in which random photospheric motions are imposed for hundreds to thousands of Alfvén times in order to obtain converged statistical values. The heating rate found in these simulations saturates to a level that is independent of the Lundquist number. This scaling result was also supported by an analysis assuming the Sweet-Parker scaling of the current sheets, as well as assumptions on how the width, length and number of current sheets scale with Lundquist number. In order to test these assumptions, we have implemented an automated routine to analyze thousands of current sheets in these simulations and return statistical scalings for these quantities. It is found that the Sweet-Parker scaling is justified. However, some discrepancies are also found and require further study.
NASA Astrophysics Data System (ADS)
Shen, Samuel S. P.; Wied, Olaf; Weithmann, Alexander; Regele, Tobias; Bailey, Barbara A.; Lawrimore, Jay H.
2015-05-01
This paper describes six different temporal climate regimes of the contiguous United States (CONUS) according to interdecadal variations of surface air temperature (SAT) and precipitation using the United States Historical Climatology Network (USHCN) monthly data (Tmax, Tmin, Tmean, and precipitation) from 1895 to 2010. Our analysis is based on the probability distribution, mean, standard deviation, skewness, kurtosis, Kolmogorov-Smirnov (KS) test, and Welch's t test. The relevant statistical parameters are computed from gridded monthly SAT and precipitation data. SAT variations lead to classification of four regimes: 1895-1930 (cool), 1931-1960 (warm), 1961-1985 (cool), and 1986-2010 (warm), while precipitation variations lead to a classification of two regimes: 1895-1975 (dry) and 1976-2010 (wet). The KS test shows that any two regimes of the above six are statistically significantly different from each other due to clear shifts of the probability density functions. Extremes of SAT and precipitation identify the ten hottest, coldest, driest, and wettest years. Welch's t test is used to discern significant differences among these extremes. The spatial patterns of the six climate regimes and some years of extreme climate are analyzed. Although the recent two decades are the warmest among the other decades since 1895 and many hottest years measured by CONUS Tmin and Tmean are in these two decades, the hottest year according to the CONUS Tmax anomalies is 1934 (1.37 °C), which is very close to the second Tmax hottest year 2006 (1.35 °C).
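The two hypothesis tests used above are simple enough to compute from scratch (toy data; this illustrates the statistics themselves, not the USHCN analysis): the two-sample Kolmogorov-Smirnov statistic is the maximum gap between empirical CDFs, and Welch's t statistic compares means without assuming equal variances.

```python
import bisect
import statistics

def ks_statistic(a, b):
    """Two-sample KS statistic: max vertical gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    def cdf(s, v):
        return bisect.bisect_right(s, v) / len(s)
    return max(abs(cdf(a, v) - cdf(b, v)) for v in a + b)

def welch_t(a, b):
    """Welch's t statistic: mean difference over unequal-variance SE."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se

cool = [8.1, 7.9, 8.4, 8.0, 7.8, 8.2]  # hypothetical regime values
warm = [9.0, 9.3, 8.9, 9.2, 9.1, 9.4]
print(ks_statistic(cool, warm), welch_t(cool, warm))
```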
NASA Astrophysics Data System (ADS)
Chavanis, Pierre-Henri
2014-04-01
We complement the literature on the statistical mechanics of point vortices in two-dimensional hydrodynamics. Using a maximum entropy principle, we determine the multi-species Boltzmann-Poisson equation and establish a form of Virial theorem. Using a maximum entropy production principle (MEPP), we derive a set of relaxation equations towards statistical equilibrium. These relaxation equations can be used as a numerical algorithm to compute the maximum entropy state. We mention the analogies with the Fokker-Planck equations derived by Debye and Hückel for electrolytes. We then consider the limit of strong mixing (or low energy). To leading order, the relationship between the vorticity and the stream function at equilibrium is linear and the maximization of the entropy becomes equivalent to the minimization of the enstrophy. This expansion is similar to the Debye-Hückel approximation for electrolytes, except that the temperature is negative instead of positive so that the effective interaction between like-sign vortices is attractive instead of repulsive. This leads to an organization at large scales presenting geometry-induced phase transitions, instead of Debye shielding. We compare the results obtained with point vortices to those obtained in the context of the statistical mechanics of continuous vorticity fields described by the Miller-Robert-Sommeria (MRS) theory. At linear order, we get the same results but differences appear at the next order. In particular, the MRS theory predicts a transition between sinh and tanh-like ω - ψ relationships depending on the sign of Ku - 3 (where Ku is the Kurtosis) while there is no such transition for point vortices which always show a sinh-like ω - ψ relationship. We derive the form of the relaxation equations in the strong mixing limit and show that the enstrophy plays the role of a Lyapunov functional.
Lagrangian statistics and flow topology in forced two-dimensional turbulence.
Kadoch, B; Del-Castillo-Negrete, D; Bos, W J T; Schneider, K
2011-03-01
A study of the relationship between Lagrangian statistics and flow topology in fluid turbulence is presented. The topology is characterized using the Weiss criterion, which provides a conceptually simple tool to partition the flow into topologically different regions: elliptic (vortex dominated), hyperbolic (deformation dominated), and intermediate (turbulent background). The flow corresponds to forced two-dimensional Navier-Stokes turbulence in doubly periodic and circular bounded domains, the latter with no-slip boundary conditions. In the double periodic domain, the probability density function (pdf) of the Weiss field exhibits a negative skewness consistent with the fact that in periodic domains the flow is dominated by coherent vortex structures. On the other hand, in the circular domain, the elliptic and hyperbolic regions seem to be statistically similar. We follow a Lagrangian approach and obtain the statistics by tracking large ensembles of passively advected tracers. The pdfs of residence time in the topologically different regions are computed introducing the Lagrangian Weiss field, i.e., the Weiss field computed along the particles' trajectories. In elliptic and hyperbolic regions, the pdfs of the residence time have self-similar algebraic decaying tails. In contrast, in the intermediate regions the pdf has exponential decaying tails. The conditional pdfs (with respect to the flow topology) of the Lagrangian velocity exhibit Gaussian-like behavior in the periodic and in the bounded domains. In contrast to the freely decaying turbulence case, the conditional pdfs of the Lagrangian acceleration in forced turbulence show a comparable level of intermittency in both the periodic and the bounded domains. The conditional pdfs of the Lagrangian curvature are characterized, in all cases, by self-similar power-law behavior with a decay exponent of order -2. PMID:21517594
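The Weiss-criterion partition can be sketched pointwise (threshold and labels illustrative): from the velocity-gradient components of a 2D flow, Q = s_n² + s_s² − ω² is negative in vortex-dominated (elliptic) regions and positive in deformation-dominated (hyperbolic) regions, with an intermediate band around zero.

```python
# Pointwise Weiss-criterion sketch for a 2D incompressible flow.
def weiss_q(dudx, dudy, dvdx, dvdy):
    s_n = dudx - dvdy    # normal strain
    s_s = dvdx + dudy    # shear strain
    omega = dvdx - dudy  # vorticity
    return s_n ** 2 + s_s ** 2 - omega ** 2

def weiss_label(q, q0):
    if q < -q0:
        return "elliptic"      # vortex dominated
    if q > q0:
        return "hyperbolic"    # deformation dominated
    return "intermediate"      # turbulent background

# Solid-body rotation (u = -y, v = x) is elliptic; pure strain
# (u = x, v = -y) is hyperbolic.
print(weiss_label(weiss_q(0.0, -1.0, 1.0, 0.0), 1.0))   # -> elliptic
print(weiss_label(weiss_q(1.0, 0.0, 0.0, -1.0), 1.0))   # -> hyperbolic
```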
NASA Astrophysics Data System (ADS)
Yoshimatsu, Katsunori; Schneider, Kai; Okamoto, Naoya; Kawahara, Yasuhiro; Farge, Marie
2011-09-01
Scale-dependent and geometrical statistics of three-dimensional incompressible homogeneous magnetohydrodynamic turbulence without mean magnetic field are examined by means of the orthogonal wavelet decomposition. The flow is computed by direct numerical simulation with a Fourier spectral method at resolution 5123 and a unit magnetic Prandtl number. Scale-dependent second and higher order statistics of the velocity and magnetic fields allow to quantify their intermittency in terms of spatial fluctuations of the energy spectra, the flatness, and the probability distribution functions at different scales. Different scale-dependent relative helicities, e.g., kinetic, cross, and magnetic relative helicities, yield geometrical information on alignment between the different scale-dependent fields. At each scale, the alignment between the velocity and magnetic field is found to be more pronounced than the other alignments considered here, i.e., the scale-dependent alignment between the velocity and vorticity, the scale-dependent alignment between the magnetic field and its vector potential, and the scale-dependent alignment between the magnetic field and the current density. Finally, statistical scale-dependent analyses of both Eulerian and Lagrangian accelerations and the corresponding time-derivatives of the magnetic field are performed. It is found that the Lagrangian acceleration does not exhibit substantially stronger intermittency compared to the Eulerian acceleration, in contrast to hydrodynamic turbulence where the Lagrangian acceleration shows much stronger intermittency than the Eulerian acceleration. The Eulerian time-derivative of the magnetic field is more intermittent than the Lagrangian time-derivative of the magnetic field.
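One of the scale-dependent statistics above, the flatness of wavelet coefficients, can be illustrated with a Haar decomposition in one dimension (a toy stand-in for the full 3D orthogonal wavelet analysis): Gaussian signals give flatness near 3 at every scale, while intermittent signals give larger values at small scales.

```python
import random
import statistics

# Haar-wavelet flatness <c^4>/<c^2>^2 per scale (1-D illustration).
def haar_flatness_by_scale(signal, n_scales=3):
    approx, flatness = list(signal), []
    for _ in range(n_scales):
        detail = [(approx[2 * i] - approx[2 * i + 1]) / 2 ** 0.5
                  for i in range(len(approx) // 2)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / 2 ** 0.5
                  for i in range(len(approx) // 2)]
        m2 = statistics.fmean(c * c for c in detail)
        m4 = statistics.fmean(c ** 4 for c in detail)
        flatness.append(m4 / (m2 * m2))
    return flatness

random.seed(4)
gauss = [random.gauss(0.0, 1.0) for _ in range(4096)]
print(haar_flatness_by_scale(gauss))  # each entry near 3 (Gaussian)
```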
NASA Astrophysics Data System (ADS)
Abramson, Louis Evan; Imacs Cluster Building Survey
2015-01-01
The growth of galaxies is a central theme of the cosmological narrative, but we do not yet understand how these objects build their stellar populations over time. Largely, this is because star formation histories must be inferred from statistical metrics (at z > 0), e.g., the cosmic star formation rate density, the stellar mass function, and the SFR/stellar mass relation. The relationship between these observations and the behavior of individual systems is unclear, but it deeply affects views on galaxy evolution. Here, I discuss key issues complicating this relationship, and explore attempts to deal with them from both 'population-down' and 'galaxy-up' perspectives. I suggest that these interpretations ultimately differ in their emphasis on astrophysical processes that 'quench' versus those that diversify galaxies, and the extent to which individual star formation histories encode these processes. I close by highlighting observations which might soon reveal the accuracy of either vision.
Statistics of extreme waves in the framework of one-dimensional Nonlinear Schrodinger Equation
NASA Astrophysics Data System (ADS)
Agafontsev, Dmitry; Zakharov, Vladimir
2013-04-01
We examine the statistics of extreme waves for the one-dimensional classical focusing Nonlinear Schrodinger (NLS) equation, iΨ_t + Ψ_xx + |Ψ|²Ψ = 0, (1) as well as the influence of the first nonlinear term beyond Eq. (1), the six-wave interactions, on the statistics of waves in the framework of a generalized NLS equation (2) accounting for six-wave interactions, damping (linear dissipation, two- and three-photon absorption) and pumping terms. We solve these equations numerically in a box with periodic boundary conditions, starting from the initial data Ψ|_{t=0} = F(x) + ξ(x), where F(x) is an exact modulationally unstable solution of Eq. (1) seeded by stochastic noise ξ(x) with fixed statistical properties. We examine two types of initial conditions F(x): (a) the condensate state F(x) = 1 for Eqs. (1)-(2) and (b) a cnoidal wave for Eq. (1). The development of modulation instability in Eqs. (1)-(2) leads to the formation of one-dimensional wave turbulence. In the integrable case the turbulence is called integrable and relaxes to one of infinitely many possible stationary states. Addition of the six-wave interaction term leads to the appearance of collapses that are eventually regularized by the damping terms. The energy lost during regularization of collapses in (2) is restored by the pumping term. In the latter case the system does not demonstrate relaxation-like behavior. We measure the evolution of the spectra I_k = ⟨|Ψ_k|²⟩, the spatial correlation functions and the PDFs of wave amplitudes |Ψ|, concentrating special attention on the formation of "fat tails" in the PDFs. For the classical integrable NLS equation (1) with the condensate initial condition we observe Rayleigh tails for extremely large waves and a "breathing region" for middle waves, with oscillations in the frequency of wave appearance over time, while the nonintegrable NLS equation with damping and pumping terms (2), in the absence of six-wave interactions (α = 0), demonstrates perfectly Rayleigh PDFs without any oscillations with
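The Rayleigh-tail statement can be illustrated directly (a sampling check, not a solution of the NLS equation): for a Gaussian random field the amplitudes |Ψ| are Rayleigh distributed, so P(|Ψ| > a) = exp(−a²/2σ²), which an empirical tail estimate reproduces.

```python
import math
import random

# Sampling check of the Rayleigh tail: complex Gaussian amplitudes have
# Rayleigh-distributed moduli (parameters are illustrative).
random.seed(2)
sigma, a, n = 1.0, 3.0, 200_000
exceed = sum(
    1 for _ in range(n)
    if math.hypot(random.gauss(0, sigma), random.gauss(0, sigma)) > a
)
empirical = exceed / n
rayleigh = math.exp(-a ** 2 / (2 * sigma ** 2))
print(empirical, rayleigh)  # both about 0.011
```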
NASA Astrophysics Data System (ADS)
Germa, Aurelie; Connor, Laura; Connor, Chuck; Malservisi, Rocco
2015-04-01
One challenge of volcanic hazard assessment in distributed volcanic fields (a large number of small-volume basaltic volcanoes along with one or more silicic central volcanoes) is to constrain the location of future activity. Although the extent of the source of melts at depth can be known from geophysical methods or the location of past eruptive vents, the locations of preferential pathways and zones of higher magma flux remain unobserved. How does the spatial distribution of eruptive vents at the surface reveal the location of magma sources or focusing? When this distribution is investigated, the locations of central polygenetic edifices as well as clusters of monogenetic volcanoes denote zones of high magma flux and recurrence rate, whereas areas of dispersed monogenetic vents represent zones of lower flux. Additionally, central polygenetic edifices, acting as magma filters, prevent dense mafic magmas from reaching the surface close to their central silicic system. Consequently, the spatial distribution of mafic monogenetic vents may provide clues to the subsurface structure of a volcanic field, such as the location of magma sources, preferential magma pathways, and the flux distribution across the field. Gathering such data is of high importance in improving the assessment of volcanic hazards. We are developing a modeling framework that compares outputs of statistical models of vent distribution with outputs from numerical models of subsurface magma transport. Geologic data observed at the Earth's surface are used to develop statistical models of spatial intensity (vents per unit area), volume intensity (erupted volume per unit area) and volume-flux intensity (erupted volume per unit time and area). Outputs are in the form of probability density functions assumed to represent volcanic flow output at the surface. These are then compared to outputs from conceptual models of the subsurface processes of magma storage and transport. These models use Darcy's law
NASA Astrophysics Data System (ADS)
Onishi, Ryo; Vassilicos, J. C.
2014-11-01
This study investigates the collision statistics of inertial particles in inverse-cascading 2D homogeneous isotropic turbulence by means of a direct numerical simulation (DNS). A collision kernel model for particles with small Stokes number (St) in 2D flows is proposed based on the model of Saffman & Turner (1956) (ST56 model). The DNS results agree with this 2D version of the ST56 model for St < 0.1. It is then confirmed that our DNS results satisfy the 2D version of the spherical formulation of the collision kernel. The fact that the flatness factor stays around 3 in our 2D flow confirms that the present 2D turbulent flow is nearly intermittency-free. Collision statistics for St = 0.1, 0.4 and 0.6, i.e. for St <1, are obtained from the present 2D DNS and compared with those obtained from the three-dimensional (3D) DNS of Onishi et al. (2013). We have observed that the 3D radial distribution function at contact (g(R), the so-called clustering effect) decreases for St = 0.4 and 0.6 with increasing Reynolds number, while the 2D g(R) does not show a significant dependence on Reynolds number. This observation supports the view that the Reynolds-number dependence of g(R) observed in three dimensions is due to internal intermittency of the 3D turbulence. We have further investigated the local St, which is a function of the local flow strain rates, and proposed a plausible mechanism that can explain the Reynolds-number dependence of g(R).
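The radial distribution function at contact g(R) discussed above is estimated by binning pair separations and normalizing by the pair count expected for uniformly distributed particles. Below is a minimal 2D estimator assuming a periodic square box with the minimum-image convention; all parameter values are illustrative, and this is a generic sketch rather than the study's analysis code.

```python
import math

def radial_distribution(positions, box, r_max, nbins):
    """Estimate g(r) for 2D points in a periodic square box of side `box`."""
    n = len(positions)
    dr = r_max / nbins
    counts = [0] * nbins
    for i in range(n):
        xi, yi = positions[i]
        for j in range(i + 1, n):
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dx -= box * round(dx / box)   # minimum-image convention
            dy -= box * round(dy / box)
            r = math.hypot(dx, dy)
            if r < r_max:
                counts[int(r / dr)] += 1
    rho = n / box ** 2                    # number density
    g = []
    for b in range(nbins):
        shell = math.pi * ((b + 1) ** 2 - b ** 2) * dr ** 2  # annulus area
        ideal = 0.5 * n * rho * shell     # expected pair count if uniform
        g.append(counts[b] / ideal)
    return g
```

For an ideal (uncorrelated) gas g(r) ≈ 1 at all r; clustering of inertial particles shows up as g(R) > 1 at contact.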
A statistical mechanical theory for a two-dimensional model of water
NASA Astrophysics Data System (ADS)
Urbic, Tomaz; Dill, Ken A.
2010-06-01
We develop a statistical mechanical model for the thermal and volumetric properties of waterlike fluids. Each water molecule is a two-dimensional disk with three hydrogen-bonding arms. Each water interacts with neighboring waters through a van der Waals interaction and an orientation-dependent hydrogen-bonding interaction. This model, which is largely analytical, is a variant of the Truskett and Dill (TD) treatment of the "Mercedes-Benz" (MB) model. The present model gives better predictions than TD for hydrogen-bond populations in liquid water by distinguishing strong cooperative hydrogen bonds from weaker ones. We explore properties versus temperature T and pressure p. We find that the volumetric and thermal properties follow the same trends with T as real water and are in good general agreement with Monte Carlo simulations of MB water, including the density anomaly, the minimum in the isothermal compressibility, and the decreased number of hydrogen bonds for increasing temperature. The model reproduces that pressure squeezes out water's heat capacity and leads to a negative thermal expansion coefficient at low temperatures. In terms of water structuring, the variance in hydrogen-bonding angles increases with both T and p, while the variance in water density increases with T but decreases with p. Hydrogen bonding is an energy storage mechanism that leads to water's large heat capacity (for its size) and to the fragility in its cagelike structures, which are easily melted by temperature and pressure to a more van der Waals-like liquid state.
A three-dimensional statistical approach to improved image quality for multislice helical CT.
Thibault, Jean-Baptiste; Sauer, Ken D; Bouman, Charles A; Hsieh, Jiang
2007-11-01
Multislice helical computed tomography scanning offers the advantages of faster acquisition and wide organ coverage for routine clinical diagnostic purposes. However, image reconstruction is faced with the challenges of three-dimensional cone-beam geometry, data completeness issues, and low dosage. Of all available reconstruction methods, statistical iterative reconstruction (IR) techniques appear particularly promising since they provide the flexibility of accurate physical noise modeling and geometric system description. In this paper, we present the application of Bayesian iterative algorithms to real 3D multislice helical data to demonstrate significant image quality improvement over conventional techniques. We also introduce a novel prior distribution designed to provide flexibility in its parameters to fine-tune image quality. Specifically, enhanced image resolution and lower noise have been achieved, concurrently with the reduction of helical cone-beam artifacts, as demonstrated by phantom studies. Clinical results also illustrate the capabilities of the algorithm on real patient data. Although computational load remains a significant challenge for practical development, superior image quality combined with advancements in computing technology make IR techniques a legitimate candidate for future clinical applications. PMID:18072519
Large Deviations of Radial Statistics in the Two-Dimensional One-Component Plasma
NASA Astrophysics Data System (ADS)
Cunden, Fabio Deelan; Mezzadri, Francesco; Vivo, Pierpaolo
2016-07-01
The two-dimensional one-component plasma is a ubiquitous model for several vortex systems. For special values of the coupling constant β q^2 (where q is the particle charge and β the inverse temperature), the model also corresponds to the eigenvalue distribution of normal matrix models. Several features of the system are discussed in the limit of a large number N of particles for generic values of the coupling constant. We show that the statistics of a class of radial observables produces a rich phase diagram, and their asymptotic behaviour in terms of large deviation functions is calculated explicitly, including next-to-leading terms up to order 1 / N. We demonstrate a split-off phenomenon associated to atypical fluctuations of the edge density profile. We also show explicitly that a failure of the fluid phase assumption of the plasma can break a genuine 1 / N-expansion of the free energy. Our findings are corroborated by numerical comparisons with exact finite-N formulae valid for β q^2=2.
Local Packing Fraction Statistics in a Two-Dimensional Granular Media
NASA Astrophysics Data System (ADS)
Puckett, James; Lechenault, Frederic; Daniels, Karen
2010-03-01
We experimentally investigate local packing fraction statistics of a two-dimensional bidisperse granular material supported by a horizontal air table and rearranged under impulses from the boundary. Our apparatus permits investigation of dense liquids close to the jamming transition under either constant pressure (CP) or constant volume (CV) boundary conditions and three different coefficients of friction. We calculate the probability distribution of the local packing fraction φ using both radical Voronoi tessellations (φV) and the Central Limit Theorem (φCLT). The two distributions have the same mean: <φV>=<φCLT>. For both methods, we observe that the variance strictly decreases as the mean increases; the functional dependence reveals information about the system. The variance of φV is larger under CP than CV, as expected since the cell volumes adjust to fluctuations in global volume. Interestingly, this feature is missing from φCLT. Instead, the variance of φCLT is sensitive to the internal friction of the system.
A one-dimensional statistical mechanics model for nucleosome positioning on genomic DNA.
Tesoro, S; Ali, I; Morozov, A N; Sulaiman, N; Marenduzzo, D
2016-02-01
The first level of folding of DNA in eukaryotes is provided by the so-called '10 nm chromatin fibre', where DNA wraps around histone proteins (∼10 nm in size) to form nucleosomes, which go on to create a zig-zagging bead-on-a-string structure. In this work we present a one-dimensional statistical mechanics model to study nucleosome positioning within one such 10 nm fibre. We focus on the case of genomic sheep DNA, and we start from effective potentials valid at infinite dilution and determined from high-resolution in vitro salt dialysis experiments. We study positioning within a polynucleosome chain, and compare the results for genomic DNA to that obtained in the simplest case of homogeneous DNA, where the problem can be mapped to a Tonks gas. First, we consider the simple, analytically solvable, case where nucleosomes are assumed to be point-like. Then, we perform numerical simulations to gauge the effect of their finite size on the nucleosomal distribution probabilities. Finally we compare nucleosome distributions and simulated nuclease digestion patterns for the two cases (homogeneous and sheep DNA), thereby providing testable predictions of the effect of sequence on experimentally observable quantities in experiments on polynucleosome chromatin fibres reconstituted in vitro. PMID:26871546
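The mapping to a Tonks gas mentioned above makes the homogeneous-DNA case exactly countable: non-overlapping nucleosomes with a footprint of w sites on a chain of L sites behave as hard rods, and the number of configurations of m rods has the closed form C(L - m(w-1), m). The short dynamic-programming sketch below (illustrative, not the paper's code) reproduces this count and generalizes readily to sequence-dependent weights.

```python
def count_configs(L, m, w):
    """Number of ways to place m non-overlapping rods of length w on L sites.
    f[i][k] = configurations using the first i sites with k rods placed."""
    f = [[0] * (m + 1) for _ in range(L + 1)]
    for i in range(L + 1):
        f[i][0] = 1                          # zero rods: one empty configuration
    for i in range(1, L + 1):
        for k in range(1, m + 1):
            f[i][k] = f[i - 1][k]            # site i left empty
            if i >= w:
                f[i][k] += f[i - w][k - 1]   # a rod occupies sites i-w+1 .. i
    return f[L][m]
```

Replacing the unit weights in the two recursion branches with Boltzmann factors from a position-dependent potential turns the same recursion into a transfer-matrix computation of nucleosome occupancy profiles.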
NASA Astrophysics Data System (ADS)
Lupu-Sax, Adam; Smolyarenko, Igor; Kaplan, Lev; Heller, Eric
1998-03-01
Recent theoretical work on statistics of local eigenstate intensities in weakly disordered two-dimensional metals(Falko & Efetov, PRB, 1995, 52(24), 17413-29; Smolyarenko & Altshuler, PRB, 1997, 55(16), 10451-66) predicts a logarithmically normal form of the distribution of local eigenstate intensities |ψ|^2 in the asymptotic region L^2|ψ|^2>>l/ln L, where the mean free path l and the system size L are measured in the units of the wavelength λ. We use a new scattering theory method(Lupu-Sax & Heller, talk in session 38c, paper in preparation) to find and compute eigenstates numerically at high speed which allows us to investigate previously inaccessible tails of the distribution function. We observe the log-normal form of the far asymptotic region of the distribution function of |ψ|^2 in the model of a single particle moving in the potential formed by randomly placed pointlike scatterers in a 2D integrable or chaotic billiard. We study the parameters of the log-normal distribution as functions of l and L and analyze the spatial structure of ``anomalous'' wavefunctions (those with a value of |ψ|^2 satisfying the above inequality somewhere in the sample), as well as the scatterer arrangements which produce them. The results are compared to theoretical predictions^1,(Mirlin, J. Math. Phys., 1997, 38(4), 1888-917).
NASA Astrophysics Data System (ADS)
Iizumi, T.; Nishimori, M.; Yokozawa, M.; Kotera, A.; Khang, N. D.
2008-12-01
Long-term daily global solar radiation (GSR) data of consistent quality over the 20th century are needed as a baseline to assess the climate change impact on paddy rice production in the Vietnamese Mekong Delta (MKD: 104.5-107.5°E/8.2-11.2°N). However, although sunshine duration data are available, the accessibility of GSR data in the MKD is quite poor. This study estimated daily GSR in the MKD over 30 years (1978-2007) by applying a statistical downscaling method (SDM). Estimates of GSR were obtained from four different sources: (1) combined equations with corrected reanalysis data of daily maximum/minimum temperatures, relative humidity, sea level pressure, and precipitable water; (2) a correction equation with reanalysis data of downward shortwave radiation; (3) an empirical equation with observed sunshine duration; and (4) short-term observations at one site. Three reanalysis datasets, i.e., NCEP-R1, ERA-40, and JRA-25, were used. Observed meteorological data, which include many missing values, were obtained from 11 stations of the Vietnamese Meteorological Agency for 28 years and from five stations of the Global Summary of the Day for 30 years. Observed GSR data for one year were obtained from our own station. Because the analysis uses data with many missing values, Bayesian inference was employed, which has the powerful capability to optimize multiple parameters in a non-linear, hierarchical model. The Bayesian inference provided posterior distributions of 306 parameter values relating to the combined equations, the empirical equation, and the correction equation. Preliminary results show that the amplitude of the daily fluctuation of modeled GSR was underestimated by the empirical equation and the correction equation. The combination of SDM and Bayesian inference has the potential to estimate long-term daily GSR of consistent quality even in areas where observed data are quite limited.
NASA Astrophysics Data System (ADS)
Kumar, Ranjeet; Chandra, Navin; Tomar, Surekha
2016-02-01
This paper deals with the role of triple encounters with low initial velocities and equal masses in the framework of statistical escape theory in two-dimensional space. The system is described by allowing for both energy and angular momentum conservation in the phase space. The complete statistical solutions (i.e. the semi-major axis a, the distributions of eccentricity e and energy Eb of the final binary, the escape energy Es of the escaper and its escape velocity vs) of the system are calculated. These are in good agreement with the numerical results of Chandra and Bhatnagar (1999) over the range of perturbing velocities vi (10^{-10} ≤ vi ≤ 10^{-1}) in two-dimensional space. The double limit process has been applied to the system. It is observed that as vi → 0^{+}, a vs^2 → 2/3 for all directions in two-dimensional space.
Statistics of active and passive scalars in one-dimensional compressible turbulence.
Ni, Qionglin; Chen, Shiyi
2012-12-01
Statistics of the active temperature and passive concentration advected by one-dimensional stationary compressible turbulence at Re_{λ}=2.56×10^{6} and M_{t}=1.0 are investigated using direct numerical simulation with all-scale forcing. It is observed that the signal of the velocity, as well as of the two scalars, is full of small-scale sawtooth structures. The temperature spectrum corresponds to G(k)∝k^{-5/3}, whereas the concentration spectrum follows a double power law of H(k)∝k^{-5/3} and H(k)∝k^{-7/3}. The probability distribution functions (PDFs) for the two scalar increments show that both δT and δC are strongly intermittent at small separation distance r and gradually approach the Gaussian distribution as r increases. Simultaneously, the exponent values of the PDF tails for the large negative scalar gradients are q_{θ}=-4.0 and q_{ζ}=-3.0, respectively. A single power-law region of finite width is identified in the structure function (SF) of δT; however, in the SF of δC, there are two regions with the exponents taken as a local minimum and a local maximum. As for the scalings of the two SFs, they are close to the Burgers and Obukhov-Corrsin scalings, respectively. Moreover, the negative filtered flux at large scales and the time-increasing total variance give evidence of the existence of an inverse cascade of the passive concentration, which is induced by the implosive collapse in the Lagrangian trajectories. PMID:23368038
Computationally efficient Bayesian inference for inverse problems.
Marzouk, Youssef M.; Najm, Habib N.; Rahn, Larry A.
2007-10-01
Bayesian statistics provides a foundation for inference from noisy and incomplete data, a natural mechanism for regularization in the form of prior information, and a quantitative assessment of uncertainty in the inferred results. Inverse problems - representing indirect estimation of model parameters, inputs, or structural components - can be fruitfully cast in this framework. Complex and computationally intensive forward models arising in physical applications, however, can render a Bayesian approach prohibitive. This difficulty is compounded by high-dimensional model spaces, as when the unknown is a spatiotemporal field. We present new algorithmic developments for Bayesian inference in this context, showing strong connections with the forward propagation of uncertainty. In particular, we introduce a stochastic spectral formulation that dramatically accelerates the Bayesian solution of inverse problems via rapid evaluation of a surrogate posterior. We also explore dimensionality reduction for the inference of spatiotemporal fields, using truncated spectral representations of Gaussian process priors. These new approaches are demonstrated on scalar transport problems arising in contaminant source inversion and in the inference of inhomogeneous material or transport properties. We also present a Bayesian framework for parameter estimation in stochastic models, where intrinsic stochasticity may be intermingled with observational noise. Evaluation of a likelihood function may not be analytically tractable in these cases, and thus several alternative Markov chain Monte Carlo (MCMC) schemes, operating on the product space of the observations and the parameters, are introduced.
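The MCMC machinery referenced above can be illustrated with the simplest random-walk Metropolis sampler; in the setting of this abstract, the expensive forward model inside `log_post` would be replaced by rapid evaluation of a surrogate posterior. This is a generic sketch of the sampler only, not the stochastic-spectral formulation itself; the target density and tuning values are illustrative.

```python
import math
import random

def metropolis(log_post, x0, n_samples, prop_std=1.0, seed=1):
    """Random-walk Metropolis sampler for a 1D log-posterior."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_samples):
        y = x + rng.gauss(0.0, prop_std)       # symmetric Gaussian proposal
        lq = log_post(y)
        if math.log(rng.random()) < lq - lp:   # Metropolis accept/reject
            x, lp = y, lq
        samples.append(x)
    return samples

# Toy "surrogate" posterior: a Gaussian centered at 2 standing in for a
# cheap-to-evaluate approximation of an expensive forward model.
surrogate_log_post = lambda x: -0.5 * (x - 2.0) ** 2
```

Because each MCMC step evaluates only the surrogate, the cost of the forward model is paid once (when the surrogate is built) rather than at every step.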
NASA Astrophysics Data System (ADS)
Verma, Sanjeet K.; Oliveira, Elson P.
2013-08-01
In the present work, we applied two sets of new multi-dimensional geochemical diagrams (Verma et al., 2013), obtained from linear discriminant analysis (LDA) of natural-logarithm-transformed ratios of major elements and of immobile major and trace elements in acid magmas, to decipher plate tectonic settings and corresponding probability estimates for Paleoproterozoic rocks from the Amazonian craton, São Francisco craton, São Luís craton, and Borborema province of Brazil. The robustness of LDA minimizes the effects of petrogenetic processes and maximizes the separation among the different tectonic groups. The probability-based boundaries further provide a more objective statistical method than the commonly used subjective method of determining boundaries by eye. The use of major element data readjusted to 100% on an anhydrous basis from the SINCLAS computer program also helps to minimize the effects of post-emplacement compositional changes and analytical errors on these tectonic discrimination diagrams. Fifteen case studies of acid suites highlight the application of these diagrams and probability calculations. The first case study, on the Jamon and Musa granites, Carajás area (Central Amazonian Province, Amazonian craton), shows a collision setting (previously thought anorogenic). A collision setting was also clearly inferred for the Bom Jardim granite, Xingú area (Central Amazonian Province, Amazonian craton). The third case study, on the Older São Jorge, Younger São Jorge and Maloquinha granites, Tapajós area (Ventuari-Tapajós Province, Amazonian craton), indicated a within-plate setting (previously considered transitional between volcanic arc and within-plate). We also recognized a within-plate setting for the next three case studies, on the Aripuanã and Teles Pires granites (SW Amazonian craton) and the Pitinga area granites (Mapuera Suite, NW Amazonian craton), which were all previously suggested to have been emplaced in post-collision to within-plate settings. The seventh case
Saberi, A A; Rouhani, S
2009-03-01
We investigate the statistics of isoheight lines of (2+1) -dimensional Kardar-Parisi-Zhang model at different level sets around the mean height in the saturation regime. We find that the exponent describing the distribution of the height-cluster size behaves differently for level cuts above and below the mean height, while the fractal dimensions of the height-clusters and their perimeters remain unchanged. The statistics of the winding angle confirms the previous observation that these contour lines are in the same universality class as self-avoiding random walks. PMID:19392013
Statistical Entropy of Four-Dimensional Rotating Black Holes from Near-Horizon Geometry
Cvetic, M.; Larsen, F.
1999-01-01
We show that a class of four-dimensional rotating black holes allow five-dimensional embeddings as black rotating strings. Their near-horizon geometry factorizes locally as a product of the three-dimensional anti-de Sitter space-time and a two-dimensional sphere (AdS_3 × S^2), with angular momentum encoded in the global space-time structure. Following the observation that the isometries on the AdS_3 space induce a two-dimensional (super)conformal field theory on the boundary, we reproduce the microscopic entropy with the correct dependence on the black hole angular momentum. © 1999 The American Physical Society
On the criticality of inferred models
NASA Astrophysics Data System (ADS)
Mastromatteo, Iacopo; Marsili, Matteo
2011-10-01
Advanced inference techniques allow one to reconstruct a pattern of interaction from high-dimensional data sets obtained by probing simultaneously thousands of units of extended systems, such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher information) are directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality.
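The identity underlying the argument above, that the Fisher information with respect to a field parameter equals the susceptibility, can be checked by exact enumeration of a small fully connected Ising model (β = 1; the size and couplings below are illustrative): Var(M) = d⟨M⟩/dh, where M is the total magnetization.

```python
import math
from itertools import product

def ising_stats(n, J, h):
    """Exact enumeration of a fully connected Ising model at beta = 1:
    E = -(J/n) * sum_{i<j} s_i s_j - h * sum_i s_i,
    using sum_{i<j} s_i s_j = (M^2 - n)/2 with M = sum_i s_i."""
    Z = m1 = m2 = 0.0
    for spins in product((-1, 1), repeat=n):
        M = sum(spins)
        E = -(J / n) * (M * M - n) / 2 - h * M
        w = math.exp(-E)
        Z += w
        m1 += w * M
        m2 += w * M * M
    mean = m1 / Z
    var = m2 / Z - mean ** 2   # Fisher information w.r.t. h = Var(M)
    return mean, var
```

The test compares a central finite difference of ⟨M⟩ in h against Var(M); they agree because ⟨M⟩ = ∂ ln Z/∂h and Var(M) = ∂²ln Z/∂h², the Fisher information of the exponential family in the field parameter.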
NASA Technical Reports Server (NTRS)
Bonavito, N. L.; Gordon, C. L.; Inguva, R.; Serafino, G. N.; Barnes, R. A.
1994-01-01
NASA's Mission to Planet Earth (MTPE) will address important interdisciplinary and environmental issues such as global warming, ozone depletion, deforestation, acid rain, and the like with its long term satellite observations of the Earth and with its comprehensive Data and Information System. Extensive sets of satellite observations supporting MTPE will be provided by the Earth Observing System (EOS), while more specific process related observations will be provided by smaller Earth Probes. MTPE will use data from ground and airborne scientific investigations to supplement and validate the global observations obtained from satellite imagery, while the EOS satellites will support interdisciplinary research and model development. This is important for understanding the processes that control the global environment and for improving the prediction of events. In this paper we illustrate the potential for powerful artificial intelligence (AI) techniques when used in the analysis of the formidable problems that exist in the NASA Earth Science programs and of those to be encountered in the future MTPE and EOS programs. These techniques, based on the logical and probabilistic reasoning aspects of plausible inference, strongly emphasize the synergetic relation between data and information. As such, they are ideally suited for the analysis of the massive data streams to be provided by both MTPE and EOS. To demonstrate this, we address both the satellite imagery and model enhancement issues for the problem of ozone profile retrieval through a method based on plausible scientific inferencing. Since in the retrieval problem, the atmospheric ozone profile that is consistent with a given set of measured radiances may not be unique, an optimum statistical method is used to estimate a 'best' profile solution from the radiances and from additional a priori information.
NASA Astrophysics Data System (ADS)
Hunziker, Jürg; Laloy, Eric; Linde, Niklas
2016-04-01
Deterministic inversion procedures can often explain field data, but they only deliver one final subsurface model that depends on the initial model and regularization constraints. This leads to poor insights about the uncertainties associated with the inferred model properties. In contrast, probabilistic inversions can provide an ensemble of model realizations that accurately span the range of possible models that honor the available calibration data and prior information, allowing a quantitative description of model uncertainties. We reconsider the problem of inferring the dielectric permittivity (directly related to radar velocity) structure of the subsurface by inversion of first-arrival travel times from crosshole ground penetrating radar (GPR) measurements. We rely on the DREAM(ZS) algorithm, a state-of-the-art Markov chain Monte Carlo (MCMC) algorithm. Such algorithms need several orders of magnitude more forward simulations than deterministic algorithms and often become infeasible in high parameter dimensions. To enable high-resolution imaging with MCMC, we use a recently proposed dimensionality reduction approach that allows reproducing 2D multi-Gaussian fields with far fewer parameters than a classical grid discretization. We consider herein a dimensionality reduction from 5000 to 257 unknowns. The first 250 parameters correspond to a spectral representation of random and uncorrelated spatial fluctuations while the remaining seven geostatistical parameters are (1) the standard deviation of the data error, (2) the mean and (3) the variance of the relative electric permittivity, (4) the integral scale along the major axis of anisotropy, (5) the anisotropy angle, (6) the ratio of the integral scale along the minor axis of anisotropy to the integral scale along the major axis of anisotropy and (7) the shape parameter of the Matérn function. The latter essentially defines the type of covariance function (e.g., exponential, Whittle, Gaussian). We present
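The dimensionality reduction described above represents a Gaussian field by a truncated set of spectral coefficients plus a handful of geostatistical parameters. The 1D toy version below uses a Lorentzian spectral weight (the spectrum of an exponential covariance); the function name and all sizes are illustrative stand-ins for the paper's 2D, 257-parameter construction, not its actual implementation.

```python
import math

def field_from_modes(params, n_grid, corr_len, sigma):
    """Build a 1D field on n_grid points from K Fourier-mode pairs.
    `params` has length 2K (cosine and sine coefficients per mode);
    each mode is weighted by a Lorentzian spectrum so that standard
    normal coefficients yield an approximately exponential covariance."""
    K = len(params) // 2
    field = []
    for j in range(n_grid):
        x = j / n_grid
        v = 0.0
        for k in range(1, K + 1):
            s = 1.0 / (1.0 + (2 * math.pi * k * corr_len) ** 2)  # spectral weight
            v += math.sqrt(s) * (params[2 * k - 2] * math.cos(2 * math.pi * k * x)
                                 + params[2 * k - 1] * math.sin(2 * math.pi * k * x))
        field.append(sigma * v)
    return field
```

An MCMC sampler then explores the low-dimensional coefficient vector (plus the geostatistical parameters such as `corr_len` and `sigma`) instead of the full grid of permittivity values.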
Zhao, Xi; Dellandréa, Emmanuel; Chen, Liming; Kakadiaris, Ioannis A
2011-10-01
Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions. PMID:21622076
Lange, Kenneth; Papp, Jeanette C.; Sinsheimer, Janet S.; Sobel, Eric M.
2014-01-01
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future. PMID:24955378
Hu, Jun; Li, Zhi-Wei; Ding, Xiao-Li; Zhu, Jian-Jun
2008-01-01
The Mw=7.6 Chi-Chi earthquake in Taiwan occurred in 1999 over the Chelungpu fault and caused a great surface rupture and severe damage. Differential Synthetic Aperture Radar Interferometry (DInSAR) has been applied previously to study the co-seismic ground displacements, but with significant limitations. First, only one-dimensional displacements along the Line-of-Sight (LOS) direction have been measured; the large horizontal displacements along the Chelungpu fault are largely missing from the measurements, as the fault is nearly perpendicular to the LOS direction. Second, due to severe signal decorrelation on the hanging wall of the fault, the displacements in that area are unmeasurable by the DInSAR method. We estimate the co-seismic displacements in both the azimuth and range directions with the method of SAR amplitude image matching. GPS observations at 10 GPS stations are used to correct for the orbital ramp in the amplitude matching and to create the two-dimensional (2D) co-seismic surface displacement field using the descending ERS-2 SAR image pair. The results show that the co-seismic displacements range from about -2.0 m to 0.7 m in the azimuth direction (with the positive direction pointing along the flight direction), with the footwall side of the fault moving mainly southwards and the hanging wall side northwards. The displacements in the LOS direction range from about -0.5 m to 1.0 m, with the largest displacement occurring in the northeastern part of the hanging wall (the positive direction points to the satellite from the ground). Comparing the results from amplitude matching with those from DInSAR shows that DInSAR recovers only a very small fraction of the LOS displacement and cannot detect the azimuth displacements well, as they are almost perpendicular to the LOS. Therefore, the amplitude matching method is clearly more advantageous than the DInSAR method.
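Amplitude image matching rests on locating the cross-correlation peak between two amplitude images. A minimal sketch of that step, assuming periodic images and integer-pixel shifts (a real SAR pipeline would work on small windowed patches and fit the peak to sub-pixel precision):

```python
import numpy as np

def estimate_shift(a, b):
    """Estimate the cyclic (row, col) shift s such that b == np.roll(a, s),
    by locating the peak of the FFT-based cross-correlation."""
    cc = np.fft.ifft2(np.fft.fft2(b) * np.conj(np.fft.fft2(a))).real
    idx = np.unravel_index(np.argmax(cc), cc.shape)
    # fold shifts larger than half the image size into negative offsets
    return tuple(int(d) if d <= n // 2 else int(d) - n for d, n in zip(idx, a.shape))
```

Applied patch by patch, the recovered offsets give the azimuth and range displacement components that DInSAR cannot resolve.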
NASA Astrophysics Data System (ADS)
King, Gary; Rosen, Ori; Tanner, Martin A.
2004-09-01
This collection of essays brings together a diverse group of scholars to survey the latest strategies for solving ecological inference problems in various fields. The last half-decade has witnessed an explosion of research in ecological inference--the process of trying to infer individual behavior from aggregate data. Although uncertainties and information lost in aggregation make ecological inference one of the most problematic types of research to rely on, these inferences are required in many academic fields, as well as by legislatures and the Courts in redistricting, by business in marketing research, and by governments in policy analysis.
Melville, C A; Johnson, P C D; Smiley, E; Simpson, N; McConnachie, A; Purves, D; Osugo, M; Cooper, S-A
2016-01-01
Diagnosing mental ill-health using categorical classification systems has limited validity for clinical practice and research. Dimensions of psychopathology have greater validity than categorical diagnoses in the general population, but dimensional models have not had a significant impact on our understanding of mental ill-health and problem behaviours experienced by adults with intellectual disabilities. This paper systematically reviews the methods and findings from intellectual disabilities studies that use statistical methods to identify dimensions of psychopathology from data collected using structured assessments of psychopathology. The PRISMA framework for systematic review was used to identify studies for inclusion. Study methods were compared to best-practice guidelines on the use of exploratory factor analysis. Data from the 20 studies included suggest that it is possible to use statistical methods to model dimensions of psychopathology experienced by adults with intellectual disabilities. However, none of the studies used methods recommended for the analysis of non-continuous psychopathology data and all 20 studies used statistical methods that produce unstable results that lack reliability. Statistical modelling is a promising methodology to improve our understanding of mental ill-health experienced by adults with intellectual disabilities but future studies should use robust statistical methods to build on the existing evidence base. PMID:26852278
Applying Clustering to Statistical Analysis of Student Reasoning about Two-Dimensional Kinematics
ERIC Educational Resources Information Center
Springuel, R. Padraic; Wittman, Michael C.; Thompson, John R.
2007-01-01
We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and…
Statistics of Critical Points of Gaussian Fields on Large-Dimensional Spaces
Bray, Alan J.; Dean, David S.
2007-04-13
We calculate the average number of critical points of a Gaussian field on a high-dimensional space as a function of their energy and their index. Our results give a complete picture of the organization of critical points and are of relevance to glassy and disordered systems and landscape scenarios coming from the anthropic approach to string theory.
Spatial mapping and statistical reproducibility of an array of 256 one-dimensional quantum wires
Al-Taie, H.; Kelly, M. J.; Smith, L. W.; Lesage, A. A. J.; Griffiths, J. P.; Beere, H. E.; Jones, G. A. C.; Ritchie, D. A.; Smith, C. G.; See, P.
2015-08-21
We utilize a multiplexing architecture to measure the conductance properties of an array of 256 split gates. We investigate the reproducibility of the pinch-off and one-dimensional definition voltages as a function of spatial location on two different cooldowns, and after illuminating the device. The reproducibility of both these properties on the two cooldowns is high, the result of the density of the two-dimensional electron gas returning to a similar state after thermal cycling. The spatial variation of the pinch-off voltage reduces after illumination; however, the variation of the one-dimensional definition voltage increases due to an anomalous feature in the center of the array. A technique which quantifies the homogeneity of split-gate properties across the array is developed which captures the experimentally observed trends. In addition, the one-dimensional definition voltage is used to probe the density of the wafer at each split gate in the array on a micron scale using a capacitive model.
Statistical Analysis of Current Sheets in Three-dimensional Magnetohydrodynamic Turbulence
NASA Astrophysics Data System (ADS)
Zhdankin, Vladimir; Uzdensky, Dmitri A.; Perez, Jean C.; Boldyrev, Stanislav
2013-07-01
We develop a framework for studying the statistical properties of current sheets in numerical simulations of magnetohydrodynamic (MHD) turbulence with a strong guide field, as modeled by reduced MHD. We describe an algorithm that identifies current sheets in a simulation snapshot and then determines their geometrical properties (including length, width, and thickness) and intensities (peak current density and total energy dissipation rate). We then apply this procedure to simulations of reduced MHD and perform a statistical analysis on the obtained population of current sheets. We evaluate the role of reconnection by separately studying the populations of current sheets which contain magnetic X-points and those which do not. We find that the statistical properties of the two populations are different in general. We compare the scaling of these properties to phenomenological predictions obtained for the inertial range of MHD turbulence. Finally, we test whether the reconnecting current sheets are consistent with the Sweet-Parker model.
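The identification step described above — flagging grid points of intense current density and grouping them into discrete structures whose size and peak intensity can then be measured — can be sketched as a threshold-and-label pass. The threshold choice and 4-connectivity are illustrative assumptions, not the authors' exact criteria.

```python
import numpy as np

def find_current_sheets(j, thresh):
    """Toy sheet identification: threshold |j| on a 2D grid and group
    above-threshold points into 4-connected regions, reporting each
    region's cell count and peak current density."""
    mask = np.abs(j) > thresh
    seen = np.zeros_like(mask, dtype=bool)
    nx, ny = mask.shape
    sheets = []
    for x in range(nx):
        for y in range(ny):
            if mask[x, y] and not seen[x, y]:
                stack, cells = [(x, y)], []   # flood fill from this seed
                seen[x, y] = True
                while stack:
                    cx, cy = stack.pop()
                    cells.append((cx, cy))
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        px, py = cx + dx, cy + dy
                        if 0 <= px < nx and 0 <= py < ny and mask[px, py] and not seen[px, py]:
                            seen[px, py] = True
                            stack.append((px, py))
                peak = max(abs(j[c]) for c in cells)
                sheets.append({"size": len(cells), "peak": peak})
    return sheets
```

Geometric properties such as length, width, and thickness would then be extracted from each labeled region, and regions could be further classified by whether they contain a magnetic X-point.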
NASA Technical Reports Server (NTRS)
Balkanski, Yves J.; Jacob, Daniel J.; Gardner, Geraldine M.; Graustein, William C.; Turekian, Karl K.
1993-01-01
A global three-dimensional model is used to investigate the transport and tropospheric residence time of Pb-210, an aerosol tracer produced in the atmosphere by radioactive decay of Rn-222 emitted from soils. The model uses meteorological input with 4 deg x 5 deg horizontal resolution and 4-hour temporal resolution from the Goddard Institute for Space Studies general circulation model (GCM). It computes aerosol scavenging by convective precipitation as part of the wet convective mass transport operator in order to capture the coupling between vertical transport and rainout. Scavenging in convective precipitation accounts for 74% of the global Pb-210 sink in the model; scavenging in large-scale precipitation accounts for 12%, and dry deposition accounts for 14%. The model captures 63% of the variance of yearly mean Pb-210 concentrations measured at 85 sites around the world with negligible mean bias, lending support to the computation of aerosol scavenging. There are, however, a number of regional and seasonal discrepancies that reflect in part anomalies in GCM precipitation. Computed residence times with respect to deposition for Pb-210 aerosol in the tropospheric column are about 5 days at southern midlatitudes and 10-15 days in the tropics; values at northern midlatitudes vary from about 5 days in winter to 10 days in summer. The residence time of Pb-210 produced in the lowest 0.5 km of atmosphere is on average four times shorter than that of Pb-210 produced in the upper atmosphere. Both model and observations indicate a weaker decrease of Pb-210 concentrations between the continental mixed layer and the free troposphere than is observed for total aerosol concentrations; an explanation is that Rn-222 is transported to high altitudes in wet convective updrafts, while aerosols and soluble precursors of aerosols are scavenged by precipitation in the updrafts. Thus Pb-210 is not simply a tracer of aerosols produced in the continental boundary layer, but
NASA Astrophysics Data System (ADS)
He, J.; Xiao, J.; Pan, Z.
2014-12-01
Associated with the northward convergence of the India continent, the surface motion of the Tibetan plateau, documented mainly by dense geodetic GPS measurements, changes greatly both in magnitude and in direction across different tectonic units. The most remarkable discordance of surface motion is around the eastern Himalayan syntaxis, where the GPS velocity field rotates gradually to the opposite direction near the southeastern Tibetan plateau with respect to the northward convergence of the India continent. Such a velocity field could result from lateral boundary conditions, since the strength of the lithosphere is probably weaker in the Tibetan plateau than in the surrounding regions. However, it is unclear whether the surface motion of the Tibetan plateau is affected by basal shear at the base of the elastic crust, which could exist if the coupling condition between the elastic and the viscous crust were changed. Here, we developed a large-scale three-dimensional finite element model to explore the possible existence of basal shear below the Tibetan plateau and the surrounding regions. In the model, the lateral boundaries are specified with far-field boundary conditions; the blocks surrounding the Tibetan plateau, like the Tarim, the Ordos, and the South China, are treated as rigid blocks; and the mean thickness of the elastic crust is assumed to be about 25 km. The magnitude and distribution of basal shear stress are then automatically searched in the numerical calculation to fit the surface (GPS) motion of the Tibetan plateau. We find that, to better fit the surface motion of the Tibetan plateau, negligible basal shear stress at the base of the elastic crust is needed below the majority of the western and central Tibetan plateau; whereas, around the eastern and southeastern Tibetan plateau, especially between the Xianshuihe strike-slip fault and the eastern Himalayan syntaxis, at least ~1.5-3.0 MPa of basal shear stress is needed to cause the rotational surface motion documented by the GPS measurements. This
NASA Astrophysics Data System (ADS)
Jameson, A. R.; Larsen, M. L.
2016-06-01
Microphysical understanding of the variability in rain requires a statistical characterization of different drop sizes both in time and in all dimensions of space. Temporally, there have been several statistical characterizations of raindrop counts. However, temporal and spatial structures are neither equivalent nor readily translatable. While there are recent reports of the one-dimensional spatial correlation functions in rain, they can only be assumed to represent the two-dimensional (2D) correlation function under the assumption of spatial isotropy. To date, however, there are no actual observations of the 2D spatial correlation function in rain over areas. Two reasons for this deficiency are the fiscal and physical impossibilities of assembling a dense network of instruments over even hundreds of meters, much less over kilometers. Consequently, all measurements over areas will necessarily be sparsely sampled. A dense network of data must then be estimated using interpolations from the available observations. In this work, a network of 19 optical disdrometers over a 100 m by 71 m area yields observations of drop spectra every minute. These are then interpolated to a 1 m resolution grid. Fourier techniques then yield estimates of the 2D spatial correlation functions. Preliminary examples using this technique found that steadier, light rain decorrelates spatially faster than does convective rain, but in both cases the 2D spatial correlation functions are anisotropic, reflecting an asymmetry in the physical processes influencing the rain reaching the ground not accounted for in numerical microphysical models.
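The Fourier route to the 2D spatial correlation function can be sketched in a few lines: by the Wiener-Khinchin relation, the autocorrelation of a gridded field is the inverse transform of its power spectrum. This is a generic sketch on an already-interpolated grid, not the authors' processing chain (which starts from sparse disdrometer samples and handles edge effects).

```python
import numpy as np

def correlation_2d(field):
    """Estimate the 2D spatial autocorrelation of a gridded field via the
    Wiener-Khinchin route: correlation = IFFT of the power spectrum.
    Assumes the grid is periodic; zero lag is normalized to 1."""
    f = field - field.mean()
    power = np.abs(np.fft.fft2(f)) ** 2
    corr = np.fft.ifft2(power).real
    corr /= corr[0, 0]            # normalize so the zero-lag value is 1
    return np.fft.fftshift(corr)  # put zero lag at the center of the map
```

Anisotropy of the kind reported above would show up as elongated contours of the returned correlation map rather than circular ones.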
High-dimensional statistical measure for region-of-interest tracking.
Boltz, Sylvain; Debreuve, Eric; Barlaud, Michel
2009-06-01
This paper deals with region-of-interest (ROI) tracking in video sequences. The goal is to determine in successive frames the region which best matches, in terms of a similarity measure, a ROI defined in a reference frame. Some tracking methods define similarity measures which efficiently combine several visual features into a probability density function (PDF) representation, thus building a discriminative model of the ROI. This approach implies dealing with PDFs with domains of definition of high dimension. To overcome this obstacle, a standard solution is to assume independence between the different features in order to bring out low-dimension marginal laws and/or to make some parametric assumptions on the PDFs at the cost of generality. We discard these assumptions by proposing to compute the Kullback-Leibler divergence between high-dimensional PDFs using the kth-nearest-neighbor framework. In consequence, the divergence is expressed directly from the samples, i.e., without explicit estimation of the underlying PDFs. As an application, we defined 5-, 7-, and 13-dimensional feature vectors containing color information (including pixel-based, gradient-based, and patch-based) and spatial layout. The proposed procedure performs tracking allowing for translation and scaling of the ROI. Experiments show its efficiency on a movie excerpt and standard test sequences selected for the specific conditions they exhibit: partial occlusions, variations of luminance, noise, and complex motion. PMID:19369157
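The sample-based divergence idea can be illustrated with a standard kth-nearest-neighbor KL estimator (in the spirit of Wang-Kulkarni-Verdú): the divergence is computed from nearest-neighbor distances alone, with no density estimate in between. The brute-force distance computation below is an illustrative simplification; the paper's exact estimator and tracking machinery are not reproduced.

```python
import numpy as np

def kl_knn(x, y, k=1):
    """kNN estimate of D(P||Q) from samples x ~ P (n x d) and y ~ Q (m x d):
    compare each x_i's kth-NN distance within x to its kth-NN distance in y."""
    n, d = x.shape
    m = y.shape[0]
    # brute-force pairwise Euclidean distances (fine for small samples)
    dxx = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    dxy = np.sqrt(((x[:, None, :] - y[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(dxx, np.inf)          # exclude each point's self-distance
    rho = np.sort(dxx, axis=1)[:, k - 1]   # kth NN within x
    nu = np.sort(dxy, axis=1)[:, k - 1]    # kth NN within y
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```

Because only distances enter, the same code works unchanged for the 5-, 7-, or 13-dimensional feature vectors mentioned above, where histogram-based PDF estimation would be hopeless.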
Allen, J; Velsko, S
2009-11-16
This report explores the question of whether meaningful conclusions can be drawn regarding the transmission relationship between two microbial samples on the basis of differences observed between the two samples' respective genomes. Unlike similar forensic applications using human DNA, the rapid rate of microbial genome evolution combined with the dynamics of infectious disease requires a shift in thinking on what it means for two samples to 'match' in support of a forensic hypothesis. Previous outbreaks of SARS-CoV, FMDV, and HIV were examined to investigate the question of how microbial sequence data can be used to draw inferences that link two infected individuals by direct transmission. The results are counterintuitive with respect to human DNA forensic applications in that some genetic change, rather than exact matching, improves confidence in inferring direct transmission links; too much genetic change, however, poses challenges that can weaken confidence in inferred links. High rates of infection coupled with relatively weak selective pressure observed in the SARS-CoV and FMDV data lead to fairly low confidence for direct transmission links. Confidence values for forensic hypotheses increased when testing for the possibility that samples are separated by at most a few intermediate hosts. Moreover, the observed outbreak conditions support the potential to provide high confidence values for hypotheses that exclude direct transmission links. Transmission inferences are based on the total number of observed or inferred genetic changes separating two sequences rather than uniquely weighing the importance of any one genetic mismatch. Thus, inferences are surprisingly robust in the presence of sequencing errors provided the error rates are randomly distributed across all samples in the reference outbreak database and the novel sequence samples in question. When the number of observed nucleotide mutations is limited due to characteristics of the outbreak or the
Three-dimensional segmentation of the heart muscle using image statistics
NASA Astrophysics Data System (ADS)
Nillesen, Maartje M.; Lopata, Richard G. P.; Gerrits, Inge H.; Kapusta, Livia; Huisman, Henkjan H.; Thijssen, Johan M.; de Korte, Chris L.
2006-03-01
Segmentation of the heart muscle in 3D echocardiographic images provides a tool for visualization of cardiac anatomy and assessment of heart function, and serves as an important pre-processing step for cardiac strain imaging. By incorporating spatial and temporal information of 3D ultrasound image sequences (4D), a fully automated method using image statistics was developed to perform 3D segmentation of the heart muscle. 3D rf-data were acquired with a Philips SONOS 7500 live 3D ultrasound system, and an X4 matrix array transducer (2-4 MHz). Left ventricular images of five healthy children were taken in transthoracic short- and long-axis views. As a first step, image statistics of blood and heart muscle were investigated. Next, based on these statistics, an adaptive mean squares filter was selected and applied to the images. Window size was related to speckle size (5x2 speckles). The degree of adaptive filtering was automatically steered by the local homogeneity of tissue. As a result, discrimination of heart muscle and blood was optimized, while sharpness of edges was preserved. After this pre-processing stage, homomorphic filtering and automatic thresholding were performed to obtain the inner borders of the heart muscle. Finally, a deformable contour algorithm was used to yield a closed contour of the left ventricular cavity in each elevational plane. Each contour was optimized using contours of the surrounding planes (spatial and temporal) as limiting condition to ensure spatial and temporal continuity. Better segmentation of the ventricle was obtained using 4D information than using information of each plane separately.
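The principle of an adaptive filter steered by local homogeneity — average hard where the neighborhood is statistically flat, preserve detail where local variance signals an edge — can be sketched with a Lee-type filter. This is a generic stand-in, not the paper's adaptive mean squares filter; the window size and `noise_var` are illustrative parameters.

```python
import numpy as np

def adaptive_smooth(img, win=5, noise_var=1.0):
    """Lee-type adaptive mean filter: each output pixel blends the local mean
    and the original value, with the blend steered by local homogeneity
    (low local variance -> strong smoothing, high variance -> keep edges)."""
    pad = win // 2
    p = np.pad(img, pad, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(p, (win, win))
    mu = windows.mean(axis=(-2, -1))      # local mean
    var = windows.var(axis=(-2, -1))      # local variance
    # gain ~ 0 in homogeneous (speckle-only) regions, ~ 1 on structure
    gain = np.clip((var - noise_var) / np.maximum(var, 1e-12), 0.0, 1.0)
    return mu + gain * (img - mu)
```

In the segmentation pipeline above, such a filter would sharpen the statistical contrast between blood and myocardium before the thresholding and deformable-contour stages.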
Emergent exclusion statistics of quasiparticles in two-dimensional topological phases
NASA Astrophysics Data System (ADS)
Hu, Yuting; Stirling, Spencer D.; Wu, Yong-Shi
2014-03-01
We demonstrate how the generalized Pauli exclusion principle emerges for quasiparticle excitations in 2D topological phases. As an example, we examine the Levin-Wen model with the Fibonacci data (specified in the text), and construct the number operator for fluxons living on plaquettes. By numerically counting the many-body states with fluxon number fixed, the matrix of exclusion statistics parameters is identified and is shown to depend on the spatial topology (sphere or torus) of the system. Our work reveals the structure of the (many-body) Hilbert space and some general features of thermodynamics for quasiparticle excitations in topological matter.
Statistical properties of three-dimensional two-fluid plasma model
Qaisrani, M. Hasnain; Xia, ZhenWei; Zou, Dandan
2015-09-15
The nonlinear dynamics of an incompressible, non-dissipative two-fluid plasma model is investigated through classical Gibbs ensemble methods. Liouville's theorem of phase space for each wave number is proved, and the absolute equilibrium spectra for the Galerkin-truncated two-fluid model are calculated. In two-fluid theory, the equilibrium is built on the conservation of three quadratic invariants: the total energy and the self-helicities of the ion and electron fluids, respectively. The implications of statistical equilibrium spectra with arbitrary ratios of conserved invariants are discussed.
NASA Astrophysics Data System (ADS)
Nguyen van yen, Romain; Farge, Marie; Schneider, Kai
2012-02-01
Classical statistical theories of turbulence have shown their limitations, in that they cannot predict much more than the energy spectrum in an idealized setting of statistical homogeneity and stationarity. We explore the applicability of a conditional statistical modeling approach: can we sort out what part of the information should be kept, and what part should be modeled statistically, or, in other words, “dissipated”? Our mathematical framework is the initial value problem for the two-dimensional (2D) Euler equations, which we approximate numerically by solving the 2D Navier-Stokes equations in the vanishing viscosity limit. In order to obtain a good approximation of the inviscid dynamics, we use a spectral method and a resolution going up to 8192². We introduce a macroscopic concept of dissipation, relying on a split of the flow between coherent and incoherent contributions: the coherent flow is constructed from the large wavelet coefficients of the vorticity field, and the incoherent flow from the small ones. In previous work, a unique threshold was applied to all wavelet coefficients, while here we also consider the effect of a scale by scale thresholding algorithm, called scale-wise coherent vorticity extraction. We study the statistical properties of the coherent and incoherent vorticity fields, and the transfers of enstrophy between them, and then use these results to propose, within a maximum entropy framework, a simple model for the incoherent vorticity. In the framework of this model, we show that the flow velocity can be predicted accurately in the L2 norm for about 10 eddy turnover times.
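The coherent/incoherent split described above can be sketched with a one-level orthonormal Haar transform: keep the large wavelet coefficients as the coherent part, and let the remainder define the incoherent part. This is a deliberately minimal stand-in; the study uses a proper multi-scale wavelet basis and scale-wise thresholds.

```python
import numpy as np

def haar2(a):
    """One level of the orthonormal 2D Haar transform (even-sized arrays)."""
    s = (a[0::2] + a[1::2]) / np.sqrt(2)
    d = (a[0::2] - a[1::2]) / np.sqrt(2)
    rows = np.vstack([s, d])
    s2 = (rows[:, 0::2] + rows[:, 1::2]) / np.sqrt(2)
    d2 = (rows[:, 0::2] - rows[:, 1::2]) / np.sqrt(2)
    return np.hstack([s2, d2])

def ihaar2(c):
    """Inverse of haar2."""
    n = c.shape[1] // 2
    s2, d2 = c[:, :n], c[:, n:]
    rows = np.empty_like(c)
    rows[:, 0::2] = (s2 + d2) / np.sqrt(2)
    rows[:, 1::2] = (s2 - d2) / np.sqrt(2)
    m = c.shape[0] // 2
    s, d = rows[:m], rows[m:]
    a = np.empty_like(c)
    a[0::2] = (s + d) / np.sqrt(2)
    a[1::2] = (s - d) / np.sqrt(2)
    return a

def coherent_split(vorticity, thresh):
    """Split a field into coherent (large wavelet coefficients) and
    incoherent (small coefficients) parts; the two sum back to the field."""
    c = haar2(vorticity)
    coherent = ihaar2(np.where(np.abs(c) >= thresh, c, 0.0))
    return coherent, vorticity - coherent
```

Because the transform is orthonormal and the two parts come from disjoint coefficient sets, their enstrophy-like energies add up to that of the full field, which is what makes the enstrophy-transfer bookkeeping in the study well defined.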
Vorticity statistics in the direct cascade of two-dimensional turbulence.
Falkovich, Gregory; Lebedev, Vladimir
2011-04-01
For the direct cascade of steady two-dimensional (2D) Navier-Stokes turbulence, we derive analytically the probability of strong vorticity fluctuations. When ϖ is the vorticity coarse-grained over a scale R, the probability density function (PDF), P(ϖ), has a universal asymptotic behavior ln P ~ -ϖ/ϖ_rms at ϖ ≫ ϖ_rms = [H ln(L/R)]^(1/3), where H is the enstrophy flux and L is the pumping length. Therefore, the PDF has exponential tails and is self-similar, that is, it can be presented as a function of a single argument, ϖ/ϖ_rms, in distinction from other known direct cascades. PMID:21599229
NASA Astrophysics Data System (ADS)
Derrida, Bernard; Meerson, Baruch; Sasorov, Pavel V.
2016-04-01
Consider a one-dimensional branching Brownian motion and rescale the coordinate and time so that the rates of branching and diffusion are both equal to 1. If X1(t) is the position of the rightmost particle of the branching Brownian motion at time t, the empirical velocity c of this rightmost particle is defined as c = X1(t)/t. Using the Fisher-Kolmogorov-Petrovsky-Piscounov equation, we evaluate the probability distribution P(c,t) of this empirical velocity c in the long-time t limit for c > 2. It is already known that, for a single seed particle, P(c,t) ~ exp[-(c²/4 - 1)t] up to a prefactor that can depend on c and t. Here we show how to determine this prefactor. The result can be easily generalized to the case of multiple seed particles and to branching random walks associated with other traveling-wave equations.
Universal asymptotic statistics of maximal relative height in one-dimensional solid-on-solid models
NASA Astrophysics Data System (ADS)
Schehr, Grégory; Majumdar, Satya N.
2006-05-01
We study the probability density function P(h_m, L) of the maximum relative height h_m in a wide class of one-dimensional solid-on-solid models of finite size L. For all these lattice models, in the large-L limit, a central limit argument shows that, for periodic boundary conditions, P(h_m, L) takes a universal scaling form P(h_m, L) ~ (√12 w_L)^(-1) f(h_m/(√12 w_L)), with w_L the width of the fluctuating interface and f(x) the Airy distribution function. For one instance of these models, corresponding to the extremely anisotropic Ising model in two dimensions, this result is obtained by an exact computation using the transfer matrix technique, valid for any L > 0. These arguments and exact analytical calculations are supported by numerical simulations, which show in addition that the subleading scaling function is also universal, up to a nonuniversal amplitude, and is simply given by the derivative of the Airy distribution function, f'(x).
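The quantity under study — the maximum of a fluctuating periodic interface measured from its spatial mean — is easy to sample numerically. The sketch below uses a Gaussian random-walk bridge as a generic stand-in for the lattice solid-on-solid models; it illustrates the observable, not the paper's transfer-matrix computation.

```python
import numpy as np

def max_relative_height_samples(L, n_samples, seed=0):
    """Monte Carlo samples of the maximal relative height h_m of a periodic
    random-walk interface of size L: h_m = max_i (h_i - mean(h))."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(size=(n_samples, L))
    h = np.cumsum(steps, axis=1)
    # subtract a linear drift so the interface closes periodically (bridge)
    h -= np.arange(1, L + 1) / L * h[:, -1:]
    rel = h - h.mean(axis=1, keepdims=True)   # height relative to spatial mean
    return rel.max(axis=1)
```

Histogramming these samples rescaled by the interface width w_L would be the numerical route to checking the universal (Airy-distribution) scaling form quoted above; the mean of h_m itself grows like √L, consistent with the interface-width scaling of such models.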
NASA Astrophysics Data System (ADS)
Caticha, Ariel
2011-03-01
In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME) includes as special cases both MaxEnt and Bayes' rule, and therefore unifies the two themes of these workshops, the Maximum Entropy and the Bayesian methods, into a single general inference scheme.
Malinowski, Kathleen T.; Pantarotto, Jason R.; Senan, Suresh
2010-08-01
Purpose: To investigate the feasibility of modeling Stage III lung cancer tumor and node positions from anatomical surrogates. Methods and Materials: To localize their centroids, the primary tumor and lymph nodes from 16 Stage III lung cancer patients were contoured in 10 equal-phase planning four-dimensional (4D) computed tomography (CT) image sets. The centroids of anatomical respiratory surrogates (carina, xyphoid, nipples, mid-sternum) in each image set were also localized. The correlations between target and surrogate positions were determined, and ordinary least-squares (OLS) and partial least-squares (PLS) regression models based on a subset of respiratory phases (three to eight randomly selected) were created to predict the target positions in the remaining images. The three-phase image sets that provided the best predictive information were used to create models based on either the carina alone or all surrogates. Results: The surrogate most correlated with target motion varied widely. Depending on the number of phases used to build the models, mean OLS and PLS errors were 1.0 to 1.4 mm and 0.8 to 1.0 mm, respectively. Models trained on the 0%, 40%, and 80% respiration phases had mean (± standard deviation) PLS errors of 0.8 ± 0.5 mm and 1.1 ± 1.1 mm for models based on all surrogates and carina alone, respectively. For target coordinates with motion >5 mm, the mean three-phase PLS error based on all surrogates was 1.1 mm. Conclusions: Our results establish the feasibility of inferring primary tumor and nodal motion from anatomical surrogates in 4D CT scans of Stage III lung cancer. Using inferential modeling to decrease the processing time of 4D CT scans may facilitate incorporation of patient-specific treatment margins.
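The inferential-modeling step — fit a regression of target position on surrogate positions using a few phases, then predict the target in the remaining phases — can be sketched with the OLS variant (the study also uses PLS, which is not reproduced here). The synthetic noiseless data below is purely illustrative.

```python
import numpy as np

def fit_inference_model(surrogates, target):
    """Ordinary least squares with an intercept: predict one target coordinate
    (tumor or node centroid) from anatomical-surrogate coordinates."""
    X = np.column_stack([np.ones(len(surrogates)), surrogates])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def predict(coef, surrogates):
    """Apply a fitted model to surrogate positions from unseen phases."""
    X = np.column_stack([np.ones(len(surrogates)), surrogates])
    return X @ coef
```

In the study's setting, the training rows would be the three to eight contoured respiratory phases and the prediction rows the remaining phases of the 4D CT, with one such model per target coordinate.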
Inverse Ising inference with correlated samples
NASA Astrophysics Data System (ADS)
Obermayer, Benedikt; Levine, Erel
2014-12-01
Correlations between two variables of a high-dimensional system can be indicative of an underlying interaction, but can also result from indirect effects. Inverse Ising inference is a method to distinguish one from the other. Essentially, the parameters of the least constrained statistical model are learned from the observed correlations such that direct interactions can be separated from indirect correlations. Among many other applications, this approach has been helpful for protein structure prediction, because residues which interact in the 3D structure often show correlated substitutions in a multiple sequence alignment. In this context, samples used for inference are not independent but share an evolutionary history on a phylogenetic tree. Here, we discuss the effects of correlations between samples on global inference. Such correlations could arise due to phylogeny but also via other slow dynamical processes. We present a simple analytical model to address the resulting inference biases, and develop an exact method accounting for background correlations in alignment data by combining phylogenetic modeling with an adaptive cluster expansion algorithm. We find that popular reweighting schemes are only marginally effective at removing phylogenetic bias, suggest a rescaling strategy that yields better results, and provide evidence that our conclusions carry over to the frequently used mean-field approach to the inverse Ising problem.
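The core idea — separating direct interactions from indirect correlations by fitting the least constrained model — has a closed-form caricature in the naive mean-field approximation to the inverse Ising problem: couplings are read off from the inverse of the connected correlation matrix, J_ij ≈ -(C⁻¹)_ij for i ≠ j. The Gaussian test data below (where this rule is exact) and the toy precision matrix are illustrative assumptions; for real spin or sequence data this is only an approximation, and the phylogenetic corrections discussed above are a separate issue.

```python
import numpy as np

def mean_field_couplings(samples):
    """Naive mean-field inverse Ising: estimate the coupling matrix as minus
    the inverse of the sample covariance (diagonal zeroed). Indirect
    correlations drop out because matrix inversion 'deconvolves' them."""
    C = np.cov(samples, rowvar=False)
    J = -np.linalg.inv(C)
    np.fill_diagonal(J, 0.0)
    return J
```

A three-variable chain (1-2 and 2-3 coupled, 1-3 not) shows the point: variables 1 and 3 are correlated through the intermediary, yet the inferred direct coupling between them is near zero.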
NASA Astrophysics Data System (ADS)
Nair, Anish Kumar M.; Rajeev, Kunjukrishnapillai
2012-07-01
Long-term (2006-2011) monthly and seasonal mean vertical distributions of clouds and their spatial variations over the Indian subcontinent and surrounding oceanic regions have been derived using data obtained from the space-borne radar, CloudSat. Together with the data from space-borne imagers (Kalpana-1-VHRR and NOAA-AVHRR), these provide insight into the three-dimensional distribution of clouds and its linkage with dominant tropical dynamical features, which are largely unexplored over the Indian region. Meridional cross sections of the ITCZ, inferred from the vertical distribution of clouds, clearly reveal the relatively narrow structure of the ITCZ flanked by thick cirrus outflows in the upper troposphere on either side. The base of cirrus clouds in the outflow region increases significantly away from the ITCZ core, while the corresponding variations in cirrus top are negligible, resulting in considerable thinning of cirrus away from the ITCZ. This provides direct observational evidence for the infrared radiative heating at cloud base and its role in regulating the cirrus lifetime through sublimation. On average, the frequency of occurrence of clouds rapidly decreases with altitude in the altitude band of 12-14 km, which corresponds to the convective tropopause altitude. The north-south inclination and east-west asymmetry of the ITCZ during the winter season are distinctly clear in the vertical distribution of clouds, which provide information on the pathways for inter-hemispheric transport over the Indian Ocean during this season. During the Asian summer monsoon season (June-September), substantial amounts of deep convective clouds are found to occur over the North Bay of Bengal, extending up to an altitude of >14 km, which is ~1-2 km higher than over other deep convective regions. This has potential implications for the pumping of tropospheric airmass across the tropical tropopause over the region. This study characterizes a pool of inhibited cloudiness over the southwest Bay of
NASA Astrophysics Data System (ADS)
Mackay, R. M.; Khalil, M. A. K.
1995-10-01
The zonally averaged response of the Global Change Research Center two-dimensional (2-D) statistical dynamical climate model (GCRC 2-D SDCM) to a doubling of atmospheric carbon dioxide (350 parts per million by volume (ppmv) to 700 ppmv) is reported. The model solves the two-dimensional primitive equations in finite difference form (mass continuity, Newton's second law, and the first law of thermodynamics) for the prognostic variables: zonal mean density, zonal mean zonal velocity, zonal mean meridional velocity, and zonal mean temperature on a grid that has 18 nodes in latitude and 9 vertical nodes (plus the surface). The equation of state, p = ρRT, and an assumed hydrostatic atmosphere, Δp = -ρgΔz, are used to diagnostically calculate the zonal mean pressure and vertical velocity for each grid node, and the moisture balance equation is used to estimate the precipitation rate. The model includes seasonal variations in solar intensity, including the effects of eccentricity, and has observed land and ocean fractions set for each zone. Seasonally varying values of cloud amounts, relative humidity profiles, ozone, and sea ice are all prescribed in the model. Equator to pole ocean heat transport is simulated in the model by turbulent diffusion. The change in global mean annual surface air temperature due to a doubling of atmospheric CO2 in the 2-D model is 1.61 K, which is close to that simulated by the one-dimensional (1-D) radiative convective model (RCM) which is at the heart of the 2-D model radiation code (1.67 K for the moist adiabatic lapse rate assumption in 1-D RCM). We find that the change in temperature structure of the model atmosphere has many of the characteristics common to General Circulation Models, including amplified warming at the poles and the upper tropical troposphere, and stratospheric cooling. Because of the potential importance of atmospheric circulation feedbacks on climate change, we have also investigated the response of the zonal wind
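The diagnostic use of the equation of state and the hydrostatic relation can be illustrated with a simple layer-by-layer integration. The profile values below are generic illustrative numbers, not the model's actual grid or fields:

```python
import numpy as np

# Layer-by-layer hydrostatic integration: dp/p = -g dz / (R T), with
# density diagnosed from the ideal-gas law rho = p / (R T).
R, g = 287.0, 9.81                      # J kg^-1 K^-1, m s^-2
z = np.arange(0.0, 10000.0, 1000.0)    # ten levels, 1 km apart
T = 288.0 - 0.0065 * z                 # linear lapse-rate temperature profile
p = np.empty_like(z)
p[0] = 101325.0                        # surface pressure in Pa
for i in range(1, len(z)):
    T_mid = 0.5 * (T[i] + T[i - 1])    # layer-mean temperature
    p[i] = p[i - 1] * np.exp(-g * 1000.0 / (R * T_mid))
rho = p / (R * T)                      # diagnosed density at each level
```

This is the standard barometric building block behind diagnosing pressure from prognostic temperature and density on a vertical grid.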
NASA Astrophysics Data System (ADS)
Bartolucci, Daniele; De Marchis, Francesca
2015-08-01
We are motivated by the study of the Microcanonical Variational Principle within Onsager's description of two-dimensional turbulence in the range of energies where the equivalence of statistical ensembles fails. We obtain sufficient conditions for the existence and multiplicity of solutions for the corresponding Mean Field Equation on convex and "thin" enough domains in the supercritical (with respect to the Moser-Trudinger inequality) regime. This is a brand new achievement since existence results in the supercritical region were previously known only on multiply connected domains. We then study the structure of these solutions by the analysis of their linearized problems and we also obtain a new uniqueness result for solutions of the Mean Field Equation on thin domains whose energy is uniformly bounded from above. Finally we evaluate the asymptotic expansion of those solutions with respect to the thinning parameter and, combining it with all the results obtained so far, we solve the Microcanonical Variational Principle in a small range of supercritical energies where the entropy is shown to be concave.
NASA Astrophysics Data System (ADS)
Zhang, Honghai; Walker, Nicholas; Mitchell, Steven C.; Thomas, Matthew; Wahle, Andreas; Scholz, Thomas; Sonka, Milan
2006-03-01
Conventional analysis of cardiac ventricular magnetic resonance images is performed using short axis images and does not guarantee completeness and consistency of the ventricle coverage. In this paper, a four-dimensional (4D, 3D+time) left and right ventricle statistical shape model was generated from the combination of the long axis and short axis images. Iterative mutual intensity registration and interpolation were used to merge the long axis and short axis images into isotropic 4D images and simultaneously correct existing breathing artifacts. Distance-based shape interpolation and approximation were used to generate complete ventricle shapes from the long axis and short axis manual segmentations. Landmarks were automatically generated and propagated to 4D data samples using rigid alignment, distance-based merging, and B-spline transform. Principal component analysis (PCA) was used in model creation and analysis. The two strongest modes of the shape model captured the most important shape feature of Tetralogy of Fallot (TOF) patients, right ventricle enlargement. Classification of cardiac images into classes of normal and TOF subjects performed on 3D and 4D models showed 100% classification correctness rates for both normal and TOF subjects using a k-nearest-neighbor (k = 1 or 3) classifier and the two strongest shape modes.
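The PCA-plus-kNN classification pipeline can be sketched as follows. The "shapes" here are synthetic vectors with an artificial enlargement direction standing in for landmark coordinates; the class sizes, dimensions, and separation are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in for landmark coordinate vectors: 20 "normal" and
# 20 "enlarged right ventricle" shapes in a 30-dimensional landmark space.
normal = rng.normal(0.0, 0.3, size=(20, 30))
tof = rng.normal(0.0, 0.3, size=(20, 30))
tof[:, 0] += 3.0                      # enlargement dominates one direction
X = np.vstack([normal, tof])
labels = np.array([0] * 20 + [1] * 20)

# PCA by SVD of the mean-centred data; keep the two strongest modes.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                # per-sample shape-mode coefficients

def knn_predict(train_x, train_y, query, k=3):
    d = np.linalg.norm(train_x - query, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    return np.bincount(nearest).argmax()

# Leave-one-out classification in the 2-mode shape space.
correct = sum(
    knn_predict(np.delete(scores, i, 0), np.delete(labels, i), scores[i]) == labels[i]
    for i in range(len(labels))
)
```

With a dominant enlargement mode, the first principal component absorbs the between-class difference, which is why a 2-mode space suffices for classification.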
Porch, Clay E; Lauretta, Matthew V
2016-01-01
Forecasts of the future abundance of western Atlantic bluefin tuna (Thunnus thynnus) have, for nearly two decades, been based on two competing views of future recruitment potential: (1) a "low" recruitment scenario based on a hockey-stick (two-line) curve, where the expected level of recruitment is set equal to the geometric mean of the recruitment estimates for the years after a supposed regime-shift in 1975, and (2) a "high" recruitment scenario based on a Beverton-Holt curve fit to the time series of spawner-recruit pairs beginning in 1970. Several investigators inferred the relative plausibility of these two scenarios based on measures of their ability to fit estimates of spawning biomass and recruitment derived from stock assessment outputs. Typically, these comparisons have assumed the assessment estimates of spawning biomass are known without error. It is shown here that ignoring error in the spawning biomass estimates can predispose model-choice approaches to favor the regime-shift hypothesis over the Beverton-Holt curve with higher recruitment potential. When the variance of the observation error approaches that which is typically estimated for assessment outputs, the same model-choice approaches tend to favor the single Beverton-Holt curve. For this and other reasons, it is argued that standard model-choice approaches are insufficient to make the case for a regime shift in the recruitment dynamics of western Atlantic bluefin tuna. A more fruitful course of action may be to move away from the current high/low recruitment dichotomy and focus instead on adopting biological reference points and management procedures that are robust to these and other sources of uncertainty. PMID:27272215
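Fitting the Beverton-Holt scenario to spawner-recruit pairs is a standard nonlinear regression. A hedged sketch on synthetic data (the parameter values and noise level are arbitrary, not assessment outputs, and the observation-error-in-spawning-biomass issue the abstract raises is deliberately left out):

```python
import numpy as np
from scipy.optimize import curve_fit

def beverton_holt(S, a, b):
    """Expected recruitment R as a function of spawning biomass S."""
    return a * S / (b + S)

rng = np.random.default_rng(2)
S = np.linspace(10, 500, 40)                        # synthetic spawner series
R_true = beverton_holt(S, a=120.0, b=80.0)
R_obs = R_true * rng.lognormal(0.0, 0.05, S.size)   # multiplicative noise

(a_hat, b_hat), _ = curve_fit(beverton_holt, S, R_obs, p0=(100.0, 50.0))
```

The abstract's point is that when S itself carries error, comparing residual fits of this curve against a hockey-stick alternative is no longer a fair model-choice procedure.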
Shirodkar, P V; Xiao, Y K; Sarkar, A; Dalal, S G; Chivas, A R
2006-02-01
The behaviors of chlorine isotopes in relation to air-sea flux variables have been investigated through multivariate statistical analyses (MSA). The MSA technique provides an approach to reduce the data set and was applied to a set of 7 air-sea flux variables to supplement and describe the variation in chlorine isotopic compositions (δ37Cl) of ocean water. The δ37Cl values of surface ocean water from 51 stations in 4 major world oceans (the Pacific, Atlantic, Indian, and the Southern Ocean) range from -0.76 to +0.74 per thousand (av. 0.039 ± 0.04 per thousand). The observed δ37Cl values show basic homogeneity and indicate that the air-sea fluxes act differently in different oceanic regions and help to maintain the balance between δ37Cl values of the world oceans. The study showed that it is possible to model the behavior of chlorine isotopes to the extent of 38-73% for different geographical regions. The models offered here are purely statistical in nature; however, the relationships uncovered by these models extend our understanding of the constancy in δ37Cl of ocean water in relation to air-sea flux variables. PMID:16214214
van IJsseldijk, E. A.; Valstar, E. R.; Stoel, B. C.; Nelissen, R. G. H. H.; Baka, N.; van 't Klooster, R.; Kaptein, B. L.
2016-01-01
Three-dimensional measurement of minimum joint space width in the knee from stereo radiographs using statistical shape models. Bone Joint Res 2016;320–327. DOI: 10.1302/2046-3758.58.2000626. PMID:27491660
NASA Technical Reports Server (NTRS)
Boardman, J. W.; Pieters, C. M.; Green, R. O.; Clark, R. N.; Sunshine, J.; Combe, J.-P.; Isaacson, P.; Lundeen, S. R.; Malaret, E.; McCord, T.; Nettles, J.; Petro, N. E.; Varanasi, P.; Taylor, L.
2010-01-01
The Moon Mineralogy Mapper (M3), a NASA Discovery Mission of Opportunity, was launched October 22, 2008 from Sriharikota in India on board the Indian ISRO Chandrayaan-1 spacecraft for a nominal two-year mission in a 100-km polar lunar orbit. M3 is a high-fidelity imaging spectrometer with 260 spectral bands in Target Mode and 85 spectral bands in a reduced-resolution Global Mode. Target Mode pixel sizes are nominally 70 meters and Global pixels (binned 2 by 2) are 140 meters, from the planned 100-km orbit. The mission was cut short, just before the halfway point, in August 2009, when the spacecraft ceased operations. Despite the abbreviated mission and numerous technical and scientific challenges during the flight, M3 was able to cover more than 95% of the Moon in Global Mode. These data, presented and analyzed here as a global whole, are revolutionizing our understanding of the Moon. Already, numerous discoveries relating to volatiles and unexpected mineralogy have been published [1], [2], [3]. The rich spectral and spatial information content of the M3 data indicates that many more discoveries and an improved understanding of the mineralogy, geology, photometry, thermal regime and volatile status of our nearest neighbor are forthcoming from these data. Sadly, only minimal high-resolution Target Mode images were acquired, as these were to be the focus of the second half of the mission. This abstract gives the reader a global overview of all the M3 data that were collected and an introduction to their rich spectral character and complexity. We employ a Principal Components statistical method to assess the underlying dimensionality of the Moon as a whole, as seen by M3, and to identify numerous areas that are low-probability targets and thus of potential interest to selenologists.
Bayesian Inference: with ecological applications
Link, William A.; Barker, Richard J.
2010-01-01
This text provides a mathematically rigorous yet accessible and engaging introduction to Bayesian inference, with relevant examples that will be of interest to biologists working in the fields of ecology, wildlife management, and environmental studies, as well as advanced undergraduate statistics students. It opens the door to Bayesian inference, taking advantage of modern computational efficiencies and easily accessible software to evaluate complex hierarchical models.
Chang Liyun; Ho, S.-Y.; Chui, C.-S.; Lee, J.-H.; Du Yichun; Chen Tainsong
2008-06-15
We propose a new method based on statistical analysis technique to determine the minimum setup distance of a well chamber used in the calibration of ¹⁹²Ir high dose rate (HDR). The chamber should be placed at least this distance away from any wall or from the floor in order to mitigate the effect of scatter. Three different chambers were included in this study, namely, Sun Nuclear Corporation, Nucletron, and Standard Imaging. The results from this study indicated that the minimum setup distance varies depending on the particular chamber and the room architecture in which the chamber was used. Our result differs from that of a previous study by Podgorsak et al. [Med. Phys. 19, 1311-1314 (1992)], in which 25 cm was suggested, and also differs from that of the International Atomic Energy Agency (IAEA)-TECDOC-1079 report, which suggested 30 cm. The new method proposed in this study may be considered as an alternative approach to determine the minimum setup distance of a well-type chamber used in the calibration of ¹⁹²Ir HDR.
Osnes, J. D.; Winberg, A.; Andersson, J. E.; Larsson, N. A.
1991-09-27
Statistical and probabilistic methods for estimating the probability that a fracture is nonconductive (or equivalently, the conductive-fracture frequency) and the distribution of the transmissivities of conductive fractures from transmissivity measurements made in single-hole injection (well) tests were developed. These methods were applied to a database consisting of over 1,000 measurements made in nearly 25 km of borehole at five sites in Sweden. The depths of the measurements ranged from near the surface to over 600 m deep, and packer spacings of 20 and 25 m were used. A probabilistic model that describes the distribution of a series of transmissivity measurements was derived. When the parameters of this model were estimated using maximum likelihood estimators, the resulting estimated distributions generally fit the cumulative histograms of the transmissivity measurements very well. Further, estimates of the mean transmissivity of conductive fractures based on the maximum likelihood estimates of the model's parameters were reasonable, both in magnitude and in trend, with respect to depth. The estimates of the conductive-fracture probability were in the range of 0.5-5.0 percent, with higher values at shallow depths and increasingly smaller values as depth increased. An estimation procedure based on the probabilistic model and the maximum likelihood estimators of its parameters was recommended. Some guidelines regarding the design of injection test programs were drawn from the recommended estimation procedure and the parameter estimates based on the Swedish data. 24 refs., 12 figs., 14 tabs.
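A much-simplified version of the estimation idea can be sketched as follows: estimate the conductive-fracture frequency as the fraction of intervals above a detection limit, and fit a lognormal to the detected transmissivities. All parameter values and the detection limit are made up for illustration, and the censoring subtleties of the actual probabilistic model are ignored.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic injection-test data: most intervals are nonconductive
# (transmissivity effectively zero / below detection), and the conductive
# minority follows a lognormal distribution.
n, p_conductive = 1000, 0.03
detection_limit = 1e-10                      # hypothetical limit, m^2/s
conductive = rng.random(n) < p_conductive
T = np.where(conductive, rng.lognormal(mean=-18.0, sigma=1.5, size=n), 0.0)

detected = T[T > detection_limit]
p_hat = detected.size / n                    # conductive-fracture frequency
logT = np.log(detected)
mu_hat = logT.mean()                         # lognormal MLE (uncensored case)
sigma_hat = logT.std(ddof=1)
```

The real estimators in the paper account for the packer spacing and censoring; this sketch only shows the two-part structure (occurrence probability plus conditional transmissivity distribution).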
NASA Technical Reports Server (NTRS)
da Silva, Arlindo M.; Norris, Peter M.
2013-01-01
Part I presented a Monte Carlo Bayesian method for constraining a complex statistical model of GCM sub-gridcolumn moisture variability using high-resolution MODIS cloud data, thereby permitting large-scale model parameter estimation and cloud data assimilation. This part performs some basic testing of this new approach, verifying that it does indeed significantly reduce mean and standard deviation biases with respect to the assimilated MODIS cloud optical depth, brightness temperature and cloud top pressure, and that it also improves the simulated rotational-Raman scattering cloud optical centroid pressure (OCP) against independent (non-assimilated) retrievals from the OMI instrument. Of particular interest, the Monte Carlo method does show skill in the especially difficult case where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach allows finite jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. This paper also examines a number of algorithmic and physical sensitivities of the new method and provides guidance for its cost-effective implementation. One obvious difficulty for the method, and other cloud data assimilation methods as well, is the lack of information content in the cloud observables on cloud vertical structure, beyond cloud top pressure and optical thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard (1998) provides some help in this respect, by better honoring inversion structures in the background state.
NASA Astrophysics Data System (ADS)
Johnson, Ria; Ponman, Trevor J.; Finoguenov, Alexis
2009-05-01
We have performed a statistical analysis of a sample of 28 nearby galaxy groups derived primarily from the Two-Dimensional XMM-Newton Group Survey, in order to ascertain what factors drive the observed differences in group properties. We specifically focus on entropy and the role of feedback, and divide the sample into cool core (CC) and non-cool core (NCC) systems. This is the first time the latter have been studied in detail in the group regime. We find the coolest groups to have steeper entropy profiles than the warmest systems, and find NCC groups to have higher central entropy and to exhibit more scatter than their CC counterparts. We investigate the entropy distribution of the gas in each system, and compare this to the expected theoretical distribution under the condition that non-gravitational processes are ignored. In all cases, the observed maximum entropy far exceeds that expected theoretically, and simple models for modifications of the theoretical entropy distribution perform poorly. A model which applies initial pre-heating through an entropy shift to match the high entropy behaviour of the observed profile, followed by radiative cooling, generally fails to match the low entropy behaviour, and only performs well when the difference between the maximum entropy of the observed and theoretical distributions is small. Successful feedback models need to work differentially to increase the entropy range in the gas, and we suggest two basic possibilities. We analyse the effects of feedback on the entropy distribution, finding systems with a high measure of `feedback impact' to typically reach higher entropy than their low feedback counterparts. The abundance profiles of high and low feedback systems are comparable over the majority of the radial range, but the high feedback systems show significantly lower central metallicities compared to the low feedback systems. If low entropy, metal-rich gas has been boosted to large entropy in the high feedback systems
Using Alien Coins to Test Whether Simple Inference Is Bayesian
ERIC Educational Resources Information Center
Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.
2016-01-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…
NASA Astrophysics Data System (ADS)
von Toussaint, Udo
2011-07-01
Bayesian inference provides a consistent method for the extraction of information from physics experiments even in ill-conditioned circumstances. The approach provides a unified rationale for data analysis, which both justifies many of the commonly used analysis procedures and reveals some of the implicit underlying assumptions. This review summarizes the general ideas of the Bayesian probability theory with emphasis on the application to the evaluation of experimental data. As case studies for Bayesian parameter estimation techniques examples ranging from extra-solar planet detection to the deconvolution of the apparatus functions for improving the energy resolution and change point estimation in time series are discussed. Special attention is paid to the numerical techniques suited for Bayesian analysis, with a focus on recent developments of Markov chain Monte Carlo algorithms for high-dimensional integration problems. Bayesian model comparison, the quantitative ranking of models for the explanation of a given data set, is illustrated with examples collected from cosmology, mass spectroscopy, and surface physics, covering problems such as background subtraction and automated outlier detection. Additionally the Bayesian inference techniques for the design and optimization of future experiments are introduced. Experiments, instead of being merely passive recording devices, can now be designed to adapt to measured data and to change the measurement strategy on the fly to maximize the information of an experiment. The applied key concepts and necessary numerical tools which provide the means of designing such inference chains and the crucial aspects of data fusion are summarized and some of the expected implications are highlighted.
Nonparametric inference of network structure and dynamics
NASA Astrophysics Data System (ADS)
Peixoto, Tiago P.
The network structure of complex systems determines their function and serves as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and how to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high-dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion, that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among
NASA Astrophysics Data System (ADS)
Baas, Jaco H.
2000-03-01
EZ-ROSE 1.0 is a computer program for the statistical analysis of populations of two-dimensional vectorial data and their presentation in equal-area rose diagrams. The program is implemented as a Microsoft® Excel workbook containing worksheets for the input of directional (circular) or lineational (semi-circular) data and their automatic processing, which includes the calculation of a frequency distribution for a selected class width, statistical analysis, and the construction of a rose diagram in CorelDraw™. The statistical analysis involves tests of uniformity for the vectorial population distribution, such as the nonparametric Kuiper and Watson tests and the parametric Rayleigh test. The statistics calculated include the vector mean, its magnitude (length) and strength (data concentration); the Batschelet circular standard deviation as an alternative measure of vectorial concentration; and a confidence sector for the vector mean. The statistics together with the frequency data are used to prepare a Corel Script™ file that contains all the necessary instructions to draw automatically an equal-area circular frequency histogram (rose diagram) in CorelDraw™. The advantages of EZ-ROSE, compared to other software for circular statistics, are: (1) the ability to use an equal-area scale in rose diagrams; (2) the wide range of tools for a comprehensive statistical analysis; (3) the ease of use, as Microsoft® Excel and CorelDraw™ are widely known to users of Microsoft® Windows; and (4) the high degree of flexibility due to the application of Microsoft® Excel and CorelDraw™, which offer a whole range of tools for possible addition of other statistical methods and changes of the rose-diagram layout.
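The core circular statistics EZ-ROSE computes can be sketched in a few lines of Python (not the Excel/CorelDraw implementation the paper describes). The Rayleigh p-value below uses the simple first-order approximation p ≈ exp(-n R̄²) rather than the exact test, and the sample directions are synthetic.

```python
import numpy as np

def circular_stats(angles_deg):
    """Vector mean, mean resultant length, and a first-order Rayleigh
    p-value for a sample of directional data in degrees."""
    a = np.deg2rad(angles_deg)
    n = len(a)
    C, S = np.cos(a).sum(), np.sin(a).sum()
    R_bar = np.hypot(C, S) / n                 # concentration in [0, 1]
    mean_dir = np.degrees(np.arctan2(S, C)) % 360.0
    p_rayleigh = np.exp(-n * R_bar**2)         # H0: uniform directions
    return mean_dir, R_bar, p_rayleigh

# Strongly concentrated directions around 45 degrees (synthetic sample).
rng = np.random.default_rng(4)
sample = 45.0 + rng.normal(0.0, 10.0, size=50)
mean_dir, R_bar, p = circular_stats(sample)
```

For semicircular (lineational) data, the standard trick is to double the angles before computing these statistics and halve the resulting mean direction.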
Statistical inference of static analysis rules
NASA Technical Reports Server (NTRS)
Engler, Dawson Richards (Inventor)
2009-01-01
Various apparatus and methods are disclosed for identifying errors in program code. Respective numbers of observances of at least one correctness rule by different code instances that relate to the at least one correctness rule are counted in the program code. Each code instance has an associated counted number of observances of the correctness rule by the code instance. Also counted are respective numbers of violations of the correctness rule by different code instances that relate to the correctness rule. Each code instance has an associated counted number of violations of the correctness rule by the code instance. A respective likelihood of the validity is determined for each code instance as a function of the counted number of observances and counted number of violations. The likelihood of validity indicates a relative likelihood that a related code instance is required to observe the correctness rule. The violations may be output in order of the likelihood of validity of a violated correctness rule.
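The counting scheme above can be caricatured as: score each rule by the fraction of related code instances that observe it, and report violations of high-scoring rules first. The rule names and counts below are entirely hypothetical, and the score is a simple stand-in for the patent's likelihood-of-validity function.

```python
from collections import namedtuple

Rule = namedtuple("Rule", "name observances violations")

def validity_score(rule):
    """Fraction of code instances that observe the rule: a simple proxy
    for the likelihood that the rule is genuinely required."""
    total = rule.observances + rule.violations
    return rule.observances / total if total else 0.0

# Hypothetical counts from scanning a code base: e.g. "lock() must be
# followed by unlock()" observed 99 times and violated once.
rules = [
    Rule("lock/unlock pairing", 99, 1),
    Rule("check return of malloc", 40, 10),
    Rule("spurious pattern", 3, 5),
]

# Report violations of the most-likely-valid rules first: a rule observed
# 99 times and violated once is probably real, and its one violation is
# probably a bug; a rule violated as often as observed is probably noise.
ranked = sorted(rules, key=validity_score, reverse=True)
```
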
Correlation techniques and measurements of wave-height statistics
NASA Technical Reports Server (NTRS)
Guthart, H.; Taylor, W. C.; Graf, K. A.; Douglas, D. G.
1972-01-01
Statistical measurements of wave height fluctuations have been made in a wind wave tank. The power spectral density function of temporal wave height fluctuations evidenced second-harmonic components and an f^-5 power-law decay beyond the second harmonic. The observations of second harmonic effects agreed very well with a theoretical prediction. From the wave statistics, surface drift currents were inferred and compared to experimental measurements with satisfactory agreement. Measurements were made of the two-dimensional correlation coefficient at 15 deg increments in angle with respect to the wind vector. An estimate of the two-dimensional spatial power spectral density function was also made.
NASA Technical Reports Server (NTRS)
Varnai, Tamas; Marshak, Alexander
2000-01-01
This paper presents a simple approach to estimate the uncertainties that arise in satellite retrievals of cloud optical depth when the retrievals use one-dimensional radiative transfer theory for heterogeneous clouds that have variations in all three dimensions. For the first time, preliminary error bounds are set to estimate the uncertainty of cloud optical depth retrievals. These estimates can help us better understand the nature of uncertainties that three-dimensional effects can introduce into retrievals of this important product of the MODIS instrument. The probability distribution of resulting retrieval errors is examined through theoretical simulations of shortwave cloud reflection for a wide variety of cloud fields. The results are used to illustrate how retrieval uncertainties change with observable and known parameters, such as solar elevation or cloud brightness. Furthermore, the results indicate that a tendency observed in an earlier study, clouds appearing thicker for oblique sun, is indeed caused by three-dimensional radiative effects.
ERIC Educational Resources Information Center
Douglas, Jeff; Kim, Hae-Rim; Roussos, Louis; Stout, William; Zhang, Jinming
An extensive nonparametric dimensionality analysis of latent structure was conducted on three forms of the Law School Admission Test (LSAT) (December 1991, June 1992, and October 1992) using the DIMTEST model in confirmatory analyses and using DIMTEST, FAC, DETECT, HCA, PROX, and a genetic algorithm in exploratory analyses. Results indicate that…
Confidence set inference with a prior quadratic bound
NASA Technical Reports Server (NTRS)
Backus, George E.
1989-01-01
In the uniqueness part of a geophysical inverse problem, the observer wants to predict all likely values of P unknown numerical properties z = (z_1, ..., z_P) of the earth from measurement of D other numerical properties y^0 = (y_1^0, ..., y_D^0), using full or partial knowledge of the statistical distribution of the random errors in y^0. The data space Y containing y^0 is D-dimensional, so when the model space X is infinite-dimensional the linear uniqueness problem usually is insoluble without prior information about the correct earth model x. If that information is a quadratic bound on x, Bayesian inference (BI) and stochastic inversion (SI) inject spurious structure into x, implied by neither the data nor the quadratic bound. Confidence set inference (CSI) provides an alternative inversion technique free of this objection. Confidence set inference is illustrated in the problem of estimating the geomagnetic field B at the core-mantle boundary (CMB) from components of B measured on or above the earth's surface.
NASA Astrophysics Data System (ADS)
Laloy, Eric; Rogiers, Bart; Vrugt, Jasper; Mallants, Dirk; Jacques, Diederik
2013-04-01
This study presents a novel strategy for accelerating posterior exploration of highly parameterized and CPU-demanding hydrogeologic models. The method builds on the stochastic collocation approach of Marzouk and Xiu (2009) and uses the generalized polynomial chaos (gPC) framework to emulate the output of a groundwater flow model. The resulting surrogate model is CPU-efficient and allows for sampling the posterior parameter distribution at a much reduced computational cost. This surrogate distribution is subsequently employed to precondition a state-of-the-art two-stage Markov chain Monte Carlo (MCMC) simulation (Vrugt et al., 2009; Cui et al., 2011) of the original CPU-demanding flow model. Application of the proposed method to the hydrogeological characterization of a three-dimensional multi-layered aquifer shows a 2-5 times speed up in sampling efficiency.
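The two-stage (surrogate-preconditioned) MCMC idea can be sketched as follows. This is a minimal illustration, not the authors' code: the CPU-demanding flow model and the gPC emulator are replaced by a hypothetical Gaussian log-posterior and a slightly biased approximation of it.

```python
import math
import random

random.seed(1)

def expensive_logpost(x):
    # Stand-in for the CPU-demanding flow model: a standard normal log-density.
    return -0.5 * x * x

def surrogate_logpost(x):
    # Stand-in for the cheap gPC emulator: a slightly biased approximation.
    return -0.5 * 1.05 * x * x

def two_stage_mcmc(n_steps, step=1.0):
    x = 0.0
    lp, slp = expensive_logpost(x), surrogate_logpost(x)
    chain, expensive_calls = [], 0
    for _ in range(n_steps):
        y = x + random.gauss(0.0, step)
        sly = surrogate_logpost(y)
        # Stage 1: screen the proposal using only the cheap surrogate.
        if math.log(random.random()) < sly - slp:
            # Stage 2: correct with the expensive model; the (slp - sly)
            # factor makes the chain target the expensive posterior exactly.
            expensive_calls += 1
            lpy = expensive_logpost(y)
            if math.log(random.random()) < (lpy - lp) + (slp - sly):
                x, lp, slp = y, lpy, sly
        chain.append(x)
    return chain, expensive_calls

chain, calls = two_stage_mcmc(20_000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
```

Only proposals that survive the cheap first stage trigger an expensive evaluation, which is where the computational saving comes from.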
Schirillo, James A
2013-10-01
In studies of lightness and color constancy, the terms lightness and brightness refer to the qualia corresponding to perceived surface reflectance and perceived luminance, respectively. However, what has rarely been considered is the fact that the volume of space containing surfaces appears neither empty, void, nor black, but filled with light. Helmholtz (1866/1962) came closest to describing this phenomenon when discussing inferred illumination, but previous theoretical treatments have fallen short by restricting their considerations to the surfaces of objects. The present work is among the first to explore how we infer the light present in empty space. It concludes with several research examples supporting the theory that humans can infer the differential levels and chromaticities of illumination in three-dimensional space. PMID:23435628
BIE: Bayesian Inference Engine
NASA Astrophysics Data System (ADS)
Weinberg, Martin D.
2013-12-01
The Bayesian Inference Engine (BIE) is an object-oriented library of tools written in C++ designed explicitly to enable Bayesian update and model comparison for astronomical problems. To facilitate "what if" exploration, BIE provides a command line interface (written with Bison and Flex) to run input scripts. The output of the code is a simulation of the Bayesian posterior distribution, from which summary statistics (e.g., moments or confidence intervals) can be determined. All of these quantities are fundamentally integrals, and the Markov chain approach produces variates θ distributed according to P(θ|D), so moments are trivially obtained by summing over the ensemble of variates.
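For instance, once the BIE (or any Markov-chain sampler) has produced an ensemble of posterior variates, moments and credible intervals reduce to sums over that ensemble. A minimal sketch, with a hypothetical Gaussian posterior standing in for real BIE output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for sampler output: an ensemble of variates theta
# drawn from the posterior P(theta | D), here N(2.0, 0.5^2).
theta = rng.normal(2.0, 0.5, size=100_000)

# Posterior moments are integrals, so they reduce to sums over the ensemble.
post_mean = theta.mean()
post_var = theta.var(ddof=1)

# A 95% credible interval from the empirical quantiles of the variates.
lo, hi = np.quantile(theta, [0.025, 0.975])
```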
NASA Astrophysics Data System (ADS)
Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad
2016-05-01
Bayesian inference has traditionally been conceived as the proper framework for the formal incorporation of expert knowledge in parameter estimation of groundwater models. However, conventional Bayesian inference is incapable of taking into account the imprecision essentially embedded in expert-provided information. In order to solve this problem, a number of extensions to conventional Bayesian inference have been introduced in recent years. One of these extensions is 'fuzzy Bayesian inference', the result of integrating fuzzy techniques into Bayesian statistics. Fuzzy Bayesian inference has a number of desirable features which make it an attractive approach for incorporating expert knowledge in the parameter estimation process of groundwater models: (1) it is well adapted to the nature of expert-provided information, (2) it allows uncertainty and imprecision to be modeled distinctly, and (3) it presents a framework for fusing expert-provided information regarding the various inputs of the Bayesian inference algorithm. However, an important obstacle to employing fuzzy Bayesian inference in groundwater numerical modeling applications is the computational burden, as the required number of numerical model simulations often becomes prohibitively large and computationally infeasible. In this paper, a novel approach to accelerating the fuzzy Bayesian inference algorithm is proposed, based on using approximate posterior distributions derived from surrogate modeling as a screening tool in the computations. The proposed approach is first applied to a synthetic test case of seawater intrusion (SWI) in a coastal aquifer. It is shown that for this synthetic test case, the proposed approach decreases the number of required numerical simulations by an order of magnitude. Then the proposed approach is applied to a real-world test case involving three-dimensional numerical modeling of SWI in Kish Island, located in the Persian Gulf. An expert
Armour, Cherie
2015-01-01
There has been a substantial body of literature devoted to answering one question: Which latent model of posttraumatic stress disorder (PTSD) best represents PTSD's underlying dimensionality? This research summary will, therefore, focus on the literature pertaining to PTSD's latent structure as represented in the fourth (DSM-IV, 1994) to the fifth (DSM-5, 2013) edition of the DSM. This article will begin by providing a clear rationale as to why this is a pertinent research area, then the body of literature pertaining to the DSM-IV and DSM-IV-TR will be summarised, and this will be followed by a summary of the literature pertaining to the recently published DSM-5. To conclude, there will be a discussion with recommendations for future research directions, namely that researchers must investigate the applicability of the new DSM-5 criteria and the newly created DSM-5 symptom sets to trauma survivors. In addition, researchers must continue to endeavour to identify the “correct” constellations of symptoms within symptom sets to ensure that diagnostic algorithms are appropriate and aid in the development of targeted treatment approaches and interventions. In particular, the newly proposed DSM-5 anhedonia model, externalising behaviours model, and hybrid models must be further investigated. It is also important that researchers follow up on the idea that a more parsimonious latent structure of PTSD may exist. PMID:25994027
NASA Astrophysics Data System (ADS)
Živić, I.; Elezović-Hadžić, S.; Milošević, S.
2014-11-01
We study the adsorption problem of linear polymers, immersed in a good solvent, when the container of the polymer-solvent system is taken to be a member of the Sierpinski gasket (SG) family of fractals, embedded in three-dimensional Euclidean space. Members of the SG family are enumerated by an integer b (2≤b<∞), and it is assumed that one side of each SG fractal is an impenetrable adsorbing boundary. We calculate the surface critical exponents γ11, γ1, and γs which, within the self-avoiding walk (SAW) model of a polymer chain, are associated with the numbers of all possible SAWs with both, one, and no ends grafted to the adsorbing surface (adsorbing boundary), respectively. By applying the exact renormalization group method, for 2≤b≤4, we have obtained specific values of these exponents for various types of polymer conformations. To extend the obtained sequences of exact values of the surface critical exponents, we have applied the Monte Carlo renormalization group method for fractals with 2≤b≤40. The obtained results show that all studied exponents are monotonically increasing functions of the parameter b, for all possible polymer states. We discuss mutual relations between the studied critical exponents, and compare their values with those found for other types of lattices, in order to attain a unified picture of the problem at hand.
NASA Technical Reports Server (NTRS)
Iacovazzi, Robert A., Jr.; Prabhakara, C.; Lau, William K. M. (Technical Monitor)
2001-01-01
In this study, a model is developed to estimate mesoscale-resolution atmospheric latent heating (ALH) profiles. It utilizes rain statistics deduced from Tropical Rainfall Measuring Mission (TRMM) data, and cloud vertical velocity profiles and regional surface thermodynamic climatologies derived from other available data sources. From several rain events observed over tropical ocean and land, ALH profiles retrieved by this model in convective rain regions reveal strong warming throughout most of the troposphere, while in stratiform rain regions they usually show slight cooling below the freezing level and significant warming above. The mesoscale-average, or total, ALH profiles reveal a dominant stratiform character, because stratiform rain areas are usually much larger than convective rain areas. Sensitivity tests of the model show that total ALH at a given tropospheric level varies by less than +/- 10 % when convective and stratiform rain rates and mesoscale fractional rain areas are perturbed individually by +/- 15 %. This is also found when the non-uniform convective vertical velocity profiles are replaced by one that is uniform. Larger variability of the total ALH profiles arises when climatological ocean- and land-surface temperatures (water vapor mixing ratios) are independently perturbed by +/- 1.0 K (+/- 5 %) and +/- 5.0 K (+/- 15 %), respectively. At a given tropospheric level, such perturbations can cause a +/- 25 % variation of total ALH over ocean, and a factor-of-two sensitivity over land. This sensitivity is reduced substantially if perturbations of surface thermodynamic variables do not change surface relative humidity, or are not extended throughout the entire model evaporation layer. The ALH profiles retrieved in this study agree qualitatively with tropical total diabatic heating profiles deduced in earlier studies. Also, from January and July 1999 ALH-profile climatologies generated separately with TRMM Microwave Imager and Precipitation Radar rain
NASA Astrophysics Data System (ADS)
Manos, Thanos; Robnik, Marko
2015-04-01
We study the quantum kicked rotator in the classically fully chaotic regime K = 10 and for various values of the quantum parameter k using Izrailev's N-dimensional model for various N ≤ 3000, which in the limit N → ∞ tends to the exact quantized kicked rotator. By numerically calculating the eigenfunctions in the basis of the angular momentum we find that the localization length L for fixed parameter values has a certain distribution; in fact, its inverse is Gaussian distributed, in analogy and in connection with the distribution of finite time Lyapunov exponents of Hamilton systems. However, unlike the case of the finite time Lyapunov exponents, this distribution is found to be independent of N and thus survives the limit N = ∞. This is different from the tight-binding model of Anderson localization. The reason is that the finite bandwidth approximation of the underlying Hamilton dynamical system in the Shepelyansky picture [Phys. Rev. Lett. 56, 677 (1986), 10.1103/PhysRevLett.56.677] does not apply rigorously. This observation explains the strong fluctuations in the scaling laws of the kicked rotator, such as the entropy localization measure as a function of the scaling parameter Λ = L/N, where L is the theoretical value of the localization length in the semiclassical approximation. These results call for a more refined theory of the localization length in the quantum kicked rotator and in similar Floquet systems, where we must predict not only the mean value of the inverse of the localization length L but also its (Gaussian) distribution, in particular the variance. In order to complete our studies we numerically analyze the related behavior of finite time Lyapunov exponents in the standard map and of the 2×2 transfer matrix formalism. This paper extends our recent work [Phys. Rev. E 87, 062905 (2013), 10.1103/PhysRevE.87.062905].
Methods for Bayesian Power Spectrum Inference with Galaxy Surveys
NASA Astrophysics Data System (ADS)
Jasche, Jens; Wandelt, Benjamin D.
2013-12-01
We derive and implement a full Bayesian large scale structure inference method aiming at precision recovery of the cosmological power spectrum from galaxy redshift surveys. Our approach improves upon previous Bayesian methods by performing a joint inference of the three-dimensional density field, the cosmological power spectrum, luminosity dependent galaxy biases, and corresponding normalizations. We account for all joint and correlated uncertainties between all inferred quantities. Classes of galaxies with different biases are treated as separate subsamples. This method therefore also allows the combined analysis of more than one galaxy survey. In particular, it solves the problem of inferring the power spectrum from galaxy surveys with non-trivial survey geometries by exploring the joint posterior distribution with efficient implementations of multiple block Markov chain and Hybrid Monte Carlo methods. Our Markov sampler achieves high statistical efficiency in low signal-to-noise regimes by using a deterministic reversible jump algorithm. This approach reduces the correlation length of the sampler by several orders of magnitude, turning the otherwise numerically unfeasible problem of joint parameter exploration into a numerically manageable task. We test our method on an artificial mock galaxy survey, emulating characteristic features of the Sloan Digital Sky Survey data release 7, such as its survey geometry and luminosity-dependent biases. These tests demonstrate the numerical feasibility of our large scale Bayesian inference framework when the parameter space has millions of dimensions. This method reveals and correctly treats the anti-correlation between bias amplitudes and the power spectrum, which is not taken into account in current approaches to power spectrum estimation, a 20% effect across large ranges in k space. In addition, this method results in constrained realizations of density fields obtained without assuming the power spectrum or bias parameters
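The block-sampling strategy, in miniature, alternates between conditionally sampling the field given the power and the power given the field. The following toy Gibbs sampler is not the authors' implementation; it is a one-dimensional white-noise analogue with a Jeffreys prior on the power, illustrating the joint (field, power) exploration:

```python
import math
import random

random.seed(6)

# Toy block Gibbs sampler: data d_i = s_i + noise, with a Gaussian signal
# s_i ~ N(0, P) of unknown power P and unit-variance noise.
true_P, noise_var, n = 4.0, 1.0, 300
data = [random.gauss(0.0, math.sqrt(true_P)) + random.gauss(0.0, math.sqrt(noise_var))
        for _ in range(n)]

P = 1.0  # deliberately poor initial power guess
P_samples = []
for _ in range(1500):
    # Block 1: sample the signal given the data and the current power
    # (Wiener-filter mean plus Gaussian fluctuations).
    post_var = 1.0 / (1.0 / P + 1.0 / noise_var)
    s = [random.gauss(post_var * d / noise_var, math.sqrt(post_var)) for d in data]
    # Block 2: sample the power given the signal; with a Jeffreys prior this
    # conditional is inverse-gamma(n/2, sum(s^2)/2).
    shape, scale = n / 2.0, sum(x * x for x in s) / 2.0
    P = scale / random.gammavariate(shape, 1.0)
    P_samples.append(P)

# Discard burn-in and average the power samples.
P_est = sum(P_samples[300:]) / len(P_samples[300:])
```

The chain recovers the true power from a poor starting guess; the full method applies the same alternation to a million-dimensional density field and a binned power spectrum.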
Operation of the Bayes Inference Engine
Hanson, K.M.; Cunningham, G.S.
1998-07-27
The authors have developed a computer application, called the Bayes Inference Engine, to enable one to make inferences about models of a physical object from radiographs taken of it. In the BIE, calculational models are represented by a data-flow diagram that can be manipulated by the analyst in a graphical-programming environment. The authors demonstrate the operation of the BIE in terms of examples of two-dimensional tomographic reconstruction, including uncertainty estimation.
NASA Astrophysics Data System (ADS)
Swadesh, Joel K.; Poirier, Jacques C.
1981-05-01
demonstrated intriguing common features of functional form in the various partition functions, will be of use in evaluating approximate statistical mechanical theories, in the theory of solutions, and in the theory of small systems.
ERIC Educational Resources Information Center
Finson, Kevin D.
2010-01-01
Learning about what inferences are, and what a good inference is, will help students become more scientifically literate and better understand the nature of science in inquiry. Students in K-4 should be able to give explanations about what they investigate (NSTA 1997) and that includes doing so through inferring. This article provides some tips…
NASA Astrophysics Data System (ADS)
Yenn Chong, See; Lee, Jung-Ryul; Yik Park, Chan
2013-03-01
The conventional threshold-crossing technique generally encounters difficulty in setting a common threshold level for extracting the respective time-of-flights (ToFs) and amplitudes from guided waves obtained at many different points by spatial scanning. Therefore, we propose a statistical threshold determination method based on noise map generation to automatically process numerous guided waves having different propagation distances. First, a two-dimensional (2-D) noise map is generated using one-dimensional (1-D) wavelet transform (WT) magnitudes at time zero of the acquired waves. Then, the probability density functions (PDFs) of the Gamma distribution, Weibull distribution and exponential distribution are used to model the measured 2-D noise map. Graphical goodness-of-fit measurements are used to find the best fit among the three theoretical distributions. Then, the threshold level is automatically determined by selecting the desired confidence level of noise rejection in the cumulative distribution function of the best-fit PDF. Based on this threshold level, the amplitudes and ToFs are extracted and mapped into a 2-D matrix array form. The threshold level determined by the noise statistics may cross the noise signal after time zero. These crossings appear as salt-and-pepper noise in the ToF and amplitude maps but are finally removed by a 1-D median filter. This proposed method was verified on a thick stainless steel hollow cylinder, where guided waves were acquired over an area of 180 mm×126 mm of the cylinder using a laser ultrasonic scanning system and an ultrasonic sensor. The Gamma distribution was estimated by the proposed algorithm as the best fit to the verification experimental data. The statistical parameters of the Gamma distribution were used to determine the threshold level appropriate for most of the guided waves. The ToFs and amplitudes of the first arrival mode were mapped into a 2-D matrix array form. Each map included 447 noisy points out of 90
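The threshold-determination step can be sketched with SciPy; the shape, scale, and confidence level below are hypothetical stand-ins for values estimated from a real noise map:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical flattened 2-D noise map: wavelet magnitudes at time zero.
noise = rng.gamma(shape=2.0, scale=0.3, size=5000)

# Fit a Gamma PDF to the measured noise (location pinned at zero); in the
# full method goodness-of-fit would pick the best of several candidates.
a, loc, scale = stats.gamma.fit(noise, floc=0.0)

# The threshold is where the fitted CDF reaches the desired confidence
# level of noise rejection, here 99.9%.
threshold = stats.gamma.ppf(0.999, a, loc=loc, scale=scale)

# Keep only points exceeding the noise threshold.
signal_mask = noise > threshold
```

Residual crossings surviving this threshold would then be removed with a 1-D median filter, as in the method above.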
Guyonvarch, Estelle; Ramin, Elham; Kulahci, Murat; Plósz, Benedek Gy
2015-10-15
The present study aims at using statistically designed computational fluid dynamics (CFD) simulations as numerical experiments for the identification of one-dimensional (1-D) advection-dispersion models - computationally light tools, used e.g., as sub-models in systems analysis. The objective is to develop a new 1-D framework, referred to as interpreted CFD (iCFD) models, in which statistical meta-models are used to calculate the pseudo-dispersion coefficient (D) as a function of design and flow boundary conditions. The method - presented in a straightforward and transparent way - is illustrated using the example of a circular secondary settling tank (SST). First, the significant design and flow factors are screened out by applying the statistical method of two-level fractional factorial design of experiments. Second, based on the number of significant factors identified through the factor screening study and system understanding, 50 different sets of design and flow conditions are selected using Latin Hypercube Sampling (LHS). The boundary condition sets are imposed on a 2-D axi-symmetrical CFD simulation model of the SST. In the framework, to degenerate the 2-D model structure, CFD model outputs are approximated by the 1-D model through the calibration of three different model structures for D. Correlation equations for the D parameter then are identified as a function of the selected design and flow boundary conditions (meta-models), and their accuracy is evaluated against D values estimated in each numerical experiment. The evaluation and validation of the iCFD model structure is carried out using scenario simulation results obtained with parameters sampled from the corners of the LHS experimental region. For the studied SST, additional iCFD model development was carried out in terms of (i) assessing different density current sub-models; (ii) implementation of a combined flocculation, hindered, transient and compression settling velocity function; and (iii
Cluster Mass Inference via Random Field Theory
Zhang, Hui; Nichols, Thomas E.; Johnson, Timothy D.
2009-01-01
Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference method available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single-subject and a group fMRI dataset demonstrate better power than traditional cluster extent inference, and good accuracy relative to a gold-standard permutation test. PMID:18805493
Demographic inference under a spatially continuous coalescent model.
Joseph, T A; Hickerson, M J; Alvarado-Serrano, D F
2016-08-01
In contrast with classical population genetics theory, which models population structure as discrete panmictic units connected by migration, many populations exhibit heterogeneous spatial gradients in population connectivity across semi-continuous habitats. The historical dynamics of such spatially structured populations can be captured by a spatially explicit coalescent model recently proposed by Etheridge (2008) and Barton et al. (2010a, 2010b), whereby allelic lineages are distributed in a two-dimensional spatial continuum and move within this continuum based on extinction and coalescent events. Though theoretically rigorous, this model, which we here refer to as the continuum model, has not yet been implemented for demographic inference. To this end, here we introduce and demonstrate a statistical pipeline that couples the coalescent simulator of Kelleher et al. (2014), which simulates genealogies under the continuum model, with an approximate Bayesian computation (ABC) framework for parameter estimation of neighborhood size (that is, the number of locally breeding individuals) and dispersal ability (that is, the distance an offspring can travel within a generation). Using empirically informed simulations and simulation-based ABC cross-validation, we first show that neighborhood size can be accurately estimated. We then apply our pipeline to the South African endemic shrub species Berkheya cuneata to use the resulting estimates of dispersal ability and neighborhood size to infer the average population density of the species. More generally, we show that spatially explicit coalescent models can be successfully integrated into model-based demographic inference. PMID:27118157
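The ABC rejection step at the core of such a pipeline can be sketched as follows, with a toy simulator standing in for the coalescent simulator and a made-up uniform prior on neighborhood size:

```python
import random

random.seed(3)

def simulate(neighborhood_size, n_loci=200):
    # Toy stand-in for a coalescent simulation: a summary statistic whose
    # expected value equals the neighborhood size.
    draws = [random.expovariate(1.0 / neighborhood_size) for _ in range(n_loci)]
    return sum(draws) / n_loci

# "Observed" summary statistic, generated at a true neighborhood size of 50.
observed = simulate(50.0)

# ABC rejection: sample from the prior, simulate, and keep parameters whose
# summaries fall within a tolerance of the observed summary.
accepted = []
for _ in range(5000):
    theta = random.uniform(1.0, 100.0)  # uniform prior on neighborhood size
    if abs(simulate(theta) - observed) < 2.0:
        accepted.append(theta)

# The accepted draws approximate the posterior on neighborhood size.
posterior_mean = sum(accepted) / len(accepted)
```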
Using scientifically and statistically sufficient statistics in comparing image segmentations.
Chi, Yueh-Yun; Muller, Keith E
2010-01-01
Automatic computer segmentation in three dimensions creates an opportunity to reduce the cost of three-dimensional treatment planning of radiotherapy for cancer treatment. Comparisons between human and computer accuracy in segmenting kidneys in CT scans generate distance values far larger in number than the number of CT scans. Such high dimension, low sample size (HDLSS) data present a grand challenge to statisticians: how do we find good estimates and make credible inference? We recommend discovering and using scientifically and statistically sufficient statistics as an additional strategy for overcoming the curse of dimensionality. First, we reduced the three-dimensional array of distances for each image comparison to a histogram to be modeled individually. Second, we used non-parametric kernel density estimation to explore distributional patterns and assess multi-modality. Third, a systematic exploratory search for parametric distributions and truncated variations led to choosing a Gaussian form as approximating the distribution of a cube root transformation of distance. Fourth, representing each histogram by an individually estimated distribution eliminated the HDLSS problem by reducing on average 26,000 distances per histogram to just 2 parameter estimates. In the fifth and final step we used classical statistical methods to demonstrate that the two human observers disagreed significantly less with each other than with the computer segmentation. Nevertheless, the size of all disagreements was clinically unimportant relative to the size of a kidney. The hierarchical modeling approach to object-oriented data created response variables deemed sufficient by both the scientists and statisticians. We believe the same strategy provides a useful addition to the imaging toolkit and will succeed with many other high throughput technologies in genetics, metabolomics and chemical analysis. PMID:24967000
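Steps three and four, reducing each histogram of roughly 26,000 distances to two Gaussian parameters after a cube-root transformation, can be sketched as follows (the Gamma-distributed distances are a hypothetical stand-in for real segmentation distances):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical surface distances for one image comparison: positive and
# right-skewed, roughly like segmentation-boundary distances.
distances = rng.gamma(shape=3.0, scale=0.8, size=26_000)

# A cube-root transformation pulls the skewed distances toward Gaussianity.
transformed = np.cbrt(distances)

# The whole histogram (~26,000 values) is reduced to two parameter
# estimates, treated as sufficient statistics for downstream inference.
mu, sigma = float(transformed.mean()), float(transformed.std(ddof=1))

def skew(x):
    # Sample skewness, used to check the transformation's effect.
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())
```

For Gamma-like data the transformed skewness is far smaller than the raw skewness, which is what justifies summarizing each histogram by (mu, sigma) alone.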
Forward and Backward Inference in Spatial Cognition
Penny, Will D.; Zeidman, Peter; Burgess, Neil
2013-01-01
This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of ‘lower-level’ computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus. PMID:24348230
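The optimal combination of location estimates from path integration and sensory input is, for Gaussian uncertainties, a precision-weighted average. A minimal sketch with illustrative numbers (not taken from the paper):

```python
# Precision-weighted fusion of two Gaussian location estimates: the
# standard optimal combination behind "where am I?" forward inference.
def fuse(mu_a, var_a, mu_b, var_b):
    # Precisions (inverse variances) add; the fused mean is the
    # precision-weighted average of the two means.
    precision = 1.0 / var_a + 1.0 / var_b
    mu = (mu_a / var_a + mu_b / var_b) / precision
    return mu, 1.0 / precision

# Path integration says x = 2.0 (variance 1.0); vision says x = 4.0 (variance 1.0).
mu, var = fuse(2.0, 1.0, 4.0, 1.0)
# Equally reliable cues -> the fused estimate lies halfway, with half the variance.
```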
Bayesian Inference on Proportional Elections
Brunello, Gabriel Hideki Vatanabe; Nakano, Eduardo Yoshio
2015-01-01
Polls for majoritarian voting systems usually show estimates of the percentage of votes for each candidate. However, proportional vote systems do not necessarily guarantee the candidate with the most percentage of votes will be elected. Thus, traditional methods used in majoritarian elections cannot be applied on proportional elections. In this context, the purpose of this paper was to perform a Bayesian inference on proportional elections considering the Brazilian system of seats distribution. More specifically, a methodology to answer the probability that a given party will have representation on the chamber of deputies was developed. Inferences were made on a Bayesian scenario using the Monte Carlo simulation technique, and the developed methodology was applied on data from the Brazilian elections for Members of the Legislative Assembly and Federal Chamber of Deputies in 2010. A performance rate was also presented to evaluate the efficiency of the methodology. Calculations and simulations were carried out using the free R statistical software. PMID:25786259
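The Monte Carlo step can be sketched as follows. The counts, seat total, and allocation rule below are all hypothetical; the D'Hondt rule stands in for the actual Brazilian seats-distribution system:

```python
import random

random.seed(5)

def dhondt(votes, seats):
    # D'Hondt highest-averages allocation, used here as a generic
    # proportional rule; the real Brazilian rules differ in detail.
    alloc = [0] * len(votes)
    for _ in range(seats):
        quotients = [v / (a + 1) for v, a in zip(votes, alloc)]
        alloc[quotients.index(max(quotients))] += 1
    return alloc

# Hypothetical poll counts for four parties. A uniform Dirichlet prior plus
# multinomial counts yields a Dirichlet posterior over vote shares.
counts = [400, 300, 200, 100]
posterior_alpha = [c + 1 for c in counts]

# Monte Carlo: sample vote shares from the posterior, allocate 10 seats,
# and record whether the smallest party wins representation.
n_sims, hits = 2000, 0
for _ in range(n_sims):
    gammas = [random.gammavariate(a, 1.0) for a in posterior_alpha]
    total = sum(gammas)
    shares = [g / total for g in gammas]
    if dhondt(shares, 10)[-1] >= 1:
        hits += 1

prob_representation = hits / n_sims
```

The resulting frequency estimates the posterior probability that the party gains at least one seat, which is the quantity a share-of-votes poll alone cannot provide.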
Network Plasticity as Bayesian Inference
Legenstein, Robert; Maass, Wolfgang
2015-01-01
General results from statistical learning theory suggest to understand not only brain computations, but also brain plasticity as probabilistic inference. But a model for that has been missing. We propose that inherently stochastic features of synaptic plasticity and spine motility enable cortical networks of neurons to carry out probabilistic inference by sampling from a posterior distribution of network configurations. This model provides a viable alternative to existing models that propose convergence of parameters to maximum likelihood values. It explains how priors on weight distributions and connection probabilities can be merged optimally with learned experience, how cortical networks can generalize learned information so well to novel experiences, and how they can compensate continuously for unforeseen disturbances of the network. The resulting new theory of network plasticity explains from a functional perspective a number of experimental data on stochastic aspects of synaptic plasticity that previously appeared to be quite puzzling. PMID:26545099
Causal inference based on counterfactuals
Höfler, M
2005-01-01
Background The counterfactual or potential outcome model has become increasingly standard for causal inference in epidemiological and medical studies. Discussion This paper provides an overview of the counterfactual and related approaches. A variety of conceptual as well as practical issues when estimating causal effects are reviewed. These include causal interactions, imperfect experiments, adjustment for confounding, time-varying exposures, competing risks and the probability of causation. It is argued that the counterfactual model of causal effects captures the main aspects of causality in health sciences and relates to many statistical procedures. Summary Counterfactuals are the basis of causal inference in medicine and epidemiology. Nevertheless, the estimation of counterfactual differences poses several difficulties, primarily in observational studies. These problems, however, reflect fundamental barriers only when learning from observations, and this does not invalidate the counterfactual concept. PMID:16159397
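As a concrete instance of adjustment for confounding in the counterfactual framework, a standardization (g-formula) calculation with made-up numbers:

```python
# Toy standardization (g-formula) estimate of a causal risk difference,
# adjusting for one binary confounder Z. All probabilities are
# illustrative, not from any real study.
p_y = {  # P(Y=1 | A=a, Z=z) from a hypothetical observational study
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.20, (1, 1): 0.50,
}
p_z1 = 0.40  # marginal P(Z=1) in the study population

def standardized_risk(a):
    # Average the stratum-specific risks over the whole population's
    # confounder distribution: the counterfactual risk "had everyone
    # received treatment A=a".
    return p_y[(a, 0)] * (1 - p_z1) + p_y[(a, 1)] * p_z1

ate = standardized_risk(1) - standardized_risk(0)  # causal risk difference
```

Here the standardized risks are 0.32 under treatment and 0.18 under no treatment, so the (hypothetical) causal risk difference is 0.14.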
NASA Astrophysics Data System (ADS)
Wolpert, David H.
2008-07-01
We show that physical devices that perform observation, prediction, or recollection share an underlying mathematical structure. We call devices with that structure “inference devices”. We present a set of existence and impossibility results concerning inference devices. These results hold independent of the precise physical laws governing our universe. In a limited sense, the impossibility results establish that Laplace was wrong to claim that even in a classical, non-chaotic universe the future can be unerringly predicted, given sufficient knowledge of the present. Alternatively, these impossibility results can be viewed as a non-quantum-mechanical “uncertainty principle”. The mathematics of inference devices has close connections to the mathematics of Turing Machines (TMs). In particular, the impossibility results for inference devices are similar to the Halting theorem for TMs. Furthermore, one can define an analog of Universal TMs (UTMs) for inference devices. We call those analogs “strong inference devices”. We use strong inference devices to define the “inference complexity” of an inference task, which is the analog of the Kolmogorov complexity of computing a string. A task-independent bound is derived on how much the inference complexity of an inference task can differ for two different inference devices. This is analogous to the “encoding” bound governing how much the Kolmogorov complexity of a string can differ between two UTMs used to compute that string. However no universe can contain more than one strong inference device. So whereas the Kolmogorov complexity of a string is arbitrary up to specification of the UTM, there is no such arbitrariness in the inference complexity of an inference task. We informally discuss the philosophical implications of these results, e.g., for whether the universe “is” a computer. We also derive some graph-theoretic properties governing any set of multiple inference devices. We also present an
Wang, Ting; Ren, Zhao; Ding, Ying; Fang, Zhou; Sun, Zhe; MacDonald, Matthew L; Sweet, Robert A; Wang, Jieru; Chen, Wei
2016-02-01
Biological networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single variables. The Gaussian graphical model (GGM), a probability model that characterizes the conditional dependence structure of a set of random variables by a graph, has wide applications in the analysis of biological networks, such as inferring interactions or comparing differential networks. However, existing approaches are either not statistically rigorous or are computationally inefficient for making inference with high-dimensional data that include tens of thousands of variables. In this study, we propose an efficient algorithm to estimate the GGM and obtain a p-value and confidence interval for each edge in the graph, based on a recent proposal by Ren et al., 2015. Through simulation studies, we demonstrate that the algorithm is several orders of magnitude faster than the current implementation of the Ren et al. procedure, without losing any accuracy. Then, we apply our algorithm to two real data sets: transcriptomic data from a study of childhood asthma and proteomic data from a study of Alzheimer's disease. We estimate the global gene or protein interaction networks for the disease and healthy samples. The resulting networks reveal interesting interactions and the differential networks between cases and controls show functional relevance to the diseases. In conclusion, we provide a computationally fast algorithm to implement a statistically sound procedure for constructing a Gaussian graphical model and making inference with high-dimensional biological data. The algorithm has been implemented in an R package named "FastGGM". PMID:26872036
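The idea behind edge inference in a Gaussian graphical model can be sketched in a few lines: two variables are conditionally independent given all the rest exactly when the corresponding off-diagonal entry of the precision (inverse covariance) matrix is zero. The following sketch illustrates that principle on a simulated three-variable chain; it is a minimal illustration, not the FastGGM algorithm, and the toy structure and thresholds are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Ground-truth chain X0 -- X1 -- X2: X0 and X2 are conditionally independent
# given X1, so the (0, 2) entry of the true precision matrix is exactly zero.
true_prec = np.array([[2.0, 0.8, 0.0],
                      [0.8, 2.0, 0.8],
                      [0.0, 0.8, 2.0]])
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(true_prec), size=5000)

S = np.cov(X, rowvar=False)   # sample covariance
P = np.linalg.inv(S)          # estimated precision matrix
# Partial correlation between i and j given the remaining variable:
partial = -P / np.sqrt(np.outer(np.diag(P), np.diag(P)))
print(abs(partial[0, 1]) > abs(partial[0, 2]))  # the true edge dominates the absent one
```

A rigorous procedure such as the one described above additionally attaches a p-value and confidence interval to each estimated edge; this sketch only recovers the point estimates.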
Park, Subok; Jennings, Robert; Liu, Haimo; Badano, Aldo; Myers, Kyle
2010-12-15
Purpose: For the last few years, development and optimization of three-dimensional (3D) x-ray breast imaging systems, such as digital breast tomosynthesis (DBT) and computed tomography, have drawn much attention from the medical imaging community, both academia and industry. However, there is still much room for understanding how best to optimize and evaluate the devices over a large space of many different system parameters and geometries. Current evaluation methods, which work well for 2D systems, do not incorporate the depth information from the 3D imaging systems. Therefore, it is critical to develop a statistically sound evaluation method to investigate the usefulness of inclusion of depth and background-variability information into the assessment and optimization of the 3D systems. Methods: In this paper, we present a mathematical framework for a statistical assessment of planar and 3D x-ray breast imaging systems. Our method is based on statistical decision theory, in particular, making use of the ideal linear observer called the Hotelling observer. We also present a physical phantom that consists of spheres of different sizes and materials for producing an ensemble of randomly varying backgrounds to be imaged for a given patient class. Lastly, we demonstrate our evaluation method in comparing laboratory mammography and three-angle DBT systems for signal detection tasks using the phantom's projection data. We compare the variable phantom case to that of a phantom of the same dimensions filled with water, which we call the uniform phantom, based on the performance of the Hotelling observer as a function of signal size and intensity. Results: Detectability trends calculated using the variable and uniform phantom methods are different from each other for both mammography and DBT systems. Conclusions: Our results indicate that measuring the system's detection performance with consideration of background variability may lead to differences in system performance
Kauweloa, Kevin I; Gutierrez, Alonso N; Stathakis, Sotirios; Papanikolaou, Niko; Mavroidis, Panayiotis
2016-07-01
A toolkit has been developed for calculating the 3-dimensional biological effective dose (BED) distributions in multi-phase, external beam radiotherapy treatments such as those applied in liver stereotactic body radiation therapy (SBRT) and in multi-prescription treatments. This toolkit also provides a wide range of statistical results related to dose and BED distributions. MATLAB 2010a, version 7.10 was used to create this GUI toolkit. The input data consist of the dose distribution matrices, organ contour coordinates, and treatment planning parameters from the treatment planning system (TPS). The toolkit has the capability of calculating the multi-phase BED distributions using different formulas (denoted as true and approximate). Following the calculations of the BED distributions, the dose and BED distributions can be viewed in different projections (e.g. coronal, sagittal and transverse). The different elements of this toolkit are presented and the important steps for the execution of its calculations are illustrated. The toolkit is applied on brain, head & neck and prostate cancer patients, who received primary and boost phases in order to demonstrate its capability in calculating BED distributions, as well as measuring the inaccuracy and imprecision of the approximate BED distributions. Finally, the clinical situations in which the use of the present toolkit would have a significant clinical impact are indicated. PMID:27265044
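The multi-phase BED computation the toolkit performs rests on the standard linear-quadratic formula, BED = n d (1 + d / (alpha/beta)) for n fractions of dose d, with the "true" multi-phase BED obtained by summing the per-phase BEDs voxel by voxel. A minimal sketch of that voxelwise calculation follows; the dose matrices, fraction counts, and alpha/beta value are illustrative, and the toolkit's approximate formulas are not reproduced:

```python
import numpy as np

def bed(total_dose, n_fractions, alpha_beta):
    """Linear-quadratic biologically effective dose for one uniformly
    fractionated phase: BED = n*d*(1 + d/(alpha/beta))."""
    d = total_dose / n_fractions          # dose per fraction (voxelwise)
    return total_dose * (1.0 + d / alpha_beta)

# Two-phase (primary + boost) voxel dose matrices in Gy; values are illustrative.
primary = np.array([[50.0, 48.0], [46.0, 50.0]])   # delivered in 25 fractions
boost   = np.array([[16.0, 14.0], [10.0, 16.0]])   # delivered in 8 fractions

# "True" multi-phase BED: sum the per-phase BED distributions voxel by voxel.
total_bed = bed(primary, 25, alpha_beta=10.0) + bed(boost, 8, alpha_beta=10.0)
print(round(float(total_bed[0, 0]), 4))   # 50*(1 + 2/10) + 16*(1 + 2/10) = 79.2 Gy
```

The same array arithmetic extends directly to full 3D dose matrices exported from a treatment planning system.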
NASA Astrophysics Data System (ADS)
Bocaniov, Serghei A.; Scavia, Donald
2016-06-01
Hypoxia or low bottom water dissolved oxygen (DO) is a world-wide problem of management concern requiring an understanding and ability to monitor and predict its spatial and temporal dynamics. However, this is often made difficult in large lakes and coastal oceans because of limited spatial and temporal coverage of field observations. We used a calibrated and validated three-dimensional ecological model of Lake Erie to extend a statistical relationship between hypoxic extent and bottom water DO concentrations to explore implications of the broader temporal and spatial development and dissipation of hypoxia. We provide the first numerical demonstration that hypoxia initiates in the nearshore, not the deep portion of the basin, and that the threshold used to define hypoxia matters in both spatial and temporal dynamics and in its sensitivity to climate. We show that existing monitoring programs likely underestimate both maximum hypoxic extent and the importance of low oxygen in the nearshore, discuss implications for ecosystem and drinking water protection, and recommend how these results could be used to efficiently and economically extend monitoring programs.
Dynamical inference of hidden biological populations
NASA Astrophysics Data System (ADS)
Luchinsky, D. G.; Smelyanskiy, V. N.; Millonas, M.; McClintock, P. V. E.
2008-10-01
Population fluctuations in a predator-prey system are analyzed for the case where the number of prey could be determined, subject to measurement noise, but the number of predators was unknown. The problem of how to infer the unmeasured predator dynamics, as well as the model parameters, is addressed. Two solutions are suggested. In the first of these, measurement noise and the dynamical noise in the equation for predator population are neglected; the problem is reduced to a one-dimensional case, and a Bayesian dynamical inference algorithm is employed to reconstruct the model parameters. In the second solution a full-scale Markov Chain Monte Carlo simulation is used to infer both the unknown predator trajectory, and also the model parameters, using the one-dimensional solution as an initial guess.
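In the spirit of the first, one-dimensional solution, Bayesian inference of a single dynamical parameter from noisy population measurements can be sketched with a random-walk Metropolis sampler. This is a generic illustration rather than the authors' algorithm: the logistic growth model, noise level, and tuning constants are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(r, x0=0.1, n=50):
    """Deterministic logistic growth for the observed (prey) population."""
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        x[t + 1] = x[t] + r * x[t] * (1.0 - x[t])
    return x

r_true, sigma = 0.4, 0.02
y = simulate(r_true) + rng.normal(0, sigma, 50)   # noisy measurements

def log_lik(r):
    resid = y - simulate(r)
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

# Random-walk Metropolis over the single unknown rate r (flat prior on (0, 1)).
r, samples = 0.5, []
ll = log_lik(r)
for _ in range(5000):
    prop = r + rng.normal(0, 0.02)
    if 0.0 < prop < 1.0:
        ll_prop = log_lik(prop)
        if np.log(rng.random()) < ll_prop - ll:
            r, ll = prop, ll_prop
    samples.append(r)
print(abs(np.mean(samples[1000:]) - r_true) < 0.05)
```

The full problem in the abstract also treats the unobserved predator trajectory as unknown, which is what motivates the Markov chain Monte Carlo machinery of the second solution.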
Inference of Internal Stress in a Cell Monolayer.
Nier, Vincent; Jain, Shreyansh; Lim, Chwee Teck; Ishihara, Shuji; Ladoux, Benoit; Marcq, Philippe
2016-04-12
We combine traction force data with Bayesian inversion to obtain an absolute estimate of the internal stress field of a cell monolayer. The method, Bayesian inversion stress microscopy, is validated using numerical simulations performed in a wide range of conditions. It is robust to changes in each ingredient of the underlying statistical model. Importantly, its accuracy does not depend on the rheology of the tissue. We apply Bayesian inversion stress microscopy to experimental traction force data measured in a narrow ring of cohesive epithelial cells, and check that the inferred stress field coincides with that obtained by direct spatial integration of the traction force data in this quasi one-dimensional geometry. PMID:27074687
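The computational core of Bayesian inversion of a linear forward model is a regularized least-squares solve: with Gaussian noise and a Gaussian prior, the posterior mean has a closed form. The sketch below uses a toy random forward operator rather than the mechanical model of Bayesian inversion stress microscopy; dimensions and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear forward model: traction t = A @ s + noise, s the unknown stress field.
n_obs, n_par = 40, 10
A = rng.normal(size=(n_obs, n_par))
s_true = rng.normal(size=n_par)
sigma, lam = 0.1, 1.0                      # noise level and prior precision
t = A @ s_true + rng.normal(0, sigma, n_obs)

# Posterior mean under the Gaussian prior s ~ N(0, I/lam):
# s_map = (A^T A / sigma^2 + lam I)^{-1} A^T t / sigma^2
H = A.T @ A / sigma**2 + lam * np.eye(n_par)
s_map = np.linalg.solve(H, A.T @ t / sigma**2)
print(np.max(np.abs(s_map - s_true)) < 0.5)
```

The inverse of H also supplies posterior variances, i.e., uncertainties on the inferred field, which is one of the advantages of the Bayesian formulation over direct integration.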
FUNSTAT and statistical image representations
NASA Technical Reports Server (NTRS)
Parzen, E.
1983-01-01
General ideas of functional statistical inference are outlined for one-sample and two-sample analyses, both univariate and bivariate. The ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.
Petrov, S.
1996-10-01
Languages with a solvable implication problem but without complete and consistent systems of inference rules ("poor" languages) are considered. The problem of the existence of a finite complete and consistent inference rule system for a "poor" language is stated independently of the syntax of the language or the rules. Several properties of the problem are proved. An application of the results to the language of join dependencies is given.
NASA Astrophysics Data System (ADS)
Goodman, Joseph W.
2000-07-01
The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Robert G. Bartle The Elements of Integration and Lebesgue Measure George E. P. Box & Norman R. Draper Evolutionary Operation: A Statistical Method for Process Improvement George E. P. Box & George C. Tiao Bayesian Inference in Statistical Analysis R. W. Carter Finite Groups of Lie Type: Conjugacy Classes and Complex Characters R. W. Carter Simple Groups of Lie Type William G. Cochran & Gertrude M. Cox Experimental Designs, Second Edition Richard Courant Differential and Integral Calculus, Volume I Richard Courant Differential and Integral Calculus, Volume II Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume I Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume II D. R. Cox Planning of Experiments Harold S. M. Coxeter Introduction to Geometry, Second Edition Charles W. Curtis & Irving Reiner Representation Theory of Finite Groups and Associative Algebras Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume I Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume II Cuthbert Daniel Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition Bruno de Finetti Theory of Probability, Volume I Bruno de Finetti Theory of Probability, Volume II W. Edwards Deming Sample Design in Business Research
Confidence set inference with a prior quadratic bound
NASA Technical Reports Server (NTRS)
Backus, George E.
1988-01-01
In the uniqueness part of a geophysical inverse problem, the observer wants to predict all likely values of P unknown numerical properties z = (z_1, ..., z_P) of the earth from measurement of D other numerical properties y(0) = (y_1(0), ..., y_D(0)), given knowledge of the statistical distribution of the random errors in y(0). The data space Y containing y(0) is D-dimensional, so when the model space X is infinite-dimensional the linear uniqueness problem usually is insoluble without prior information about the correct earth model x. If that information is a quadratic bound on x (e.g., energy or dissipation rate), Bayesian inference (BI) and stochastic inversion (SI) inject spurious structure into x, implied by neither the data nor the quadratic bound. Confidence set inference (CSI) provides an alternative inversion technique free of this objection. CSI is illustrated in the problem of estimating the geomagnetic field B at the core-mantle boundary (CMB) from components of B measured on or above the earth's surface. Neither the heat flow nor the energy bound is strong enough to permit estimation of B(r) at single points on the CMB, but the heat flow bound permits estimation of uniform averages of B(r) over discs on the CMB, and both bounds permit weighted disc-averages with continuous weighting kernels. Both bounds also permit estimation of low-degree Gauss coefficients at the CMB. The heat flow bound resolves them up to degree 8 if the crustal field at satellite altitudes must be treated as a systematic error, but can resolve to degree 11 under the most favorable statistical treatment of the crust. These two limits produce circles of confusion on the CMB with diameters of 25 deg and 19 deg, respectively.
NASA Technical Reports Server (NTRS)
Yao, Mao-Sung; Stone, Peter H.
1987-01-01
The moist convection parameterization used in the GISS 3-D GCM is adapted for use in a two-dimensional (2-D) zonally averaged statistical-dynamical model. Experiments with different versions of the parameterization show that its impact on the general circulation in the 2-D model does not parallel its impact in the 3-D model unless the effect of zonal variations is parameterized in the moist convection calculations. A parameterization of the variations in moist static energy is introduced in which the temperature variations are calculated from baroclinic stability theory, and the relative humidity is assumed to be constant. Inclusion of the zonal variations of moist static energy in the 2-D moist convection parameterization allows just a fraction of a latitude circle to be unstable and enhances the amount of deep convection. This leads to a 2-D simulation of the general circulation very similar to that in the 3-D model. The experiments show that the general circulation is sensitive to the parameterized amount of deep convection in the subsident branch of the Hadley cell. The more there is, the weaker are the Hadley cell circulations and the westerly jets. The experiments also confirm the effects of momentum mixing associated with moist convection found by earlier investigators and, in addition, show that the momentum mixing weakens the Ferrel cell. An experiment in which the moist convection was removed while the hydrological cycle was retained and the eddy forcing was held fixed shows that moist convection by itself stabilizes the tropics, reduces the Hadley circulation, and reduces the maximum speeds in the westerly jets.
Louarn, Gaëtan; Lecoeur, Jérémie; Lebon, Eric
2008-01-01
Background and Aims In grapevine, canopy-structure-related variations in light interception and distribution affect productivity, yield and the quality of the harvested product. A simple statistical model for reconstructing three-dimensional (3D) canopy structures for various cultivar–training system (C × T) pairs has been implemented with special attention paid to balance the time required for model parameterization and accuracy of the representations from organ to stand scales. Such an approach particularly aims at overcoming the weak integration of interplant variability using the usual direct 3D measurement methods. Model This model is original in combining a turbid-medium-like envelope enclosing the volume occupied by vine shoots with the use of discrete geometric polygons representing leaves randomly located within this volume to represent plant structure. Reconstruction rules were adapted to capture the main determinants of grapevine shoot architecture and their variability. Using a simplified set of parameters, it was possible to describe (1) the 3D path of the main shoot, (2) the volume occupied by the foliage around this path and (3) the orientation of individual leaf surfaces. Model parameterization (estimation of the probability distribution for each parameter) was carried out for eight contrasting C × T pairs. Key Results and Conclusions The parameter values obtained in each situation were consistent with our knowledge of grapevine architecture. Quantitative assessments for the generated virtual scenes were carried out at the canopy and plant scales. Light interception efficiency and local variations of light transmittance within and between experimental plots were correctly simulated for all canopies studied. The approach predicted these key ecophysiological variables significantly more accurately than the classical complete digitization method with a limited number of plants. In addition, this model accurately reproduced the characteristics of a
Solar structure: Models and inferences from helioseismology
Guzik, J.A.
1998-12-31
In this review the author summarizes results published during approximately the last three years concerning the state of one-dimensional solar interior modeling. She discusses the effects of refinements to the input physics, motivated by improving the agreement between calculated and observed solar oscillation frequencies, or between calculated and inferred solar structure. She has omitted two- and three-dimensional aspects of the solar structure, such as the rotation profile, detailed modeling of turbulent convection, and magnetic fields, although further progress in refining solar interior models may require including such two- and three-dimensional dynamical effects.
NASA Astrophysics Data System (ADS)
Balachandran, Prasanna V.; Xue, Dezhen; Lookman, Turab
2016-04-01
One of the key impediments to the development of BaTiO3-based materials as candidates to replace toxic-Pb-based solid solutions is their relatively low ferroelectric Curie temperature (TC). Among many potential routes that are available to modify TC, ionic substitutions at the Ba and Ti sites remain the most common approach. Here, we perform density functional theory (DFT) calculations on a series of ATiO3 and BaBO3 perovskites, where A = Ba, Ca, Sr, Pb, Cd, Sn, and Mg and B = Ti, Zr, Hf, and Sn. Our objective is to study the relative role of A and B cations in impacting the TC of the tetragonal (P4mm) and rhombohedral (R3m) ferroelectric phases in BaTiO3-based solid solutions, respectively. Using symmetry-mode analysis, we obtain a quantitative description of the relative contributions of various divalent (A) and tetravalent (B) cations to the ferroelectric distortions. Our results show that Ca, Pb, Cd, Sn, and Mg have large mode amplitudes for ferroelectric distortion in the tetragonal phase relative to Ba, whereas Sr suppresses the distortions. On the other hand, Zr, Hf, and Sn tetravalent cations severely suppress the ferroelectric distortion in the rhombohedral phase relative to Ti. In addition to symmetry modes, our calculated unit-cell volume also agrees with the experimental trends. We subsequently utilize the symmetry modes and unit-cell volumes as features within a machine learning approach to learn TC via an inference model and uncover trends that provide insights into the design of new high-TC BaTiO3-based ferroelectrics. The inference model predicts CdTiO3-BaTiO3 solid solutions to have a higher TC and, therefore, we experimentally synthesized these solid solutions and measured their TC. Although the calculated mode strength for CdTiO3 in the tetragonal phase is even larger than that for PbTiO3, the TC of CdTiO3-BaTiO3 solid solutions in the tetragonal phase does not show any appreciable enhancement. Thus, CdTiO3-BaTiO3 does not follow the
Campbell's and Rubin's Perspectives on Causal Inference
ERIC Educational Resources Information Center
West, Stephen G.; Thoemmes, Felix
2010-01-01
Donald Campbell's approach to causal inference (D. T. Campbell, 1957; W. R. Shadish, T. D. Cook, & D. T. Campbell, 2002) is widely used in psychology and education, whereas Donald Rubin's causal model (P. W. Holland, 1986; D. B. Rubin, 1974, 2005) is widely used in economics, statistics, medicine, and public health. Campbell's approach focuses on…
Using alien coins to test whether simple inference is Bayesian.
Cassey, Peter; Hawkins, Guy E; Donkin, Chris; Brown, Scott D
2016-03-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we asked people for prior and posterior inferences about the probability that 1 of 2 coins would generate certain outcomes. Most participants' inferences were inconsistent with Bayes' rule. Only in the simplest version of the task did the majority of participants adhere to Bayes' rule, but even in that case, there was a significant proportion that failed to do so. The current results highlight the importance of close quantitative comparisons between Bayesian inference and human data at the individual-subject level when evaluating models of cognition. (PsycINFO Database Record) PMID:26461034
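The normative benchmark in such tasks is a direct application of Bayes' rule. For instance, with two equiprobable coins, one fair and one with heads probability 3/4 (numbers chosen for illustration, not taken from the experiments), the posterior after observing two heads and one tail can be computed exactly:

```python
from fractions import Fraction

# Prior: either coin is equally likely to have been picked.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
p_heads = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

def likelihood(coin, heads, tails):
    """Probability of the observed heads/tails sequence under a given coin."""
    return p_heads[coin] ** heads * (1 - p_heads[coin]) ** tails

# Observe HHT, then apply Bayes' rule: posterior ∝ prior × likelihood.
unnorm = {c: prior[c] * likelihood(c, heads=2, tails=1) for c in prior}
Z = sum(unnorm.values())
posterior = {c: unnorm[c] / Z for c in unnorm}
print(posterior["biased"])   # 9/17
```

Exact rational arithmetic via `fractions.Fraction` makes the normative answer unambiguous, which is exactly the kind of benchmark individual human inferences are compared against.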
Lessons from Inferentialism for Statistics Education
ERIC Educational Resources Information Center
Bakker, Arthur; Derry, Jan
2011-01-01
This theoretical paper relates recent interest in informal statistical inference (ISI) to the semantic theory termed inferentialism, a significant development in contemporary philosophy, which places inference at the heart of human knowing. This theory assists epistemological reflection on challenges in statistics education encountered when…
Developing Young Students' Informal Inference Skills in Data Analysis
ERIC Educational Resources Information Center
Paparistodemou, Efi; Meletiou-Mavrotheris, Maria
2008-01-01
This paper focuses on developing students' informal inference skills, reporting on how a group of third grade students formulated and evaluated data-based inferences using the dynamic statistics data-visualization environment TinkerPlots[TM] (Konold & Miller, 2005), software specifically designed to meet the learning needs of students in the early…
Generalized Fiducial Inference for Binary Logistic Item Response Models.
Liu, Yang; Hannig, Jan
2016-06-01
Generalized fiducial inference (GFI) has been proposed as an alternative to likelihood-based and Bayesian inference in mainstream statistics. Confidence intervals (CIs) can be constructed from a fiducial distribution on the parameter space in a fashion similar to those used with a Bayesian posterior distribution. However, no prior distribution needs to be specified, which renders GFI more suitable when no a priori information about model parameters is available. In the current paper, we apply GFI to a family of binary logistic item response theory models, which includes the two-parameter logistic (2PL), bifactor and exploratory item factor models as special cases. Asymptotic properties of the resulting fiducial distribution are discussed. Random draws from the fiducial distribution can be obtained by the proposed Markov chain Monte Carlo sampling algorithm. We investigate the finite-sample performance of our fiducial percentile CI and two commonly used Wald-type CIs associated with maximum likelihood (ML) estimation via Monte Carlo simulation. The use of GFI in high-dimensional exploratory item factor analysis was illustrated by the analysis of a set of the Eysenck Personality Questionnaire data. PMID:26769340
Topics in inference and decision-making with partial knowledge
NASA Technical Reports Server (NTRS)
Safavian, S. Rasoul; Landgrebe, David
1990-01-01
Two essential elements needed in the process of inference and decision-making are prior probabilities and likelihood functions. When both of these components are known accurately and precisely, the Bayesian approach provides a consistent and coherent solution to the problems of inference and decision-making. In many situations, however, either one or both of the above components may not be known, or at least may not be known precisely. This problem of partial knowledge about prior probabilities and likelihood functions is addressed. There are at least two ways to cope with this lack of precise knowledge: robust methods, and interval-valued methods. First, ways of modeling imprecision and indeterminacies in prior probabilities and likelihood functions are examined; then how imprecision in the above components carries over to the posterior probabilities is examined. Finally, the problem of decision making with imprecise posterior probabilities and the consequences of such actions are addressed. These problems arise in application areas such as statistical pattern recognition, for example in the classification of high-dimensional multispectral remote sensing image data.
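The interval-valued approach can be illustrated by propagating a set of admissible priors through Bayes' rule: because the posterior is monotone in the prior probability here, evaluating the two endpoints yields the posterior bounds. A minimal sketch follows; the prior interval and likelihood values are illustrative assumptions:

```python
# Imprecise prior: P(H) is only known to lie in [0.2, 0.6]. With fixed
# likelihoods P(data|H) = 0.8 and P(data|not H) = 0.3, each admissible prior
# yields a posterior, so the result is an interval rather than a single number.

def posterior(prior_h, lik_h=0.8, lik_not_h=0.3):
    num = prior_h * lik_h
    return num / (num + (1 - prior_h) * lik_not_h)

lo, hi = posterior(0.2), posterior(0.6)
print(round(lo, 3), round(hi, 3))   # 0.4 0.8
```

A decision rule must then cope with the whole posterior interval [0.4, 0.8], which is precisely where the consequences of acting under imprecise posteriors arise.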
A Selective Overview of Variable Selection in High Dimensional Feature Space
Fan, Jianqing; Lv, Jinchi
2010-01-01
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. The questions of what limits of dimensionality such methods can handle, what role penalty functions play, and what statistical properties they enjoy are rapidly driving advances in the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976
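A concrete instance of penalized likelihood for simultaneous variable selection and estimation is the lasso, solvable by coordinate descent with the soft-thresholding operator. The sketch below is a minimal illustration; the toy sparse problem and tuning constants are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def soft_threshold(z, gamma):
    """Soft-thresholding: the closed-form solution of the 1-D lasso subproblem."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

# Sparse ground truth: only 2 of 20 coefficients are nonzero.
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[0], beta_true[3] = 3.0, -2.0
y = X @ beta_true + rng.normal(0, 0.5, n)

# Coordinate descent for: min_b (1/2n)||y - Xb||^2 + lam * ||b||_1
lam, beta = 0.1, np.zeros(p)
for _ in range(200):
    for j in range(p):
        r = y - X @ beta + X[:, j] * beta[j]      # partial residual excluding j
        z = X[:, j] @ r / n
        beta[j] = soft_threshold(z, lam) / (X[:, j] @ X[:, j] / n)
print(np.sum(np.abs(beta) > 0.1))   # count of surviving coefficients; the 2 true ones dominate
```

The l1 penalty zeroes out most spurious coefficients while only mildly shrinking the large true ones, which is the selection-plus-estimation behavior the review discusses.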
Gene-network inference by message passing
NASA Astrophysics Data System (ADS)
Braunstein, A.; Pagnani, A.; Weigt, M.; Zecchina, R.
2008-01-01
The inference of gene-regulatory processes from gene-expression data is among the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm which is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.
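The message-passing idea can be illustrated on the simplest possible graph: a sum-product computation on a chain of binary variables, where each node sends its neighbor a message summarizing the evidence on its side. This is a generic belief-propagation sketch, not the paper's regulatory-network algorithm; the potentials and evidence are illustrative:

```python
import numpy as np

# Sum-product messages on a 3-node chain X0 - X1 - X2 of binary variables.
psi01 = np.array([[2.0, 1.0], [1.0, 2.0]])   # pairwise potential favoring X0 == X1
psi12 = np.array([[2.0, 1.0], [1.0, 2.0]])   # pairwise potential favoring X1 == X2
evidence0 = np.array([0.9, 0.1])             # soft evidence that X0 = 0

# Forward messages along the chain: each message is a potential-weighted sum
# over the sending node's states.
m01 = psi01.T @ evidence0          # message X0 -> X1
m12 = psi12.T @ m01                # message X1 -> X2

marg2 = m12 / m12.sum()            # exact marginal of the chain's end node
print(marg2)                        # evidence at X0 propagates: state 0 is favored
```

On tree-structured graphs this computation is exact; on loopy graphs, as in realistic regulatory networks, the same message updates are iterated to a fixed point as an approximation.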
NASA Astrophysics Data System (ADS)
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Parameter inference is a fundamental problem in data-driven modeling. Given observed data that is believed to be a realization of some parameterized model, the aim is to find parameter values that are able to explain the observed data. In many situations, the dominant sources of uncertainty must be included into the model for making reliable predictions. This naturally leads to stochastic models. Stochastic models render parameter inference much harder, as the aim then is to find a distribution of likely parameter values. In Bayesian statistics, which is a consistent framework for data-driven learning, this so-called posterior distribution can be used to make probabilistic predictions. We propose a novel, exact, and very efficient approach for generating posterior parameter distributions for stochastic differential equation models calibrated to measured time series. The algorithm is inspired by reinterpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, where the measurements are mapped on heavier beads compared to those of the simulated data. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for one-dimensional problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.
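The basic building block of the approach, Hamiltonian Monte Carlo with leapfrog integration, can be sketched for a one-dimensional standard normal target. The multiple-time-scale integration and the polymer mapping of the paper are not reproduced; step size and trajectory length here are illustrative tuning choices:

```python
import numpy as np

rng = np.random.default_rng(4)

# Target: 1-D standard normal, so U(q) = q^2/2 and grad U(q) = q.
def leapfrog(q, p, eps=0.2, steps=10):
    """Leapfrog integration of Hamiltonian dynamics for H = U(q) + p^2/2."""
    p = p - 0.5 * eps * q                 # initial half step in momentum
    for _ in range(steps - 1):
        q = q + eps * p
        p = p - eps * q
    q = q + eps * p
    p = p - 0.5 * eps * q                 # final half step in momentum
    return q, p

q, samples = 0.0, []
for _ in range(5000):
    p0 = rng.normal()                     # resample auxiliary momentum
    q_new, p_new = leapfrog(q, p0)
    # Metropolis correction for the integration error in H = U + p^2/2.
    h0 = 0.5 * q ** 2 + 0.5 * p0 ** 2
    h1 = 0.5 * q_new ** 2 + 0.5 * p_new ** 2
    if np.log(rng.random()) < h0 - h1:
        q = q_new
    samples.append(q)
samples = np.array(samples)
print(abs(samples.mean()) < 0.1, abs(samples.std() - 1.0) < 0.1)
```

For SDE models calibrated to time series, the same scheme is applied to a high-dimensional posterior over simulated trajectories and parameters, which is where the separation of time scales becomes essential.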
ERIC Educational Resources Information Center
Jacob, Bridgette L.
2013-01-01
The difficulties introductory statistics students have with formal statistical inference are well known in the field of statistics education. "Informal" statistical inference has been studied as a means to introduce inferential reasoning well before and without the formalities of formal statistical inference. This mixed methods study…
sick: The Spectroscopic Inference Crank
NASA Astrophysics Data System (ADS)
Casey, Andrew R.
2016-03-01
There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal
Hanson, K.M.; Cunningham, G.S.
1996-04-01
The authors are developing a computer application, called the Bayes Inference Engine, to provide the means to make inferences about models of physical reality within a Bayesian framework. The construction of complex nonlinear models is achieved by a fully object-oriented design. The models are represented by a data-flow diagram that may be manipulated by the analyst through a graphical programming environment. Maximum a posteriori solutions are achieved using a general, gradient-based optimization algorithm. The application incorporates a new technique of estimating and visualizing the uncertainties in specific aspects of the model.
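The maximum a posteriori step the Bayes Inference Engine automates can be sketched in miniature: minimize the negative log posterior of a linear-Gaussian model with a generic gradient-based optimizer (plain gradient descent here). The forward model `H` and all numbers are illustrative stand-ins, not the application's object-oriented model graph.

```python
import numpy as np

rng = np.random.default_rng(4)
H = rng.normal(size=(30, 5))              # toy linear forward model
x_true = rng.normal(size=5)
y = H @ x_true + 0.1 * rng.normal(size=30)
sigma2, tau2 = 0.01, 1.0                  # noise and prior variances

def grad_neg_log_post(x):
    # -d/dx log[ N(y | Hx, sigma2*I) * N(x | 0, tau2*I) ]
    return H.T @ (H @ x - y) / sigma2 + x / tau2

# Step size from the Lipschitz constant of the gradient, so plain
# gradient descent is guaranteed to converge on this convex objective.
lipschitz = np.linalg.norm(H, 2) ** 2 / sigma2 + 1.0 / tau2
x = np.zeros(5)
for _ in range(2000):
    x -= (1.0 / lipschitz) * grad_neg_log_post(x)
print(np.round(x - x_true, 2))            # small residuals: MAP ≈ truth
```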
Januszyk, Michael; Gurtner, Geoffrey C
2011-01-01
The scope of biomedical research has expanded rapidly during the past several decades, and statistical analysis has become increasingly necessary to understand the meaning of large and diverse quantities of raw data. As such, a familiarity with this lexicon is essential for critical appraisal of medical literature. This article attempts to provide a practical overview of medical statistics, with an emphasis on the selection, application, and interpretation of specific tests. This includes a brief review of statistical theory and its nomenclature, particularly with regard to the classification of variables. A discussion of descriptive methods for data presentation is then provided, followed by an overview of statistical inference and significance analysis, and detailed treatment of specific statistical tests and guidelines for their interpretation. PMID:21200241
NASA Technical Reports Server (NTRS)
Stone, Peter H.; Yao, Mao-Sung
1990-01-01
A number of perpetual January simulations are carried out with a two-dimensional zonally averaged model employing various parameterizations of the eddy fluxes of heat (potential temperature) and moisture. The parameterizations are evaluated by comparing these results with the eddy fluxes calculated in a parallel simulation using a three-dimensional general circulation model with zonally symmetric forcing. The three-dimensional model's performance in turn is evaluated by comparing its results using realistic (nonsymmetric) boundary conditions with observations. Branscome's parameterization of the meridional eddy flux of heat and Leovy's parameterization of the meridional eddy flux of moisture simulate the seasonal and latitudinal variations of these fluxes reasonably well, while somewhat underestimating their magnitudes. New parameterizations of the vertical eddy fluxes are developed that take into account the enhancement of the eddy mixing slope in a growing baroclinic wave due to condensation, and also the effect of eddy fluctuations in relative humidity. The new parameterizations, when tested in the two-dimensional model, simulate the seasonal, latitudinal, and vertical variations of the vertical eddy fluxes quite well, when compared with the three-dimensional model, and only underestimate the magnitude of the fluxes by 10 to 20 percent.
Inference of Isoforms from Short Sequence Reads
NASA Astrophysics Data System (ADS)
Feng, Jianxing; Li, Wei; Jiang, Tao
Due to alternative splicing events in eukaryotic species, the identification of mRNA isoforms (or splicing variants) is a difficult problem. Traditional experimental methods for this purpose are time consuming and cost ineffective. The emerging RNA-Seq technology provides a possible effective method to address this problem. Although the advantages of RNA-Seq over traditional methods in transcriptome analysis have been confirmed by many studies, the inference of isoforms from millions of short sequence reads (e.g., Illumina/Solexa reads) has remained computationally challenging. In this work, we propose a method to calculate the expression levels of isoforms and infer isoforms from short RNA-Seq reads using exon-intron boundary, transcription start site (TSS) and poly-A site (PAS) information. We first formulate the relationship among exons, isoforms, and single-end reads as a convex quadratic program, and then use an efficient algorithm (called IsoInfer) to search for isoforms. IsoInfer can calculate the expression levels of isoforms accurately if all the isoforms are known and infer novel isoforms from scratch. Our experimental tests on known mouse isoforms with both simulated expression levels and reads demonstrate that IsoInfer is able to calculate the expression levels of isoforms with an accuracy comparable to the state-of-the-art statistical method and a 60 times faster speed. Moreover, our tests on both simulated and real reads show that it achieves a good precision and sensitivity in inferring isoforms when given accurate exon-intron boundary, TSS and PAS information, especially for isoforms whose expression levels are significantly high.
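The expression-estimation step can be illustrated with a hedged toy version of the convex program: given a design matrix mapping isoform abundances to expected per-exon read counts, solve a nonnegative least-squares problem by projected gradient. The matrix encoding and solver below are simplifications for illustration, not IsoInfer's actual formulation.

```python
import numpy as np

def nnls_pg(A, b, n_iter=5000):
    """Solve min ||Ax - b||^2 subject to x >= 0 by projected gradient."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L = ||A||_2^2
    for _ in range(n_iter):
        x = np.maximum(0.0, x - step * A.T @ (A @ x - b))
    return x

# Toy instance: two isoforms over three exons; isoform 1 uses exons
# 1-3, isoform 2 skips exon 2. A[i, j] = 1 if isoform j contains exon i.
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0]])
b = A @ np.array([3.0, 2.0])   # noiseless counts from true expression (3, 2)
x_hat = nnls_pg(A, b)
print(np.round(x_hat, 3))      # recovers [3. 2.]
```

With noisy counts the recovered abundances are least-squares estimates rather than exact, and identifiability depends on the exon-usage matrix having full column rank, which is part of what makes the full inference problem hard.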
Deep Learning for Population Genetic Inference
Sheehan, Sara; Song, Yun S.
2016-01-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908
Reliability of the Granger causality inference
NASA Astrophysics Data System (ADS)
Zhou, Douglas; Zhang, Yaoyu; Xiao, Yanyang; Cai, David
2014-04-01
How to characterize information flows in physical, biological, and social systems remains a major theoretical challenge. Granger causality (GC) analysis has been widely used to investigate information flow through causal interactions. We address one of the central questions in GC analysis, that is, the reliability of the GC evaluation and its implications for the causal structures extracted by this analysis. Our work reveals that the manner in which a continuous dynamical process is projected or coarse-grained to a discrete process has a profound impact on the reliability of the GC inference, and different sampling may potentially yield completely opposite inferences. This inference hazard is present for both linear and nonlinear processes. We emphasize that there is a hazard of reaching incorrect conclusions about network topologies, even including statistical (such as small-world or scale-free) properties of the networks, when GC analysis is blindly applied to infer the network topology. We demonstrate this using a small-world network for which a drastic loss of small-world attributes occurs in the reconstructed network using the standard GC approach. We further show how to resolve the paradox that the GC analysis seemingly becomes less reliable when more information is incorporated using finer and finer sampling. Finally, we present strategies to overcome these inference artifacts in order to obtain a reliable GC result.
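The quantity under study can be sketched with a plain linear Granger-causality estimate: X is said to Granger-cause Y if adding lagged X to an autoregression of Y reduces the residual variance, with the statistic taken as the log variance ratio. This toy estimate only illustrates the GC measure itself, not the paper's sampling and reliability analysis.

```python
import numpy as np

def gc_x_to_y(x, y, p=2):
    """Log ratio of restricted vs. full AR residual variances."""
    n = len(y)
    # Lagged design matrices: restricted model (y lags only) vs.
    # full model (y lags plus x lags), both predicting y[t].
    Yr = np.column_stack([y[p - k - 1:n - k - 1] for k in range(p)])
    Xl = np.column_stack([x[p - k - 1:n - k - 1] for k in range(p)])
    target = y[p:]
    res_r = target - Yr @ np.linalg.lstsq(Yr, target, rcond=None)[0]
    full = np.hstack([Yr, Xl])
    res_f = target - full @ np.linalg.lstsq(full, target, rcond=None)[0]
    return np.log(np.var(res_r) / np.var(res_f))

rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):                      # y is driven by lagged x
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()
print(gc_x_to_y(x, y))   # large: x Granger-causes y
print(gc_x_to_y(y, x))   # near zero: no feedback from y to x
```

The abstract's warning applies precisely here: if `x` and `y` were obtained by down-sampling a continuous process, the same estimator could report a very different, even reversed, causal structure.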
Spectral likelihood expansions for Bayesian inference
NASA Astrophysics Data System (ADS)
Nagel, Joseph B.; Sudret, Bruno
2016-03-01
A spectral approach to Bayesian inference is presented. It pursues the emulation of the posterior probability density. The starting point is a series expansion of the likelihood function in terms of orthogonal polynomials. From this spectral likelihood expansion all statistical quantities of interest can be calculated semi-analytically. The posterior is formally represented as the product of a reference density and a linear combination of polynomial basis functions. Both the model evidence and the posterior moments are related to the expansion coefficients. This formulation avoids Markov chain Monte Carlo simulation and allows one to make use of linear least squares instead. The pros and cons of spectral Bayesian inference are discussed and demonstrated on the basis of simple applications from classical statistics and inverse modeling.
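A minimal hedged sketch of the idea for a scalar parameter: expand the likelihood in Legendre polynomials over a uniform prior on [-1, 1]; the evidence and posterior moments then follow from the expansion coefficients alone, with no MCMC. The toy model (a single Gaussian observation) and the degree-15 least-squares fit are illustrative choices, not the paper's polynomial chaos machinery.

```python
import numpy as np
from numpy.polynomial import legendre as L

y_obs, sigma = 0.3, 0.5   # toy model: y ~ N(theta, sigma^2), theta in [-1, 1]

def likelihood(theta):
    return np.exp(-0.5 * ((y_obs - theta) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Least-squares fit of the likelihood in the Legendre basis.
theta_grid = np.linspace(-1.0, 1.0, 401)
coef = L.legfit(theta_grid, likelihood(theta_grid), deg=15)

# With prior density 1/2 on [-1, 1] and orthogonality of P_k:
#   evidence       Z = (1/2) * coef[0] * int P_0 = coef[0]
#   posterior mean   = (1/2) * coef[1] * (2/3) / Z = coef[1] / (3 * coef[0])
evidence = coef[0]
post_mean = coef[1] / (3.0 * coef[0])
print(evidence, post_mean)
```

For these numbers the exact answers are the prior-weighted Gaussian mass, (1/2)[Φ(1.4) − Φ(−2.6)] ≈ 0.457, and the truncated-normal posterior mean ≈ 0.226, which the two coefficients reproduce, illustrating the "semi-analytical" claim of the abstract.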
Data free inference with processed data products
Chowdhary, K.; Najm, H. N.
2014-07-12
Here, we consider the context of probabilistic inference of model parameters given error bars or confidence intervals on model output values, when the data is unavailable. We introduce a class of algorithms in a Bayesian framework, relying on maximum entropy arguments and approximate Bayesian computation methods, to generate consistent data with the given summary statistics. Once we obtain consistent data sets, we pool the respective posteriors, to arrive at a single, averaged density on the parameters. This approach allows us to perform accurate forward uncertainty propagation consistent with the reported statistics.
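A hedged toy version of the scheme: the raw data are unavailable, and only a reported sample mean and standard deviation survive. We generate synthetic datasets, keep those whose summary statistics match the report (an ABC-style accept step), compute a per-dataset posterior for a location parameter under a flat prior, and pool the posteriors by pooling their samples. The normal model, tolerances, and sizes are illustrative simplifications of the maximum-entropy construction.

```python
import numpy as np

rng = np.random.default_rng(2)
reported_mean, reported_sd, n = 1.5, 0.8, 25   # the only surviving "data"
tol = 0.05

pooled = []
for _ in range(2000):
    d = rng.normal(reported_mean, reported_sd, size=n)   # candidate dataset
    # Accept only datasets consistent with the reported summaries.
    if abs(d.mean() - reported_mean) < tol and abs(d.std(ddof=1) - reported_sd) < tol:
        # Conjugate posterior for the location given this dataset:
        # N(mean(d), sd(d)/sqrt(n)) under a flat prior; draw samples.
        pooled.append(rng.normal(d.mean(), d.std(ddof=1) / np.sqrt(n), size=200))

pooled = np.concatenate(pooled)
print(pooled.mean(), pooled.std())   # pooled posterior, centred near the report
```

The pooled density is wider than any single-dataset posterior, reflecting the extra uncertainty from not knowing which consistent dataset actually produced the reported statistics.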
FastGGM: An Efficient Algorithm for the Inference of Gaussian Graphical Model in Biological Networks
Ding, Ying; Fang, Zhou; Sun, Zhe; MacDonald, Matthew L.; Sweet, Robert A.; Wang, Jieru; Chen, Wei
2016-01-01
Biological networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single variables. Gaussian graphical model (GGM), a probability model that characterizes the conditional dependence structure of a set of random variables by a graph, has wide applications in the analysis of biological networks, such as inferring interaction or comparing differential networks. However, existing approaches are either not statistically rigorous or are inefficient for high-dimensional data that include tens of thousands of variables for making inference. In this study, we propose an efficient algorithm to implement the estimation of GGM and obtain p-value and confidence interval for each edge in the graph, based on a recent proposal by Ren et al., 2015. Through simulation studies, we demonstrate that the algorithm is faster by several orders of magnitude than the current implemented algorithm for Ren et al. without losing any accuracy. Then, we apply our algorithm to two real data sets: transcriptomic data from a study of childhood asthma and proteomic data from a study of Alzheimer’s disease. We estimate the global gene or protein interaction networks for the disease and healthy samples. The resulting networks reveal interesting interactions and the differential networks between cases and controls show functional relevance to the diseases. In conclusion, we provide a computationally fast algorithm to implement a statistically sound procedure for constructing Gaussian graphical model and making inference with high-dimensional biological data. The algorithm has been implemented in an R package named “FastGGM”. PMID:26872036
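Edge-wise inference in a Gaussian graphical model can be sketched with the classical partial-correlation test, valid in the low-dimensional n >> p regime; FastGGM implements the different, debiased high-dimensional procedure of Ren et al., which is not reproduced here. The chain graph and sample size below are toy choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
omega = np.array([[1.0, 0.4, 0.0],     # true precision matrix of a chain
                  [0.4, 1.0, 0.4],     # 1 - 2 - 3 (edge 1-3 absent)
                  [0.0, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(omega), size=2000)
n, p = X.shape

omega_hat = np.linalg.inv(np.cov(X, rowvar=False))   # estimated precision

def edge_pvalue(i, j):
    # Partial correlation of i and j given the rest, then a Fisher z-test.
    r = -omega_hat[i, j] / np.sqrt(omega_hat[i, i] * omega_hat[j, j])
    z = 0.5 * np.log((1 + r) / (1 - r))
    se = 1.0 / np.sqrt(n - (p - 2) - 3)
    return 2 * stats.norm.sf(abs(z) / se)

print(edge_pvalue(0, 1))   # true edge: tiny p-value
print(edge_pvalue(1, 2))   # true edge: tiny p-value
print(edge_pvalue(0, 2))   # absent edge: typically not significant
```

Inverting the sample covariance breaks down once p approaches n, which is exactly the regime (tens of thousands of variables) that motivates the debiased estimator and its per-edge confidence intervals.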
KENNETH M. HANSON; JANE M. BOOKER
2000-09-08
The authors present an uncertainty analysis of data taken using the Rossi technique, in which the horizontal oscilloscope sweep is driven sinusoidally in time while the vertical axis follows the signal amplitude. The analysis is done within a Bayesian framework. Complete inferences are obtained by using the Markov chain Monte Carlo technique, which produces random samples from the posterior probability distribution expressed in terms of the parameters.

Active inference and learning.
Friston, Karl; FitzGerald, Thomas; Rigoli, Francesco; Schwartenbeck, Philipp; O'Doherty, John; Pezzulo, Giovanni
2016-09-01
This paper offers an active inference account of choice behaviour and learning. It focuses on the distinction between goal-directed and habitual behaviour and how they contextualise each other. We show that habits emerge naturally (and autodidactically) from sequential policy optimisation when agents are equipped with state-action policies. In active inference, behaviour has explorative (epistemic) and exploitative (pragmatic) aspects that are sensitive to ambiguity and risk respectively, where epistemic (ambiguity-resolving) behaviour enables pragmatic (reward-seeking) behaviour and the subsequent emergence of habits. Although goal-directed and habitual policies are usually associated with model-based and model-free schemes, we find the more important distinction is between belief-free and belief-based schemes. The underlying (variational) belief updating provides a comprehensive (if metaphorical) process theory for several phenomena, including the transfer of dopamine responses, reversal learning, habit formation and devaluation. Finally, we show that active inference reduces to a classical (Bellman) scheme, in the absence of ambiguity. PMID:27375276
Estimating uncertainty of inference for validation
Booker, Jane M; Langenbrunner, James R; Hemez, Francois M; Ross, Timothy J
2010-09-30
first in a series of inference uncertainty estimations. While the methods demonstrated are primarily statistical, these do not preclude the use of nonprobabilistic methods for uncertainty characterization. The methods presented permit accurate determinations for validation and eventual prediction. It is a goal that these methods establish a standard against which best practice may evolve for determining degree of validation.
NASA Astrophysics Data System (ADS)
Cai, Jianqing; Grafarend, Erik W.
2007-02-01
In the deformation analysis with a 2-D (or planar and horizontal), symmetric rank-two deformation tensor in geosciences (geodesy, geophysics and geology), the eigenspace components of these random deformation tensors (principal components, principal directions) are of focal interest. With the new development of space-geodetic techniques, such as GPS, VLBI, SLR and DORIS, the components of deformation measures (such as the stress or strain tensor, etc.) can be estimated from their highly accurate regular measurement of positions and change rates and analysed by means of the proper statistical testing procedures. In this paper we begin with a review of the results of statistical inference of eigenspace components of the 2-D symmetric, rank-two random tensor ('random matrix'), that is, the best linear uniformly unbiased estimation (BLUUE) of the eigenspace elements and the best invariant quadratic uniformly unbiased estimate (BIQUUE) of its variance-covariance matrix. Then the geodynamic setting of the Earth and especially the selected investigated region, the central Mediterranean and Western Europe, will be discussed. Thirdly, the ITRF sites are selected according to the history and quality of the ITRF realization series, and the related incremental velocities of selected ITRF sites are computed. Fourthly, the methods of derivation for the 2-D geodetic strain rates are introduced in order to obtain these strain rates from the incremental velocities. In the case study, both BLUUE and BIQUUE models as well as related hypothesis tests are applied to the eigenspace components of the 2-D strain rate tensor observations in the area of the central Mediterranean and Western Europe, as derived from the ITRF92 to ITRF2000 sequential station positions and velocities. The interpretation and comparison of these results with the geodynamic features then follow. Furthermore the statistical inference of the eigenspace components provides us with not only the confidence regions of
Probability, statistics, and computational science.
Beerenwinkel, Niko; Siebourg, Juliane
2012-01-01
In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters. PMID:22407706
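Of the efficient inference algorithms the chapter surveys, the forward algorithm for hidden Markov models is the simplest to sketch: it computes the likelihood of an observation sequence in time linear in its length. All probabilities below are toy values, and real genomics HMMs work in log space for numerical stability.

```python
import numpy as np

A = np.array([[0.9, 0.1],      # transition probabilities between states
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],      # emission probabilities: rows = states,
              [0.1, 0.9]])     # columns = observed symbols {0, 1}
pi = np.array([0.5, 0.5])      # initial state distribution

def forward_likelihood(obs):
    """P(observations | model) by the forward recursion."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission
    return alpha.sum()

print(forward_likelihood([0, 0, 1, 1, 1]))
```

Summing `forward_likelihood` over all 2^5 possible observation sequences returns exactly 1, a quick sanity check that the recursion implements a proper marginalization over hidden paths.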
Towards Context Sensitive Information Inference.
ERIC Educational Resources Information Center
Song, D.; Bruza, P. D.
2003-01-01
Discusses information inference from a psychologistic stance and proposes an information inference mechanism that makes inferences via computations of information flow through an approximation of a conceptual space. Highlights include cognitive economics of information processing; context sensitivity; and query models for information retrieval.…
Marengo, Emilio; Robotti, Elisa; Bobba, Marco; Liparota, Maria Cristina; Rustichelli, Chiara; Zamò, Alberto; Chilosi, Marco; Righetti, Pier Giorgio
2006-02-01
Mantle cell lymphoma (MCL) cell lines have been difficult to generate, since only few have been described so far and even fewer have been thoroughly characterized. Among them, there is only one cell line, called GRANTA-519, which is well established and universally adopted for most lymphoma studies. We succeeded in establishing a new MCL cell line, called MAVER-1, from a leukemic MCL, and performed a thorough phenotypical, cytogenetical and molecular characterization of the cell line. In the present report, the phenotypic expression of GRANTA-519 and MAVER-1 cell lines has been compared and evaluated by a proteomic approach, exploiting 2-D map analysis. By univariate statistical analysis (Student's t-test, as commonly used in most commercial software packages), most of the protein spots were found to be identical between the two cell lines. Thirty spots were found to be unique for the GRANTA-519, whereas another 11 polypeptides appeared to be expressed only by the MAVER-1 cell line. A number of these spots could be identified by MS. These data were confirmed and expanded by multivariate statistical tools (principal component analysis and soft-independent model of class analogy) that allowed identification of a larger number of differently expressed spots. Multivariate statistical tools have the advantage of reducing the risk of false positives and of identifying spots that are significantly altered in terms of correlated expression rather than absolute expression values. It is thus suggested that, in future work in differential proteomic profiling, both univariate and multivariate statistical tools should be adopted. PMID:16372308
Quantum-Like Representation of Non-Bayesian Inference
NASA Astrophysics Data System (ADS)
Asano, M.; Basieva, I.; Khrennikov, A.; Ohya, M.; Tanaka, Y.
2013-01-01
This research is related to the problem of "irrational decision making or inference" that has been discussed in cognitive psychology. Several experimental studies have produced statistical data that cannot be described by classical probability theory. The process of decision making generating these data cannot be reduced to classical Bayesian inference. For this problem, a number of quantum-like cognitive models of decision making were proposed. Our previous work represented classical Bayesian inference in a natural way within the framework of quantum mechanics. Using this representation, in this paper we discuss the non-Bayesian (irrational) inference that is biased by effects like quantum interference. Further, we describe the "psychological factor" disturbing "rationality" as an "environment" correlating with the "main system" of usual Bayesian inference.
The NIFTY way of Bayesian signal inference
Selig, Marco
2014-12-05
We introduce NIFTY, 'Numerical Information Field Theory', a software package for the development of Bayesian signal inference algorithms that operate independently from any underlying spatial grid and its resolution. A large number of Bayesian and Maximum Entropy methods for 1D signal reconstruction, 2D imaging, as well as 3D tomography, appear formally similar, but one often finds individualized implementations that are neither flexible nor easily transferable. Signal inference in the framework of NIFTY can be done in an abstract way, such that algorithms, prototyped in 1D, can be applied to real world problems in higher-dimensional settings. NIFTY as a versatile library is applicable and already has been applied in 1D, 2D, 3D and spherical settings. A recent application is the D3PO algorithm targeting the non-trivial task of denoising, deconvolving, and decomposing photon observations in high energy astronomy.
NASA Technical Reports Server (NTRS)
Wheeler, Kevin; Timucin, Dogan; Rabbette, Maura; Curry, Charles; Allan, Mark; Lvov, Nikolay; Clanton, Sam; Pilewskie, Peter
2002-01-01
The goal of visual inference programming is to develop a software framework for data analysis and to provide machine learning algorithms for interactive data exploration and visualization. The topics include: 1) Intelligent Data Understanding (IDU) framework; 2) Challenge problems; 3) What's new here; 4) Framework features; 5) Wiring diagram; 6) Generated script; 7) Results of script; 8) Initial algorithms; 9) Independent Component Analysis for instrument diagnosis; 10) Output sensory mapping virtual joystick; 11) Output sensory mapping typing; 12) Closed-loop feedback mu-rhythm control; 13) Closed-loop training; 14) Data sources; and 15) Algorithms. This paper is in viewgraph form.
Evolutionary inference via the Poisson Indel Process.
Bouchard-Côté, Alexandre; Jordan, Michael I
2013-01-22
We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments. PMID:23275296
Cosmic statistics of statistics
NASA Astrophysics Data System (ADS)
Szapudi, István; Colombi, Stéphane; Bernardeau, Francis
1999-12-01
The errors on statistics measured in finite galaxy catalogues are exhaustively investigated. The theory of errors on factorial moments by Szapudi & Colombi is applied to cumulants via a series expansion method. All results are subsequently extended to the weakly non-linear regime. Together with previous investigations this yields an analytic theory of the errors for moments and connected moments of counts in cells from highly non-linear to weakly non-linear scales. For non-linear functions of unbiased estimators, such as the cumulants, the phenomenon of cosmic bias is identified and computed. Since it is subdued by the cosmic errors in the range of applicability of the theory, correction for it is inconsequential. In addition, the method of Colombi, Szapudi & Szalay concerning sampling effects is generalized, adapting the theory for inhomogeneous galaxy catalogues. While previous work focused on the variance only, the present article calculates the cross-correlations between moments and connected moments as well for a statistically complete description. The final analytic formulae representing the full theory are explicit but somewhat complicated. Therefore we have made available a fortran program capable of calculating the described quantities numerically (for further details e-mail SC at colombi@iap.fr). An important special case is the evaluation of the errors on the two-point correlation function, for which this should be more accurate than any method put forward previously. This tool will be immensely useful in the future for assessing the precision of measurements from existing catalogues, as well as aiding the design of new galaxy surveys. To illustrate the applicability of the results and to explore the numerical aspects of the theory qualitatively and quantitatively, the errors and cross-correlations are predicted under a wide range of assumptions for the future Sloan Digital Sky Survey. The principal results concerning the cumulants ξ, Q3 and Q4 is that
Dynamic colloidal assembly pathways via low dimensional models.
Yang, Yuguang; Thyagarajan, Raghuram; Ford, David M; Bevan, Michael A
2016-05-28
Here we construct a low-dimensional Smoluchowski model for electric field mediated colloidal crystallization using Brownian dynamic simulations, which were previously matched to experiments. Diffusion mapping is used to infer dimensionality and confirm the use of two order parameters, one for degree of condensation and one for global crystallinity. Free energy and diffusivity landscapes are obtained as the coefficients of a low-dimensional Smoluchowski equation to capture the thermodynamics and kinetics of microstructure evolution. The resulting low-dimensional model quantitatively captures the dynamics of different assembly pathways between fluid, polycrystal, and single crystals states, in agreement with the full N-dimensional data as characterized by first passage time distributions. Numerical solution of the low-dimensional Smoluchowski equation reveals statistical properties of the dynamic evolution of states vs. applied field amplitude and system size. The low-dimensional Smoluchowski equation and associated landscapes calculated here can serve as models for predictive control of electric field mediated assembly of colloidal ensembles into two-dimensional crystalline objects. PMID:27250328
Inferring identity from DNA profile evidence.
Balding, D J; Donnelly, P
1995-12-01
The controversy over the interpretation of DNA profile evidence in forensic identification can be attributed in part to confusion over the mode(s) of statistical inference appropriate to this setting. Although there has been substantial discussion in the literature of, for example, the role of population genetics issues, few authors have made explicit the inferential framework which underpins their arguments. This lack of clarity has led both to unnecessary debates over ill-posed or inappropriate questions and to the neglect of some issues which can have important consequences. We argue that the mode of statistical inference which seems to underlie the arguments of some authors, based on a hypothesis testing framework, is not appropriate for forensic identification. We propose instead a logically coherent framework in which, for example, the roles both of the population genetics issues and of the nonscientific evidence in a case are incorporated. Our analysis highlights several widely held misconceptions in the DNA profiling debate. For example, the profile frequency is not directly relevant to forensic inference. Further, very small match probabilities may in some settings be consistent with acquittal. Although DNA evidence is typically very strong, our analysis of the coherent approach highlights situations which can arise in practice where alternative methods for assessing DNA evidence may be misleading. PMID:8524840
Inferred Lunar Boulder Distributions at Decimeter Scales
NASA Technical Reports Server (NTRS)
Baloga, S. M.; Glaze, L. S.; Spudis, P. D.
2012-01-01
Block size distributions of impact deposits on the Moon are diagnostic of the impact process and environmental effects, such as target lithology and weathering. Block size distributions are also important factors in trafficability, habitability, and possibly the identification of indigenous resources. Lunar block sizes have been investigated for many years for many purposes [e.g., 1-3]. An unresolved issue is the extent to which lunar block size distributions can be extrapolated to scales smaller than limits of resolution of direct measurement. This would seem to be a straightforward statistical application, but it is complicated by two issues. First, the cumulative size frequency distribution of observable boulders rolls over due to resolution limitations at the small end. Second, statistical regression provides the best fit only around the centroid of the data [4]. Confidence and prediction limits splay away from the best fit at the endpoints resulting in inferences in the boulder density at the CPR scale that can differ by many orders of magnitude [4]. These issues were originally investigated by Cintala and McBride [2] using Surveyor data. The objective of this study was to determine whether the measured block size distributions from Lunar Reconnaissance Orbiter Camera - Narrow Angle Camera (LROC-NAC) images (m-scale resolution) can be used to infer the block size distribution at length scales comparable to Mini-RF Circular Polarization Ratio (CPR) scales, nominally taken as 10 cm. This would set the stage for assessing correlations of inferred block size distributions with CPR returns [6].
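The extrapolation step can be sketched with ordinary log-log regression on a cumulative size-frequency distribution; the prediction interval widening away from the data centroid is exactly the issue raised above. Synthetic power-law diameters stand in for LROC-NAC measurements:

```python
import numpy as np

# Synthetic boulder diameters (m) from a power law -- illustrative only,
# not actual LROC-NAC data.
rng = np.random.default_rng(1)
d = 0.5 * rng.pareto(2.0, size=300) + 0.5        # diameters >= 0.5 m
d_sorted = np.sort(d)[::-1]
N_cum = np.arange(1, len(d_sorted) + 1)          # cumulative count N(>=d)

# Fit log N = a + b log d by least squares.
X, Y = np.log(d_sorted), np.log(N_cum)
b, a = np.polyfit(X, Y, 1)                       # b: power-law slope (negative)

# Extrapolate the cumulative count down to the nominal 10 cm CPR scale.
log_d0 = np.log(0.10)
N_pred = np.exp(a + b * log_d0)

# The prediction interval grows with distance from the data centroid,
# which is why decimeter-scale inferences can span orders of magnitude.
resid = Y - (a + b * X)
s = np.sqrt(resid @ resid / (len(X) - 2))
se = s * np.sqrt(1 + 1/len(X) + (log_d0 - X.mean())**2 / ((X - X.mean())**2).sum())
lo_ct, hi_ct = np.exp(a + b*log_d0 - 2*se), np.exp(a + b*log_d0 + 2*se)
print(N_pred, lo_ct, hi_ct)
```

Note the sketch ignores the resolution roll-over at the small end, which in real data must be excluded from the fit before extrapolating.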
Evolutionary inferences from the analysis of exchangeability
Hendry, Andrew P.; Kaeuffer, Renaud; Crispo, Erika; Peichel, Catherine L.; Bolnick, Daniel I.
2013-01-01
Evolutionary inferences are usually based on statistical models that compare mean genotypes and phenotypes (or their frequencies) among populations. An alternative is to use the actual distribution of genotypes and phenotypes to infer the “exchangeability” of individuals among populations. We illustrate this approach by using discriminant functions on principal components to classify individuals among paired lake and stream populations of threespine stickleback in each of six independent watersheds. Classification based on neutral and non-neutral microsatellite markers was highest to the population of origin and next-highest to populations in the same watershed. These patterns are consistent with the influence of historical contingency (separate colonization of each watershed) and subsequent gene flow (within but not between watersheds). In comparison to this low genetic exchangeability, ecological (diet) and morphological (trophic and armor traits) exchangeability was relatively high – particularly among populations from similar habitats. These patterns reflect the role of natural selection in driving parallel adaptive changes when independent populations colonize similar habitats. Importantly, however, substantial non-parallelism was also evident. Our results show that analyses based on exchangeability can confirm inferences based on statistical analyses of means or frequencies, while also refining insights into the drivers of – and constraints on – evolutionary diversification. PMID:24299398
Circular inferences in schizophrenia.
Jardri, Renaud; Denève, Sophie
2013-11-01
A considerable number of recent experimental and computational studies suggest that subtle impairments of excitatory to inhibitory balance or regulation are involved in many neurological and psychiatric conditions. The current paper aims to relate, specifically and quantitatively, excitatory to inhibitory imbalance with psychotic symptoms in schizophrenia. Considering that the brain constructs hierarchical causal models of the external world, we show that the failure to maintain the excitatory to inhibitory balance results in hallucinations as well as in the formation and subsequent consolidation of delusional beliefs. Indeed, the consequence of excitatory to inhibitory imbalance in a hierarchical neural network is equated to a pathological form of causal inference called 'circular belief propagation'. In circular belief propagation, bottom-up sensory information and top-down predictions are reverberated, i.e. prior beliefs are misinterpreted as sensory observations and vice versa. As a result, these predictions are counted multiple times. Circular inference explains the emergence of erroneous percepts, the patient's overconfidence when facing probabilistic choices, the learning of 'unshakable' causal relationships between unrelated events and a paradoxical immunity to perceptual illusions, which are all known to be associated with schizophrenia. PMID:24065721
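The core mechanism, a prior message that loops back and is re-counted as fresh evidence, can be illustrated with a two-hypothesis toy calculation. The numbers below are illustrative and not drawn from the paper's hierarchical network model:

```python
# Toy illustration of "circular belief propagation": when a prior is
# reverberated and counted more than once, the posterior becomes
# overconfident relative to correct Bayesian inference.
def posterior(prior, likelihood, prior_count=1):
    """Posterior P(H=1 | data) for a binary hypothesis. prior_count > 1
    models a prior message that loops back and is re-counted as evidence."""
    p1 = (prior ** prior_count) * likelihood
    p0 = ((1 - prior) ** prior_count) * (1 - likelihood)
    return p1 / (p1 + p0)

prior, lik = 0.6, 0.7
healthy = posterior(prior, lik, prior_count=1)    # correct inference
circular = posterior(prior, lik, prior_count=2)   # prior counted twice
print(healthy, circular)
```

The "circular" posterior is pushed toward certainty, a minimal analogue of the overconfidence and unshakable beliefs discussed above.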
Moment inference from tomograms
Day-Lewis, F. D.; Chen, Y.; Singha, K.
2007-01-01
Time-lapse geophysical tomography can provide valuable qualitative insights into hydrologic transport phenomena associated with aquifer dynamics, tracer experiments, and engineered remediation. Increasingly, tomograms are used to infer the spatial and/or temporal moments of solute plumes; these moments provide quantitative information about transport processes (e.g., advection, dispersion, and rate-limited mass transfer) and controlling parameters (e.g., permeability, dispersivity, and rate coefficients). The reliability of moments calculated from tomograms is, however, poorly understood because classic approaches to image appraisal (e.g., the model resolution matrix) are not directly applicable to moment inference. Here, we present a semi-analytical approach to construct a moment resolution matrix based on (1) the classic model resolution matrix and (2) image reconstruction from orthogonal moments. Numerical results for radar and electrical-resistivity imaging of solute plumes demonstrate that moment values calculated from tomograms depend strongly on plume location within the tomogram, survey geometry, regularization criteria, and measurement error. Copyright 2007 by the American Geophysical Union.
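Computing spatial moments from a gridded tomogram is itself straightforward; the subtlety addressed above is how resolution and regularization bias those moments. A minimal sketch on a synthetic Gaussian "plume" image (a stand-in for a real radar or resistivity tomogram):

```python
import numpy as np

# Synthetic 2-D plume image on a tomogram grid; a real inversion result
# would replace this array.
x = np.linspace(0, 10, 101)
z = np.linspace(0, 5, 51)
X, Z = np.meshgrid(x, z, indexing="ij")
plume = np.exp(-((X - 4)**2 / 2.0 + (Z - 2)**2 / 0.5))

m0 = plume.sum()                        # zeroth moment: total mass (up to a constant)
xc = (X * plume).sum() / m0             # first moments: centroid
zc = (Z * plume).sum() / m0
sxx = ((X - xc)**2 * plume).sum() / m0  # second central moment: spread in x
print(xc, zc, sxx)
```

For the test image the centroid recovers (4, 2) and the x-variance recovers 1.0; on a regularized tomogram these same formulas would return smoothed, position-dependent values, which is what the moment resolution matrix quantifies.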
NASA Astrophysics Data System (ADS)
Graham, D. B.; Cairns, Iver H.; Skjaeraasen, O.; Robinson, P. A.
2012-02-01
The temperature ratio Ti/Te of ions to electrons affects both the ion-damping rate and the ion-acoustic speed in plasmas. The effects of changing the ion-damping rate and ion-acoustic speed are investigated for electrostatic strong turbulence and electromagnetic strong turbulence in three dimensions. When ion damping is strong, density wells relax in place and act as nucleation sites for the formation of new wave packets. In this case, the density perturbations are primarily density wells supported by the ponderomotive force. For weak ion damping, corresponding to low Ti/Te, ion-acoustic waves are launched radially outwards when wave packets dissipate at burnout, thereby increasing the level of density perturbations in the system and thus raising the level of scattering of Langmuir waves off density perturbations. Density wells no longer relax in place so renucleation at recent collapse sites no longer occurs, instead wave packets form in background low density regions, such as superpositions of troughs of propagating ion-acoustic waves. This transition is found to occur at Ti/Te ≈ 0.1. The change in behavior with Ti/Te is shown to change the bulk statistical properties, scaling behavior, spectra, and field statistics of strong turbulence. For Ti/Te ≳ 0.1, the electrostatic results approach the predictions of the two-component model of Robinson and Newman, and good agreement is found for Ti/Te ≳ 0.15.
NASA Astrophysics Data System (ADS)
Stone, Peter H.; Yao, Mao-Sung
1990-07-01
A number of perpetual January simulations are carried out with a two-dimensional (2-D) zonally averaged model employing various parameterizations of the eddy fluxes of heat (potential temperature) and moisture. The parameterizations are evaluated by comparing these results with the eddy fluxes calculated in a parallel simulation using a three-dimensional (3-D) general circulation model with zonally symmetric forcing. The 3-D model's performance in turn is evaluated by comparing its results using realistic (nonsymmetric) boundary conditions with observations.Branscome's parameterization of the meridional eddy flux of heat and Leovy's parameterization of the meridional eddy flux of moisture simulate the seasonal and latitudinal variations of these fluxes reasonably well, while somewhat underestimating their magnitudes. In particular, Branscome's parameterization underestimates the vertically integrated flux of heat by about 30%, mainly because it misses out the secondary peak in this flux near the tropopause; and Leovy's parameterization of the meridional eddy flux of moisture underestimates the magnitude of this flux by about 20%. The analogous parameterizations of the vertical eddy fluxes of heat and moisture are found to perform much more poorly, i.e., they give fluxes only one quarter to one half as strong as those calculated in the 3-D model. New parameterizations of the vertical eddy fluxes are developed that take into account the enhancement of the eddy mixing slope in a growing baroclinic wave due to condensation, and also the effect of eddy fluctuations in relative humidity. The new parameterizations, when tested in the 2-D model, simulate the seasonal, latitudinal, and vertical variations of the vertical eddy fluxes quite well, when compared with the 3-D model, and only underestimate the magnitude of the fluxes by 10% to 20%.
Statistical Inference for a Ratio of Dispersions Using Paired Samples.
ERIC Educational Resources Information Center
Bonett, Douglas G.; Seier, Edith
2003-01-01
Derived a confidence interval for a ratio of correlated mean absolute deviations. Simulation results show that it performs well in small sample sizes across realistically nonnormal distributions and that it is almost as powerful as the most powerful test examined by R. Wilcox (1990). (SLD)
Postscript: Bayesian Statistical Inference in Psychology: Comment on Trafimow (2003)
ERIC Educational Resources Information Center
Lee, Michael D.; Wagenmakers, Eric-Jan
2005-01-01
This paper comments on Trafimow's response to Lee and Wagenmakers' comments on Trafimow's original article. It seems our comment should have made it clear that the objective Bayesian approach we advocate views probabilities neither as relative frequencies nor as belief states, but as degrees of plausibility assigned to propositions in…
Statistical Inference-Based Cache Management for Mobile Learning
ERIC Educational Resources Information Center
Li, Qing; Zhao, Jianmin; Zhu, Xinzhong
2009-01-01
Supporting efficient data access in the mobile learning environment is becoming a hot research problem in recent years, and the problem becomes tougher when the clients are using light-weight mobile devices such as cell phones whose limited storage space prevents the clients from holding a large cache. A practical solution is to store the cache…
Statistical Inference and Spatial Patterns in Correlates of IQ
ERIC Educational Resources Information Center
Hassall, Christopher; Sherratt, Thomas N.
2011-01-01
Cross-national comparisons of IQ have become common since the release of a large dataset of international IQ scores. However, these studies have consistently failed to consider the potential lack of independence of these scores based on spatial proximity. To demonstrate the importance of this omission, we present a re-evaluation of several…
Drawing Statistical Inferences from Historical Census Data, 1850–1950
DAVERN, MICHAEL; RUGGLES, STEVEN; SWENSON, TAMI; ALEXANDER, J. TRENT; OAKES, J. MICHAEL
2009-01-01
Virtually all quantitative microdata used by social scientists derive from samples that incorporate clustering, stratification, and weighting adjustments (Kish 1965, 1992). Such data can yield standard error estimates that differ dramatically from those derived from a simple random sample of the same size. Researchers using historical U.S. census microdata, however, usually apply methods designed for simple random samples. The resulting p values and confidence intervals could be inaccurate and could lead to erroneous research conclusions. Because U.S. census microdata samples are among the most widely used sources for social science and policy research, the need for reliable standard error estimation is critical. We evaluate the historical microdata samples of the Integrated Public Use Microdata Series (IPUMS) project from 1850 to 1950 in order to determine (1) the impact of sample design on standard error estimates, and (2) how to apply modern standard error estimation software to historical census samples. We exploit a unique new data source from the 1880 census to validate our methods for standard error estimation, and then we apply this approach to the 1850–1870 and 1900–1950 decennial censuses. We conclude that Taylor series estimation can be used effectively with the historical decennial census microdata samples and should be applied in research analyses that have the potential for substantial clustering effects. PMID:19771946
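The gap between naive and design-based standard errors can be illustrated with a cluster-robust (Taylor linearization) estimate for a simple mean. The synthetic household clusters below are illustrative, not IPUMS data:

```python
import numpy as np

# Synthetic clustered data: strong within-cluster correlation, as in
# household-based census samples.
rng = np.random.default_rng(2)
n_clusters, per_cluster = 50, 8
cluster_effect = rng.normal(0, 1, n_clusters).repeat(per_cluster)
y = 10 + cluster_effect + rng.normal(0, 0.5, n_clusters * per_cluster)
cluster_id = np.arange(n_clusters).repeat(per_cluster)

n = len(y)
se_srs = y.std(ddof=1) / np.sqrt(n)   # naive simple-random-sample SE

# Taylor linearization: build the variance from cluster totals of the
# linearized scores y_i - ybar (no stratification or weights here).
scores = y - y.mean()
cluster_totals = np.array([scores[cluster_id == c].sum() for c in range(n_clusters)])
var_clustered = n_clusters / (n_clusters - 1) * (cluster_totals**2).sum() / n**2
se_cluster = np.sqrt(var_clustered)

print(se_srs, se_cluster)   # the clustered SE exceeds the naive SRS SE
```

With this degree of within-cluster correlation the design-based SE is several times the SRS SE, which is precisely why SRS-based p values on clustered census microdata can mislead.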
Another Look At The Canon of Plausible Inference
NASA Astrophysics Data System (ADS)
Solana-Ortega, Alberto; Solana, Vicente
2005-11-01
Systematic study of plausible inference is very recent. Axiomatics have been traditionally limited to the development of uninterpreted pure calculi for comparing individual inferences, ignoring the need of formalisms to solve each of these inferences and leaving the interpretation and application of such calculi to ad hoc statistical criteria which are open to inconsistencies. Here we defend a different viewpoint, regarding plausible inference in a holistic manner. Specifically, we consider that all tasks involved in it, including the formalization of languages in which to pose problems, the definitions and axiomatics leading to calculation rules, and those for deriving inference procedures or assignment rules, ought to be based on common grounds. For this purpose a set of elementary requirements, establishing desirable properties so fundamental that any theory of scientific inference should satisfy them, is proposed under the name of the plausible inference canon. Its logical status as an extramathematical foundation is investigated, together with the different roles it plays as constructive guideline, standard for contrasting frameworks, or normative stipulation. We also highlight the novelties it introduces with respect to similar proposals by other authors. In particular we concentrate on those aspects of the canon related to the critical issue of adequately incorporating basic evidential knowledge to inference.
Causal Inference in Public Health
Glass, Thomas A.; Goodman, Steven N.; Hernán, Miguel A.; Samet, Jonathan M.
2014-01-01
Causal inference has a central role in public health; the determination that an association is causal indicates the possibility for intervention. We review and comment on the long-used guidelines for interpreting evidence as supporting a causal association and contrast them with the potential outcomes framework that encourages thinking in terms of causes that are interventions. We argue that in public health this framework is more suitable, providing an estimate of an action’s consequences rather than the less precise notion of a risk factor’s causal effect. A variety of modern statistical methods adopt this approach. When an intervention cannot be specified, causal relations can still exist, but how to intervene to change the outcome will be unclear. In application, the often-complex structure of causal processes needs to be acknowledged and appropriate data collected to study them. These newer approaches need to be brought to bear on the increasingly complex public health challenges of our globalized world. PMID:23297653
Fast inference of ill-posed problems within a convex space
NASA Astrophysics Data System (ADS)
Fernandez-de-Cossio-Diaz, J.; Mulet, R.
2016-07-01
In multiple scientific and technological applications we face the problem of having low dimensional data to be justified by a linear model defined in a high dimensional parameter space. The difference in dimensionality makes the problem ill-defined: the model is consistent with the data for many values of its parameters. The objective is to find the probability distribution of parameter values consistent with the data, a problem that can be cast as the exploration of a high dimensional convex polytope. In this work we introduce a novel algorithm to solve this problem efficiently. It provides results that are statistically indistinguishable from currently used numerical techniques while its running time scales linearly with the system size. We show that the algorithm performs robustly in many abstract and practical applications. As working examples we simulate the effects of restricting reaction fluxes on the space of feasible phenotypes of a genome-scale Escherichia coli metabolic network and infer the traffic flow between origin and destination nodes in a real communication network.
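The baseline task, sampling the polytope {x : Ax ≤ b} of parameter values consistent with the data, can be sketched with a generic hit-and-run sampler. This is a minimal reference method, not the paper's (faster) algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)

def hit_and_run(A, b, x0, n_samples=2000):
    """Uniform sampling of {x : A x <= b} by hit-and-run: pick a random
    direction, find the feasible chord through x, jump uniformly along it."""
    x, out = x0.astype(float), []
    for _ in range(n_samples):
        d = rng.standard_normal(len(x))
        d /= np.linalg.norm(d)
        # Feasibility along the line: A(x + t d) <= b  =>  t*(A d) <= b - A x
        Ad, slack = A @ d, b - A @ x
        t_hi = np.min(slack[Ad > 0] / Ad[Ad > 0]) if np.any(Ad > 0) else np.inf
        t_lo = np.max(slack[Ad < 0] / Ad[Ad < 0]) if np.any(Ad < 0) else -np.inf
        x = x + rng.uniform(t_lo, t_hi) * d
        out.append(x)
    return np.array(out)

# Sanity check on the unit square [0,1]^2: the sample mean should approach (0.5, 0.5).
A = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], float)
b = np.array([1, 0, 1, 0], float)
samples = hit_and_run(A, b, np.array([0.5, 0.5]))
print(samples.mean(axis=0))
```

Hit-and-run scales poorly on the high-dimensional, elongated flux polytopes of metabolic networks, which is the regime the paper's linear-scaling algorithm targets.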
Maximum caliber inference of nonequilibrium processes.
Otten, Moritz; Stock, Gerhard
2010-07-21
Thirty years ago, Jaynes suggested a general theoretical approach to nonequilibrium statistical mechanics, called maximum caliber (MaxCal) [Annu. Rev. Phys. Chem. 31, 579 (1980)]. MaxCal is a variational principle for dynamics in the same spirit that maximum entropy is a variational principle for equilibrium statistical mechanics. Motivated by the success of maximum entropy inference methods for equilibrium problems, in this work the MaxCal formulation is applied to the inference of nonequilibrium processes. That is, given some time-dependent observables of a dynamical process, one constructs a model that reproduces these input data and moreover, predicts the underlying dynamics of the system. For example, the observables could be some time-resolved measurements of the folding of a protein, which are described by a few-state model of the free energy landscape of the system. MaxCal then calculates the probabilities of an ensemble of trajectories such that on average the data are reproduced. From this probability distribution, any dynamical quantity of the system can be calculated, including population probabilities, fluxes, or waiting time distributions. After briefly reviewing the formalism, the practical numerical implementation of MaxCal in the case of an inference problem is discussed. Adopting various few-state models of increasing complexity, it is demonstrated that the MaxCal principle indeed works as a practical method of inference: The scheme is fairly robust and yields correct results as long as the input data are sufficient. As the method is unbiased and general, it can deal with any kind of time dependency such as oscillatory transients and multitime decays. PMID:20649320
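The mechanics of a MaxCal inference can be shown on the smallest possible example: among all distributions over a two-state trajectory step, pick the maximum-caliber one whose average occupancy of state 1 matches a measured value. With an unbiased reference process the solution tilts each step by exp(λ·s), which here factorizes, so λ follows from a one-dimensional root search. This toy is illustrative, not one of the paper's protein-folding models:

```python
import numpy as np

def occupancy(lam):
    """Mean occupancy of state 1 per step in the tilted ensemble
    P(s) proportional to exp(lam * s), with s in {0, 1}."""
    return np.exp(lam) / (1.0 + np.exp(lam))

def solve_lambda(target, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection on the monotone constraint equation occupancy(lam) = target,
    i.e., choosing the Lagrange multiplier that reproduces the data."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if occupancy(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

target = 0.73                  # hypothetical time-averaged measurement
lam = solve_lambda(target)
print(lam, occupancy(lam))     # the tilted ensemble reproduces the data
```

With correlated constraints (fluxes, waiting times) the multipliers no longer decouple and must be found by joint numerical optimization, as the abstract describes.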
Bayesian inference in geomagnetism
NASA Technical Reports Server (NTRS)
Backus, George E.
1988-01-01
The inverse problem in empirical geomagnetic modeling is investigated, with critical examination of recently published studies. Particular attention is given to the use of Bayesian inference (BI) to select the damping parameter lambda in the uniqueness portion of the inverse problem. The mathematical bases of BI and stochastic inversion are explored, with consideration of bound-softening problems and resolution in linear Gaussian BI. The problem of estimating the radial magnetic field B(r) at the earth core-mantle boundary from surface and satellite measurements is then analyzed in detail, with specific attention to the selection of lambda in the studies of Gubbins (1983) and Gubbins and Bloxham (1985). It is argued that the selection method is inappropriate and leads to lambda values much larger than those that would result if a reasonable bound on the heat flow at the CMB were assumed.
NASA Astrophysics Data System (ADS)
Schwinger, J.; Elbern, H.
2010-09-01
Chemical state analyses of the atmosphere based on data assimilation may be degraded by inconsistent covariances of background and observation errors. An efficient method to calculate consistency diagnostics for background and observation errors in observation space is applied to analyses of the four-dimensional variational stratospheric chemistry data assimilation system SACADA (Synoptic Analysis of Chemical Constituents by Advanced Data Assimilation). A background error covariance model for the assimilation of Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) ozone retrievals is set up and optimized. It is shown that a significant improvement of the assimilation system performance is attained through the use of this covariance model compared to a simple covariance formulation, which assumes background errors to be a fixed fraction of the field value. The forecast skill, measured by the distance between the model forecast and MIPAS observations, is shown to improve. Further, an evaluation of analyses with independent data from the Halogen Observation Experiment (HALOE), the Stratospheric Aerosol and Gas Experiment II (SAGE II), and ozone sondes reveals that the standard deviation of ozone analyses with respect to these instruments is reduced throughout the middle stratosphere. Compared to the impact of background error variances on analysis quality, it is found that the precise specification of spatial background error correlations appears to be less critical if observations are spatially and temporally dense. Results indicate that ozone forecast errors of a state of the art stratospheric chemistry assimilation system are of the same order of magnitude as MIPAS observation errors.
Unified inference for sparse and dense longitudinal models.
Kim, Seonjin; Zhao, Zhibiao
2013-03-01
In longitudinal data analysis, statistical inference for sparse data and dense data could be substantially different. For kernel smoothing estimate of the mean function, the convergence rates and limiting variance functions are different under the two scenarios. The latter phenomenon poses challenges for statistical inference as a subjective choice between the sparse and dense cases may lead to wrong conclusions. We develop self-normalization based methods that can adapt to the sparse and dense cases in a unified framework. Simulations show that the proposed methods outperform some existing methods. PMID:24966413
Bayes factors and multimodel inference
Link, W.A.; Barker, R.J.
2009-01-01
Multimodel inference has two main themes: model selection, and model averaging. Model averaging is a means of making inference conditional on a model set, rather than on a selected model, allowing formal recognition of the uncertainty associated with model choice. The Bayesian paradigm provides a natural framework for model averaging, and provides a context for evaluation of the commonly used AIC weights. We review Bayesian multimodel inference, noting the importance of Bayes factors. Noting the sensitivity of Bayes factors to the choice of priors on parameters, we define and propose nonpreferential priors as offering a reasonable standard for objective multimodel inference.
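Bayes-factor-based model averaging can be made concrete on the simplest two-model problem: n coin flips under M0 (fair coin) versus M1 (uniform prior on the bias). This is a toy instance of Bayesian multimodel inference, not one of the paper's ecological examples:

```python
from math import comb

n, k = 20, 16                      # observed: 16 heads in 20 flips

ml_m0 = comb(n, k) * 0.5**n        # marginal likelihood under M0 (fair coin)
ml_m1 = 1.0 / (n + 1)              # under M1: integral of C(n,k) p^k (1-p)^(n-k) dp

bf_10 = ml_m1 / ml_m0              # Bayes factor for M1 over M0

# Equal prior model probabilities -> posterior model probabilities.
p_m1 = bf_10 / (1.0 + bf_10)

# Model-averaged predictive probability of heads on the next flip.
pred_m0 = 0.5
pred_m1 = (k + 1) / (n + 2)        # Laplace rule under M1
pred_avg = (1 - p_m1) * pred_m0 + p_m1 * pred_m1
print(bf_10, p_m1, pred_avg)
```

The averaged prediction sits between the two models' predictions, weighted by how strongly the data favor each; the sensitivity of bf_10 to the prior under M1 is the issue the nonpreferential priors above address.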
Developing Young Children's Emergent Inferential Practices in Statistics
ERIC Educational Resources Information Center
Makar, Katie
2016-01-01
Informal statistical inference has now been researched at all levels of schooling and initial tertiary study. Work in informal statistical inference is least understood in the early years, where children have had little if any exposure to data handling. A qualitative study in Australia was carried out through a series of teaching experiments with…
Inferring sparse networks for noisy transient processes.
Tran, Hoang M; Bukkapatnam, Satish T S
2016-01-01
Inferring causal structures of real world complex networks from measured time series signals remains an open issue. The current approaches are inadequate to discern between direct versus indirect influences (i.e., the presence or absence of a directed arc connecting two nodes) in the presence of noise, sparse interactions, as well as nonlinear and transient dynamics of real world processes. We report a sparse regression (referred to as the l1-min) approach with theoretical bounds on the constraints on the allowable perturbation to recover the network structure that guarantees sparsity and robustness to noise. We also introduce averaging and perturbation procedures to further enhance prediction scores (i.e., reduce inference errors), and the numerical stability of l1-min approach. Extensive investigations have been conducted with multiple benchmark simulated genetic regulatory network and Michaelis-Menten dynamics, as well as real world data sets from DREAM5 challenge. These investigations suggest that our approach can significantly improve, oftentimes by 5 orders of magnitude over the methods reported previously for inferring the structure of dynamic networks, such as Bayesian network, network deconvolution, silencing and modular response analysis methods based on optimizing for sparsity, transients, noise and high dimensionality issues. PMID:26916813
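The l1-regularized recovery of one node's sparse set of influences can be sketched with ISTA (proximal gradient descent on a lasso objective). This is an illustrative stand-in for the paper's l1-min formulation, with synthetic data rather than a simulated regulatory network:

```python
import numpy as np

# Synthetic problem: one node's response y driven by a sparse subset of
# 30 candidate parent nodes, plus noise.
rng = np.random.default_rng(4)
n_obs, n_nodes = 80, 30
X = rng.standard_normal((n_obs, n_nodes))      # states of candidate parents
w_true = np.zeros(n_nodes)
w_true[[2, 7, 19]] = [1.5, -2.0, 1.0]          # sparse true interactions
y = X @ w_true + 0.05 * rng.standard_normal(n_obs)

def ista(X, y, lam=0.1, n_iter=2000):
    """Minimize 0.5*||Xw - y||^2 + lam*||w||_1 by iterative soft thresholding."""
    L = np.linalg.norm(X, 2) ** 2              # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = w - X.T @ (X @ w - y) / L          # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return w

w_hat = ista(X, y)
support = np.flatnonzero(np.abs(w_hat) > 0.1)
print(support)                                 # recovered parents of the node
```

Repeating the fit under perturbed data and averaging the recovered supports, as the abstract describes, further suppresses spurious arcs.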
Topology based Kernels with Application to Inference Problems in Alzheimer’s disease
Pachauri, Deepti; Hinrichs, Chris; Chung, Moo K.; Johnson, Sterling C.; Singh, Vikas
2011-01-01
Alzheimer’s disease (AD) research has recently witnessed a great deal of activity focused on developing new statistical learning tools for automated inference using imaging data. The workhorse for many of these techniques is the Support Vector Machine (SVM) framework (or more generally kernel based methods). Most of these require, as a first step, specification of a kernel matrix between input examples (i.e., images). The inner product between images I_i and I_j in a feature space can generally be written in closed form, and so it is convenient to treat it as “given”. However, in certain neuroimaging applications such an assumption becomes problematic. As an example, it is rather challenging to provide a scalar measure of similarity between two instances of highly attributed data such as cortical thickness measures on cortical surfaces. Note that cortical thickness is known to be discriminative for neurological disorders, so leveraging such information in an inference framework, especially within a multi-modal method, is potentially advantageous. But despite being clinically meaningful, relatively few works have successfully exploited this measure for classification or regression. Motivated by these applications, our paper presents novel techniques to compute similarity matrices for such topologically-based attributed data. Our ideas leverage recent developments to characterize signals (e.g., cortical thickness) motivated by the persistence of their topological features, leading to a scheme for simple constructions of kernel matrices. As a proof of principle, on a dataset of 356 subjects from the ADNI study, we report good performance on several statistical inference tasks without any feature selection, dimensionality reduction, or parameter tuning. PMID:21536520
Shi, Runhua; McLarty, Jerry W
2009-10-01
In this article, we introduced basic concepts of statistics, type of distributions, and descriptive statistics. A few examples were also provided. The basic concepts presented herein are only a fraction of the concepts related to descriptive statistics. Also, there are many commonly used distributions not presented herein, such as Poisson distributions for rare events and exponential distributions, F distributions, and logistic distributions. More information can be found in many statistics books and publications. PMID:19891281
ERIC Educational Resources Information Center
Petocz, Peter; Sowey, Eric
2008-01-01
As a branch of knowledge, Statistics is ubiquitous and its applications can be found in (almost) every field of human endeavour. In this article, the authors track down the possible source of the link between the "Siren song" and applications of Statistics. Answers to their previous five questions and five new questions on Statistics are presented.
ERIC Educational Resources Information Center
Callamaras, Peter
1983-01-01
This buyer's guide to seven major types of statistics software packages for microcomputers reviews Edu-Ware Statistics 3.0; Financial Planning; Speed Stat; Statistics with DAISY; Human Systems Dynamics package of Stats Plus, ANOVA II, and REGRESS II; Maxistat; and Moore-Barnes' MBC Test Construction and MBC Correlation. (MBR)
ERIC Educational Resources Information Center
Meyer, Donald L.
Bayesian statistical methodology and its possible uses in the behavioral sciences are discussed in relation to the solution of problems in both the use and teaching of fundamental statistical methods, including confidence intervals, significance tests, and sampling. The Bayesian model explains these statistical methods and offers a consistent…
Kernel approximate Bayesian computation in population genetic inferences.
Nakagome, Shigeki; Fukumizu, Kenji; Mano, Shuhei
2013-12-01
Approximate Bayesian computation (ABC) is a likelihood-free approach to Bayesian inference based on a rejection algorithm that applies a tolerance of dissimilarity between summary statistics from observed and simulated data. Although several improvements to the algorithm have been proposed, none avoids the following two sources of approximation: 1) lack of sufficient statistics: sampling is not from the true posterior density given data but from an approximate posterior density given summary statistics; and 2) non-zero tolerance: sampling from the posterior density given summary statistics is achieved only in the limit of zero tolerance. The first source of approximation can be improved by adding a summary statistic, but an increase in the number of summary statistics could introduce additional variance caused by the low acceptance rate. Consequently, many researchers have attempted to develop techniques to choose informative summary statistics. The present study evaluated the utility of a kernel-based ABC method [Fukumizu, K., L. Song and A. Gretton (2010): "Kernel Bayes' rule: Bayesian inference with positive definite kernels," arXiv:1009.5736; Fukumizu, K., L. Song and A. Gretton (2011): "Kernel Bayes' rule," in Advances in Neural Information Processing Systems 24 (NIPS 24), J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. Pereira and K. Q. Weinberger (Eds.), pp. 1549-1557] for complex problems that demand many summary statistics. Specifically, kernel ABC was applied to population genetic inference. We demonstrate that, in contrast to conventional ABC, kernel ABC can incorporate a large number of summary statistics while maintaining high performance of the inference. PMID:24150124
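The two approximations the abstract lists (summary statistics and non-zero tolerance) are easiest to see in plain rejection ABC, the baseline that kernel ABC improves upon. A minimal sketch, with a hypothetical Gaussian toy model and illustrative tolerance, not the paper's population-genetic setup:

```python
import random, statistics

def rejection_abc(obs_stats, simulate, prior_sample, eps, n_draws=20000, rng=None):
    """Rejection ABC: keep parameter draws whose simulated summary
    statistics fall within tolerance eps of the observed ones."""
    rng = rng or random.Random(1)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        s = simulate(theta, rng)
        if max(abs(a - b) for a, b in zip(s, obs_stats)) <= eps:
            accepted.append(theta)
    return accepted

# toy model (assumed for illustration): infer the mean of a unit-variance
# Gaussian from a single summary statistic, the sample mean of 50 draws
rng = random.Random(0)
true_mu = 1.0
obs = [statistics.mean(rng.gauss(true_mu, 1.0) for _ in range(50))]
post = rejection_abc(
    obs,
    simulate=lambda th, r: [statistics.mean(r.gauss(th, 1.0) for _ in range(50))],
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    eps=0.2,
)
```

Shrinking `eps` sharpens the posterior approximation but lowers the acceptance rate, which is exactly the trade-off that motivates choosing (or kernelizing) summary statistics.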
Causal Inference in Retrospective Studies.
ERIC Educational Resources Information Center
Holland, Paul W.; Rubin, Donald B.
1988-01-01
The problem of drawing causal inferences from retrospective case-controlled studies is considered. A model for causal inference in prospective studies is applied to retrospective studies. Limitations of case-controlled studies are formulated concerning relevant parameters that can be estimated in such studies. A coffee-drinking/myocardial…
Improving Inferences from Multiple Methods.
ERIC Educational Resources Information Center
Shotland, R. Lance; Mark, Melvin M.
1987-01-01
Multiple evaluation methods (MEMs) can cause an inferential challenge, although there are strategies to strengthen inferences. Practical and theoretical issues involved in the use by social scientists of MEMs, three potential problems in drawing inferences from MEMs, and short- and long-term strategies for alleviating these problems are outlined.…
Causal Inference and Developmental Psychology
ERIC Educational Resources Information Center
Foster, E. Michael
2010-01-01
Causal inference is of central importance to developmental psychology. Many key questions in the field revolve around improving the lives of children and their families. These include identifying risk factors that if manipulated in some way would foster child development. Such a task inherently involves causal inference: One wants to know whether…
Learning to Observe "and" Infer
ERIC Educational Resources Information Center
Hanuscin, Deborah L.; Park Rogers, Meredith A.
2008-01-01
Researchers describe the need for students to have multiple opportunities and social interaction to learn about the differences between observation and inference and their role in developing scientific explanations (Harlen 2001; Simpson 2000). Helping children develop their skills of observation and inference in science while emphasizing the…
INFERRING THE ECCENTRICITY DISTRIBUTION
Hogg, David W.; Bovy, Jo; Myers, Adam D.
2010-12-20
Standard maximum-likelihood estimators for binary-star and exoplanet eccentricities are biased high, in the sense that the estimated eccentricity tends to be larger than the true eccentricity. As with most non-trivial observables, a simple histogram of estimated eccentricities is not a good estimate of the true eccentricity distribution. Here, we develop and test a hierarchical probabilistic method for performing the relevant meta-analysis, that is, inferring the true eccentricity distribution, taking as input the likelihood functions for the individual star eccentricities, or samplings of the posterior probability distributions for the eccentricities (under a given, uninformative prior). The method is a simple implementation of a hierarchical Bayesian model; it can also be seen as a kind of heteroscedastic deconvolution. It can be applied to any quantity measured with finite precision (other orbital parameters, or indeed any astronomical measurements of any kind, including magnitudes, distances, or photometric redshifts), so long as the measurements have been communicated as a likelihood function or a posterior sampling.
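The hierarchical step the abstract describes, pooling per-object posterior samples taken under an uninformative prior to estimate the population distribution, can be sketched by maximizing the summed log marginal likelihoods over a parameter grid. The Beta-shaped population density, noise level, sample sizes, and grid below are illustrative assumptions, not the paper's model:

```python
import math, random

def fit_population(post_samples, density, grid):
    """Maximize sum_i log( mean_k density(theta, e_ik) ) over a grid: the inner
    mean is a Monte Carlo estimate of object i's marginal likelihood when its
    posterior samples e_ik were drawn under a flat prior."""
    best, best_ll = None, -math.inf
    for theta in grid:
        ll = 0.0
        for samp in post_samples:
            m = sum(density(theta, e) for e in samp) / len(samp)
            ll += math.log(max(m, 1e-300))
        if ll > best_ll:
            best, best_ll = theta, ll
    return best

# hypothetical population: e ~ Beta(1, b), density b*(1-e)**(b-1), true b = 3;
# each object is observed with Gaussian noise (sd 0.1) under a flat prior on [0, 1]
rng = random.Random(0)
b_true = 3.0
post_samples = []
for _ in range(80):
    e_true = 1.0 - (1.0 - rng.random()) ** (1.0 / b_true)  # inverse-CDF draw
    m = e_true + rng.gauss(0.0, 0.1)
    samp = []
    while len(samp) < 200:  # flat-prior posterior = N(m, 0.1) truncated to [0, 1]
        e = rng.gauss(m, 0.1)
        if 0.0 <= e <= 1.0:
            samp.append(e)
    post_samples.append(samp)

b_hat = fit_population(post_samples,
                       density=lambda b, e: b * (1.0 - e) ** (b - 1.0),
                       grid=[1.0 + 0.25 * k for k in range(21)])
```

Reweighting flat-prior posterior samples by a candidate population density is what makes this a deconvolution: each object's measurement noise is marginalized out rather than histogrammed in.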
Social Inference Through Technology
NASA Astrophysics Data System (ADS)
Oulasvirta, Antti
Awareness cues are computer-mediated, real-time indicators of people’s undertakings, whereabouts, and intentions. Already in the mid-1970s, UNIX users could use commands such as “finger” and “talk” to find out who was online and to chat. The small icons in instant messaging (IM) applications that indicate coconversants’ presence in the discussion space are the successors of “finger” output. Similar indicators can be found in online communities, media-sharing services, Internet relay chat (IRC), and location-based messaging applications. But presence and availability indicators are only the tip of the iceberg. Technological progress has enabled richer, more accurate, and more intimate indicators. For example, there are mobile services that allow friends to query and follow each other’s locations. Remote monitoring systems developed for health care allow relatives and doctors to assess the wellbeing of homebound patients (see, e.g., Tang and Venables 2000). But users also utilize cues that have not been deliberately designed for this purpose. For example, online gamers pay attention to other characters’ behavior to infer what the other players are like “in real life.” There is a common denominator underlying these examples: shared activities rely on the technology’s representation of the remote person. The other human being is not physically present but present only through a narrow technological channel.
Inference from aging information.
de Oliveira, Evaldo Araujo; Caticha, Nestor
2010-06-01
For many learning tasks the duration of the data collection can be greater than the time scale for changes of the underlying data distribution. The question we ask is how to include the information that data are aging. Ad hoc methods to achieve this include the use of validity windows that prevent the learning machine from making inferences based on old data. This introduces the problem of how to define the size of validity windows. In this brief, a new adaptive Bayesian-inspired algorithm is presented for learning drifting concepts. It uses the analogy of validity windows in an adaptive Bayesian way to incorporate changes in the data distribution over time. We apply a theoretical approach based on information geometry to the classification problem and measure its performance in simulations. The uncertainty about the appropriate size of the memory windows is dealt with in a Bayesian manner by integrating over the distribution of the adaptive window size. Thus, the posterior distribution of the weights may develop algebraic tails. The learning algorithm results from tracking the mean and variance of the posterior distribution of the weights. It was found that the algebraic tails of this posterior distribution give the learning algorithm the ability to cope with an evolving environment by permitting the escape from local traps. PMID:20421181
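A minimal alternative to the hard validity windows discussed above is an exponentially forgetful average that discounts old observations instead of cutting them off sharply. This is an assumed stand-in, far simpler than the paper's Bayesian treatment of the window size; the drift point, noise level, and forgetting factor are illustrative:

```python
import random

def forgetful_mean(xs, lam=0.98):
    """Exponentially forgetful running mean: an 'aging window' whose
    effective length is about 1/(1 - lam) recent observations."""
    s = w = 0.0
    for x in xs:
        s = lam * s + x
        w = lam * w + 1.0
    return s / w

rng = random.Random(0)
xs = [rng.gauss(0.0, 0.5) for _ in range(500)] + \
     [rng.gauss(2.0, 0.5) for _ in range(500)]  # the underlying mean drifts at t = 500
est_forget = forgetful_mean(xs)   # tracks the recent level
est_full = sum(xs) / len(xs)      # full-history mean lags behind the drift
```

After the drift, the forgetful estimate settles near the new level while the full-history mean remains stuck between the two regimes, which is the failure mode aging-aware inference is meant to avoid.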
Prado, R A; Santos, C R; Kato, D I; Murakami, M T; Viviani, V R
2016-05-11
Beetle luciferases, the enzymes responsible for bioluminescence, are special cases of CoA-ligases which have acquired a novel oxygenase activity, offering elegant models to investigate the structural origin of novel catalytic functions in enzymes. What the original function of their ancestors was, and how the new oxygenase function emerged leading to bioluminescence remains unclear. To address these questions, we solved the crystal structure of a recently cloned Malpighian luciferase-like enzyme of unknown function from Zophobas morio mealworms, which displays weak luminescence with ATP and the xenobiotic firefly d-luciferin. The three dimensional structure of the N-terminal domain showed the expected general fold of CoA-ligases, with a unique carboxylic substrate binding pocket, permitting the binding and CoA-thioesterification activity with a broad range of carboxylic substrates, including short-, medium-chain and aromatic acids, indicating a generalist function consistent with a xenobiotic-ligase. The thioesterification activity with l-luciferin, but not with the d-enantiomer, confirms that the oxygenase activity emerged from a stereoselective impediment of the thioesterification reaction with the latter, favoring the alternative chemiluminescence oxidative reaction. The structure and site-directed mutagenesis support the involvement of the main-chain amide carbonyl of the invariant glycine G323 as the catalytic base for luciferin C4 proton abstraction during the oxygenase activity in this enzyme and in beetle luciferases (G343). PMID:27101527
Statistical modeling of software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1992-01-01
This working paper discusses the statistical simulation part of a controlled software development experiment being conducted under the direction of the System Validation Methods Branch, Information Systems Division, NASA Langley Research Center. The experiment uses guidance and control software (GCS) aboard a fictitious planetary landing spacecraft: real-time control software operating on a transient mission. Software execution is simulated to study the statistical aspects of reliability and other failure characteristics of the software during development, testing, and random usage. Quantification of software reliability is a major goal. Various reliability concepts are discussed. Experiments are described for performing simulations and collecting appropriate simulated software performance and failure data. This data is then used to make statistical inferences about the quality of the software development and verification processes as well as inferences about the reliability of software versions and reliability growth under random testing and debugging.
CAUSAL INFERENCE IN BIOLOGY NETWORKS WITH INTEGRATED BELIEF PROPAGATION
CHANG, RUI; KARR, JONATHAN R; SCHADT, ERIC E
2014-01-01
Inferring causal relationships among molecular and higher-order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the ‘fitness’ of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks, in which the functional interactions are assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in a real metabolic network with a v-structure and a feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop from steady-state data alone, which conventional methods cannot. PMID:25592596
Active Inference for Binary Symmetric Hidden Markov Models
NASA Astrophysics Data System (ADS)
Allahverdyan, Armen E.; Galstyan, Aram
2015-10-01
We consider the active maximum a posteriori (MAP) inference problem for hidden Markov models (HMMs), where, given an initial MAP estimate of the hidden sequence, we select certain states in the sequence to label in order to improve the estimation accuracy of the remaining states. We focus on the binary symmetric HMM and employ its known mapping to the 1d Ising model in random fields. From the statistical physics viewpoint, the active MAP inference problem reduces to analyzing the ground state of the 1d Ising model under modified external fields. We develop an analytical approach and obtain a closed-form solution that relates the expected error reduction to model parameters under the specified active inference scheme. We then use this solution to determine the most effective active inference scheme in terms of error reduction, and examine the relation of those schemes to heuristic principles of uncertainty reduction and solution unicity.
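The MAP estimate that active inference starts from is computed by the Viterbi algorithm; for a binary symmetric HMM it takes only a few lines. This sketch covers just that initial MAP step, not the paper's active label-selection analysis, and the parameter values are illustrative:

```python
import math

def viterbi_binary(obs, p_flip, eps):
    """MAP hidden sequence for a binary symmetric HMM: the hidden state flips
    with probability p_flip per step; observations flip with probability eps."""
    def lt(same):  # log transition probability
        return math.log(1.0 - p_flip) if same else math.log(p_flip)
    def le(match):  # log emission probability
        return math.log(1.0 - eps) if match else math.log(eps)
    delta = {s: math.log(0.5) + le(s == obs[0]) for s in (0, 1)}
    back = []
    for o in obs[1:]:
        new, ptr = {}, {}
        for s in (0, 1):
            cand = {sp: delta[sp] + lt(sp == s) for sp in (0, 1)}
            ptr[s] = max(cand, key=cand.get)
            new[s] = cand[ptr[s]] + le(s == o)
        delta = new
        back.append(ptr)
    s = max(delta, key=delta.get)
    path = [s]
    for ptr in reversed(back):  # follow back-pointers to recover the MAP path
        s = ptr[s]
        path.append(s)
    return path[::-1]

# with noisy emissions, the MAP path smooths over an isolated observation flip
smoothed = viterbi_binary([0, 0, 1, 0, 0], p_flip=0.1, eps=0.3)
```

Labeling a state in the active scheme amounts to clamping one of these variables, which in the Ising picture is a modified external field at that site.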
Methods for causal inference from gene perturbation experiments and validation.
Meinshausen, Nicolai; Hauser, Alain; Mooij, Joris M; Peters, Jonas; Versteeg, Philip; Bühlmann, Peter
2016-07-01
Inferring causal effects from observational and interventional data is a highly desirable but ambitious goal. Many of the computational and statistical methods are plagued by fundamental identifiability issues, instability, and unreliable performance, especially for large-scale systems with many measured variables. We present software and provide some validation of a recently developed methodology based on an invariance principle, called invariant causal prediction (ICP). The ICP method quantifies confidence probabilities for inferring causal structures and thus leads to more reliable and confirmatory statements for causal relations and predictions of external intervention effects. We validate the ICP method and some other procedures using large-scale genome-wide gene perturbation experiments in Saccharomyces cerevisiae. The results suggest that prediction and prioritization of future experimental interventions, such as gene deletions, can be improved by using our statistical inference techniques. PMID:27382150
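The invariance principle behind ICP can be illustrated with a toy two-environment example: accept a predictor set if the residuals of a pooled regression look the same in every environment, then intersect the accepted sets. The z-test on residual means, the acceptance threshold, and the toy structural model are simplifying assumptions; the real ICP method uses more careful invariance tests:

```python
import itertools, math, random

def ols_residuals(X, y, cols):
    """Pooled OLS of y on the chosen columns plus an intercept; returns residuals."""
    n = len(y)
    A = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(A[0])
    G = [[sum(A[i][a] * A[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    rhs = [sum(A[i][a] * y[i] for i in range(n)) for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(G[r][c]))
        G[c], G[piv] = G[piv], G[c]
        rhs[c], rhs[piv] = rhs[piv], rhs[c]
        for r in range(k):
            if r != c:
                f = G[r][c] / G[c][c]
                G[r] = [G[r][j] - f * G[c][j] for j in range(k)]
                rhs[r] -= f * rhs[c]
    beta = [rhs[c] / G[c][c] for c in range(k)]
    return [y[i] - sum(A[i][j] * beta[j] for j in range(k)) for i in range(n)]

def icp(X, y, env, z_max=4.0):
    """Toy ICP: accept every predictor set whose residual means agree across
    the two environments (|z| < z_max); the identified causal parents are the
    intersection of all accepted sets."""
    p = len(X[0])
    accepted = []
    for r in range(p + 1):
        for cols in itertools.combinations(range(p), r):
            res = ols_residuals(X, y, cols)
            g0 = [res[i] for i in range(len(res)) if env[i] == 0]
            g1 = [res[i] for i in range(len(res)) if env[i] == 1]
            m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
            v0 = sum((x - m0) ** 2 for x in g0) / (len(g0) - 1)
            v1 = sum((x - m1) ** 2 for x in g1) / (len(g1) - 1)
            z = (m0 - m1) / math.sqrt(v0 / len(g0) + v1 / len(g1))
            if abs(z) < z_max:
                accepted.append(set(cols))
    return set.intersection(*accepted) if accepted else set()

# toy system: Y = 2*X1 + noise; X2 is a child of Y; environment 1 intervenes on X1
rng = random.Random(0)
X, y, env = [], [], []
for e in (0, 1):
    for _ in range(500):
        x1 = rng.gauss(2.0 * e, 1.0)
        yy = 2.0 * x1 + rng.gauss(0.0, 1.0)
        x2 = yy + rng.gauss(0.0, 1.0)
        X.append([x1, x2]); y.append(yy); env.append(e)
parents = icp(X, y, env)
```

Only sets containing the true parent X1 yield invariant residuals here; the child X2 alone fails the test because the intervention on X1 shifts its relationship to Y.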
Fermions from classical statistics
Wetterich, C.
2010-12-15
We describe fermions in terms of a classical statistical ensemble. The states τ of this ensemble are characterized by a sequence of values one or zero, or a corresponding set of two-level observables. Every classical probability distribution can be associated to a quantum state for fermions. If the time evolution of the classical probabilities p_τ amounts to a rotation of the wave function q_τ(t) = ±√(p_τ(t)), we infer the unitary time evolution of a quantum system of fermions according to a Schrödinger equation. We establish how such classical statistical ensembles can be mapped to Grassmann functional integrals. Quantum field theories for fermions arise for a suitable time evolution of classical probabilities for generalized Ising models.
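The map between classical probabilities p_τ and amplitudes q_τ = ±√(p_τ) can be checked numerically: a rotation of q preserves Σ q² = 1, so the rotated vector squares back to a probability distribution. A two-state illustration (the probabilities and rotation angle are arbitrary):

```python
import math

def rotate_sqrt_probs(p, angle):
    """Send p -> q = sqrt(p), apply a 2d rotation to q, and square back.
    Because rotations preserve the norm, the result sums to 1 again."""
    q = [math.sqrt(x) for x in p]
    c, s = math.cos(angle), math.sin(angle)
    q2 = [c * q[0] - s * q[1], s * q[0] + c * q[1]]
    return [x * x for x in q2]

p_new = rotate_sqrt_probs([0.3, 0.7], 0.4)
```

This norm preservation is the minimal numerical content of the claim that a rotation of the wave function induces a consistent evolution of the classical probabilities.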
Waller, Lance A.
2008-01-01
The three papers included in this special issue represent a set of presentations in an invited session on disease ecology at the 2005 Spring Meeting of the Eastern North American Region of the International Biometric Society. The papers each address statistical estimation and inference for particular components of different disease processes and, taken together, illustrate the breadth of statistical issues arising in the study of the ecology and public health impact of disease. As an introduction, we provide a very brief overview of the area of “disease ecology”, a variety of synonyms addressing different aspects of disease ecology, and present a schematic structure illustrating general components of the underlying disease process, data collection issues, and different disciplinary perspectives ranging from microbiology to public health surveillance. PMID:19081740
From Blickets to Synapses: Inferring Temporal Causal Networks by Observation
ERIC Educational Resources Information Center
Fernando, Chrisantha
2013-01-01
How do human infants learn the causal dependencies between events? Evidence suggests that this remarkable feat can be achieved by observation of only a handful of examples. Many computational models have been produced to explain how infants perform causal inference without explicit teaching about statistics or the scientific method. Here, we…
Direct Evidence for a Dual Process Model of Deductive Inference
ERIC Educational Resources Information Center
Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie
2013-01-01
In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences…
Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization
ERIC Educational Resources Information Center
Gelman, Andrew; Lee, Daniel; Guo, Jiqiang
2015-01-01
Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…
Kogalovskii, M.R.
1995-03-01
This paper presents a review of problems related to statistical database systems, which are widespread in various fields of activity. Statistical databases (SDBs) are databases used for statistical analysis. Topics under consideration are: SDB peculiarities, properties of data models adequate for SDB requirements, metadata functions, null-value problems, SDB compromise protection problems, stored-data compression techniques, and statistical data representation means. Also examined is whether present Database Management Systems (DBMSs) satisfy SDB requirements. Some current research directions in SDB systems are considered.
Smith, Alwyn
1969-01-01
This paper is based on an analysis of questionnaires sent to the health ministries of Member States of WHO asking for information about the extent, nature, and scope of morbidity statistical information. It is clear that most countries collect some statistics of morbidity and many countries collect extensive data. However, few countries relate their collection to the needs of health administrators for information, and many countries collect statistics principally for publication in annual volumes which may appear anything up to 3 years after the year to which they refer. The desiderata of morbidity statistics may be summarized as reliability, representativeness, and relevance to current health problems. PMID:5306722
Ensemble Inference and Inferability of Gene Regulatory Networks
Ud-Dean, S. M. Minhaz; Gunawan, Rudiyanto
2014-01-01
The inference of gene regulatory networks (GRNs) from gene expression data is an unsolved problem of great importance. This inference has been stated, though not proven, to be underdetermined, implying that there could be many equivalent (indistinguishable) solutions. Motivated by this fundamental limitation, we have developed a new framework and algorithm, called TRaCE, for the ensemble inference of GRNs. The ensemble corresponds to the inherent uncertainty associated with discriminating direct and indirect gene regulations from steady-state data of gene knock-out (KO) experiments. We applied TRaCE to analyze the inferability of random GRNs and the GRNs of E. coli and yeast from single- and double-gene KO experiments. The results showed that, with the exception of networks with very few edges, GRNs are typically not inferable even when the data are ideal (unbiased and noise-free). Finally, we compared the performance of TRaCE with top-performing methods of the DREAM4 in silico network inference challenge. PMID:25093509
Inferring the Gravitational Potential of the Milky Way
NASA Astrophysics Data System (ADS)
Chu, Casey; Lithwick, Yoram; Antonini, Fabio
2016-01-01
We present a simple numerical algorithm to infer the gravitational potential of a galaxy from the observed positions and velocities of stars. The method is novel in that it works directly in Cartesian phase space and thus does not require any assumptions about the integrability of the Hamiltonian, as some other methods do. We have tested the algorithm on a two-dimensional logarithmic potential with good results and hope to be able to extend the algorithm to infer the Milky Way's potential from the full Gaia dataset.
ERIC Educational Resources Information Center
Petocz, Peter; Sowey, Eric
2008-01-01
In this article, the authors focus on hypothesis testing--that peculiarly statistical way of deciding things. Statistical methods for testing hypotheses were developed in the 1920s and 1930s by some of the most famous statisticians, in particular Ronald Fisher, Jerzy Neyman and Egon Pearson, who laid the foundations of almost all modern methods of…
Inferences of weekly cycles in summertime rainfall
NASA Astrophysics Data System (ADS)
Tuttle, John D.; Carbone, Richard E.
2011-10-01
In several continental regions a weekly cycle of air pollution aerosols has been observed. It is usually characterized by concentration minima on weekends (Saturday and Sunday) and maxima on weekdays (Tuesday-Friday). Several studies have associated varying aerosol concentrations with precipitation production and attempted to determine whether or not there is a corresponding weekly cycle of precipitation. Results to date have been mixed. Here we examine a 12-year national composited radar data set for evidence of weekly precipitation cycles during the warm season (June-August). Various statistical quantities are calculated and subjected to "bootstrap" testing in order to assess significance. In many parts of the United States, warm season precipitation is relatively infrequent, with a few extreme events contributing to a large percentage of the total 12-year rainfall. For this reason, the statistics are often difficult to interpret. The general area east of the Mississippi River and north of 37°N contains regions where 25%-50% daily rainfall increases are inferred for weekdays (Tuesday-Friday) relative to weekends. The statistics suggest that western Pennsylvania is the largest and most likely contiguous region to have a weekly cycle. Parts of northern Florida and southeastern coastal areas show a reverse-phase cycle, with less rainfall during the week than on weekends. Spot checks of surface rain gauge data confirm the phase of these radar-observed anomalies in both Pennsylvania and Florida. While there are indications of a weekly cycle in other locations of the United States, the degree of confidence is considerably lower. There is a strong statistical inference of weekday rainfall maxima over a net 8% of the area examined, which is approximately twice the area of cities. Future examination of lofted aerosol content, related condensation/ice nuclei spectra, and knowledge of the convective dynamical regime are needed in order to assess how anthropogenic aerosols
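The bootstrap significance testing mentioned above can be sketched as a resampling test of the weekday-weekend mean difference against resamples drawn from the pooled data (the no-cycle null). The toy rainfall numbers are illustrative, not the radar data:

```python
import random, statistics

def bootstrap_pvalue(group_a, group_b, n_boot=5000, rng=None):
    """Bootstrap test: how often does a pooled resample produce a mean
    difference at least as large as the observed one?"""
    rng = rng or random.Random(0)
    obs = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = group_a + group_b
    n1 = len(group_a)
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(pooled) for _ in range(len(pooled))]
        diff = statistics.mean(sample[:n1]) - statistics.mean(sample[n1:])
        if abs(diff) >= abs(obs):
            hits += 1
    return hits / n_boot

# hypothetical 'weekday' vs 'weekend' daily rainfall amounts
p_strong = bootstrap_pvalue([5.0] * 40, [1.0] * 40)  # clearly separated groups
p_null = bootstrap_pvalue([1.0] * 40, [1.0] * 40)    # identical groups
```

Resampling from the pooled data enforces the null hypothesis of no weekly cycle without assuming any distributional form for the highly skewed rainfall amounts.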
The Role of Probability-Based Inference in an Intelligent Tutoring System.
ERIC Educational Resources Information Center
Mislevy, Robert J.; Gitomer, Drew H.
Probability-based inference in complex networks of interdependent variables is an active topic in statistical research, spurred by such diverse applications as forecasting, pedigree analysis, troubleshooting, and medical diagnosis. This paper concerns the role of Bayesian inference networks for updating student models in intelligent tutoring…
Statistical Data Analysis in the Computer Age
NASA Astrophysics Data System (ADS)
Efron, Bradley; Tibshirani, Robert
1991-07-01
Most of our familiar statistical methods, such as hypothesis testing, linear regression, analysis of variance, and maximum likelihood estimation, were designed to be implemented on mechanical calculators. Modern electronic computation has encouraged a host of new statistical methods that require fewer distributional assumptions than their predecessors and can be applied to more complicated statistical estimators. These methods allow the scientist to explore and describe data and draw valid statistical inferences without the usual concerns for mathematical tractability. This is possible because traditional methods of mathematical analysis are replaced by specially constructed computer algorithms. Mathematics has not disappeared from statistical theory. It is the main method for deciding which algorithms are correct and efficient tools for automating statistical inference.
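A canonical example of the computer-intensive methods the article describes is the percentile bootstrap, which replaces distributional assumptions with resampling. A minimal sketch (the data and statistic are illustrative):

```python
import random

def bootstrap_ci(data, stat, n_boot=4000, alpha=0.05, rng=None):
    """Percentile bootstrap confidence interval for an arbitrary statistic:
    resample the data with replacement, recompute the statistic each time,
    and read off the alpha/2 and 1-alpha/2 quantiles."""
    rng = rng or random.Random(0)
    reps = sorted(stat([rng.choice(data) for _ in data]) for _ in range(n_boot))
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

data = list(range(1, 101))  # toy sample with mean 50.5
lo, hi = bootstrap_ci(data, lambda xs: sum(xs) / len(xs))
```

The same function works unchanged for a median, a trimmed mean, or any estimator without a tractable sampling distribution, which is precisely the point of the computer-age approach.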
Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models.
Ekeberg, Magnus; Lövkvist, Cecilia; Lan, Yueheng; Weigt, Martin; Aurell, Erik
2013-01-01
Spatially proximate amino acids in a protein tend to coevolve. A protein's three-dimensional (3D) structure hence leaves an echo of correlations in the evolutionary record. Reverse engineering 3D structures from such correlations is an open problem in structural biology, pursued with increasing vigor as more and more protein sequences continue to fill the data banks. Within this task lies a statistical inference problem, rooted in the following: correlation between two sites in a protein sequence can arise from firsthand interaction but can also be network-propagated via intermediate sites; observed correlation is not enough to guarantee proximity. To separate direct from indirect interactions is an instance of the general problem of inverse statistical mechanics, where the task is to learn model parameters (fields, couplings) from observables (magnetizations, correlations, samples) in large systems. In the context of protein sequences, the approach has been referred to as direct-coupling analysis. Here we show that the pseudolikelihood method, applied to 21-state Potts models describing the statistical properties of families of evolutionarily related proteins, significantly outperforms existing approaches to the direct-coupling analysis, the latter being based on standard mean-field techniques. This improved performance also relies on a modified score for the coupling strength. The results are verified using known crystal structures of specific sequence instances of various protein families. Code implementing the new method can be found at http://plmdca.csc.kth.se/. PMID:23410359
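A minimal version of the pseudolikelihood idea, shown here for a 2-state (Ising) model rather than the 21-state Potts models used for protein families: each site becomes a logistic regression of its spin on all the others, and the two independent estimates of each coupling are averaged. The model size, sample count, and learning rate are illustrative assumptions:

```python
import math, random

def sample_ising(J, h, n, rng):
    """Exact samples from a small Ising model (weights exp(E)) by enumeration."""
    p = len(h)
    states, weights = [], []
    for mask in range(2 ** p):
        s = [1 if (mask >> i) & 1 else -1 for i in range(p)]
        E = sum(h[i] * s[i] for i in range(p)) + \
            sum(J[i][j] * s[i] * s[j] for i in range(p) for j in range(i + 1, p))
        states.append(s)
        weights.append(math.exp(E))
    return [rng.choices(states, weights=weights)[0] for _ in range(n)]

def pseudolikelihood_fit(data, lr=0.05, iters=200):
    """Maximize the pseudolikelihood by gradient ascent: the gradient of
    log P(s_i | s_-i) with respect to the local field is s_i - tanh(field)."""
    p, n = len(data[0]), len(data)
    J = [[0.0] * p for _ in range(p)]
    h = [0.0] * p
    for _ in range(iters):
        gJ = [[0.0] * p for _ in range(p)]
        gh = [0.0] * p
        for s in data:
            for i in range(p):
                field = h[i] + sum(J[i][j] * s[j] for j in range(p) if j != i)
                d = s[i] - math.tanh(field)
                gh[i] += d
                for j in range(p):
                    if j != i:
                        gJ[i][j] += d * s[j]
        h = [h[i] + lr * gh[i] / n for i in range(p)]
        J = [[J[i][j] + lr * gJ[i][j] / n for j in range(p)] for i in range(p)]
    # average the two independent estimates of each symmetric coupling
    return [[(J[i][j] + J[j][i]) / 2.0 for j in range(p)] for i in range(p)], h

rng = random.Random(0)
J_true = [[0.0, 0.8, 0.0], [0.8, 0.0, 0.0], [0.0, 0.0, 0.0]]
data = sample_ising(J_true, [0.0, 0.0, 0.0], 1000, rng)
J_hat, h_hat = pseudolikelihood_fit(data)
```

Conditioning each site on the observed values of the others is what lets pseudolikelihood separate direct couplings from network-propagated correlation without computing the intractable partition function.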
Active inference, communication and hermeneutics.
Friston, Karl J; Frith, Christopher D
2015-07-01
Hermeneutics refers to interpretation and translation of text (typically ancient scriptures) but also applies to verbal and non-verbal communication. In a psychological setting it nicely frames the problem of inferring the intended content of a communication. In this paper, we offer a solution to the problem of neural hermeneutics based upon active inference. In active inference, action fulfils predictions about how we will behave (e.g., predicting we will speak). Crucially, these predictions can be used to predict both self and others--during speaking and listening respectively. Active inference mandates the suppression of prediction errors by updating an internal model that generates predictions--both at fast timescales (through perceptual inference) and slower timescales (through perceptual learning). If two agents adopt the same model, then--in principle--they can predict each other and minimise their mutual prediction errors. Heuristically, this ensures they are singing from the same hymn sheet. This paper builds upon recent work on active inference and communication to illustrate perceptual learning using simulated birdsongs. Our focus here is the neural hermeneutics implicit in learning, where communication facilitates long-term changes in generative models that are trying to predict each other. In other words, communication induces perceptual learning and enables others to (literally) change our minds and vice versa. PMID:25957007
Causal inference and developmental psychology.
Foster, E Michael
2010-11-01
Causal inference is of central importance to developmental psychology. Many key questions in the field revolve around improving the lives of children and their families. These include identifying risk factors that if manipulated in some way would foster child development. Such a task inherently involves causal inference: One wants to know whether the risk factor actually causes outcomes. Random assignment is not possible in many instances, and for that reason, psychologists must rely on observational studies. Such studies identify associations, and causal interpretation of such associations requires additional assumptions. Research in developmental psychology generally has relied on various forms of linear regression, but this methodology has limitations for causal inference. Fortunately, methodological developments in various fields are providing new tools for causal inference-tools that rely on more plausible assumptions. This article describes the limitations of regression for causal inference and describes how new tools might offer better causal inference. This discussion highlights the importance of properly identifying covariates to include (and exclude) from the analysis. This discussion considers the directed acyclic graph for use in accomplishing this task. With the proper covariates having been chosen, many of the available methods rely on the assumption of "ignorability." The article discusses the meaning of ignorability and considers alternatives to this assumption, such as instrumental variables estimation. Finally, the article considers the use of the tools discussed in the context of a specific research question, the effect of family structure on child development. PMID:20677855
Managing Your Private and Public Data: Bringing Down Inference Attacks Against Your Privacy
NASA Astrophysics Data System (ADS)
Salamatian, Salman; Zhang, Amy; du Pin Calmon, Flavio; Bhamidipati, Sandilya; Fawaz, Nadia; Kveton, Branislav; Oliveira, Pedro; Taft, Nina
2015-10-01
We propose a practical methodology to protect a user's private data, when he wishes to publicly release data that is correlated with his private data, in the hope of getting some utility. Our approach relies on a general statistical inference framework that captures the privacy threat under inference attacks, given utility constraints. Under this framework, data is distorted before it is released, according to a privacy-preserving probabilistic mapping. This mapping is obtained by solving a convex optimization problem, which minimizes information leakage under a distortion constraint. We address practical challenges encountered when applying this theoretical framework to real world data. On one hand, the design of optimal privacy-preserving mechanisms requires knowledge of the prior distribution linking private data and data to be released, which is often unavailable in practice. On the other hand, the optimization may become intractable and face scalability issues when data assumes values in large alphabets, or is high dimensional. Our work makes three major contributions. First, we provide bounds on the impact on the privacy-utility tradeoff of a mismatched prior. Second, we show how to reduce the optimization size by introducing a quantization step, and how to generate privacy mappings under quantization. Third, we evaluate our method on three datasets, including a new dataset that we collected, showing correlations between political convictions and TV viewing habits. We demonstrate that good privacy properties can be achieved with limited distortion so as not to undermine the original purpose of the publicly released data, e.g. recommendations.
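The notion of information leakage under a privacy-preserving probabilistic mapping can be illustrated with a toy example. The sketch below is not the paper's convex optimization: the joint prior and the bit-flip mapping are invented, and leakage is measured as the mutual information between the private attribute S and the released data Y.

```python
import math

# Invented joint prior p(s, x) linking a private attribute S and
# correlated public data X.
p_sx = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def mutual_information(p_sy):
    ps, py = {}, {}
    for (s, y), p in p_sy.items():
        ps[s] = ps.get(s, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (ps[s] * py[y]))
               for (s, y), p in p_sy.items() if p > 0)

def leakage(flip):
    # Privacy mapping: release Y = X, flipped with probability `flip`.
    p_sy = {}
    for (s, x), p in p_sx.items():
        for y in (0, 1):
            q = flip if y != x else 1 - flip
            p_sy[(s, y)] = p_sy.get((s, y), 0.0) + p * q
    return mutual_information(p_sy)

# More distortion -> less leakage about the private attribute.
print(leakage(0.0), leakage(0.2), leakage(0.5))
```

At `flip = 0.5` the released data is independent of X, so leakage drops to zero; the paper's optimization searches for mappings that achieve low leakage at much smaller distortion.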
NASA Astrophysics Data System (ADS)
Raiber, Matthias; White, Paul A.; Daughney, Christopher J.; Tschritter, Constanze; Davidson, Peter; Bainbridge, Sophie E.
2012-05-01
Concerns regarding groundwater contamination with nitrate and the long-term sustainability of groundwater resources have prompted the development of a multi-layered three-dimensional (3D) geological model to characterise the aquifer geometry of the Wairau Plain, Marlborough District, New Zealand. The 3D geological model which consists of eight litho-stratigraphic units has been subsequently used to synthesise hydrogeological and hydrogeochemical data for different aquifers in an approach that aims to demonstrate how integration of water chemistry data within the physical framework of a 3D geological model can help to better understand and conceptualise groundwater systems in complex geological settings. Multivariate statistical techniques (e.g. Principal Component Analysis and Hierarchical Cluster Analysis) were applied to groundwater chemistry data to identify hydrochemical facies which are characteristic of distinct evolutionary pathways and a common hydrologic history of groundwaters. Principal Component Analysis on hydrochemical data demonstrated that natural water-rock interactions, redox potential and human agricultural impact are the key controls of groundwater quality in the Wairau Plain. Hierarchical Cluster Analysis revealed distinct hydrochemical water quality groups in the Wairau Plain groundwater system. Visualisation of the results of the multivariate statistical analyses and distribution of groundwater nitrate concentrations in the context of aquifer lithology highlighted the link between groundwater chemistry and the lithology of host aquifers. The methodology followed in this study can be applied in a variety of hydrogeological settings to synthesise geological, hydrogeological and hydrochemical data and present them in a format readily understood by a wide range of stakeholders. This enables a more efficient communication of the results of scientific studies to the wider community.
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
Optimal inference with suboptimal models: Addiction and active Bayesian inference
Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl
2015-01-01
When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321
The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute works to provide information on cancer statistics in an effort to reduce the burden of cancer among the U.S. population.
Inference for interacting linear waves in ordered and random media
NASA Astrophysics Data System (ADS)
Tyagi, P.; Pagnani, A.; Antenucci, F.; Ibánez Berganza, M.; Leuzzi, L.
2015-05-01
A statistical inference method is developed and tested for pairwise interacting systems whose degrees of freedom are continuous angular variables, such as planar spins in magnetic systems or wave phases in optics and acoustics. We investigate systems with both deterministic and quenched disordered couplings on two extreme topologies: complete and sparse graphs. To match further applications in optics, complex couplings and external fields are also considered, and general inference formulas are derived for the real and imaginary parts of Hermitian coupling matrices from the real and imaginary parts of complex correlation functions. The whole procedure is then tested on correlation functions and local magnetizations generated numerically by means of Monte Carlo simulations.
NASA Astrophysics Data System (ADS)
Hermann, Claudine
Statistical physics bridges the properties of a macroscopic system and the microscopic behavior of its constituent particles, a connection otherwise impossible to establish because of the enormous magnitude of Avogadro's number. Numerous systems of today's key technologies, such as semiconductors or lasers, are macroscopic quantum objects; only statistical physics allows for understanding their fundamentals. Therefore, this graduate text also focuses on particular applications, such as the properties of electrons in solids, as well as radiation thermodynamics and the greenhouse effect.
Receptive Field Inference with Localized Priors
Park, Mijung; Pillow, Jonathan W.
2011-01-01
The linear receptive field describes a mapping from sensory stimuli to a one-dimensional variable governing a neuron's spike response. However, traditional receptive field estimators such as the spike-triggered average converge slowly and often require large amounts of data. Bayesian methods seek to overcome this problem by biasing estimates towards solutions that are more likely a priori, typically those with small, smooth, or sparse coefficients. Here we introduce a novel Bayesian receptive field estimator designed to incorporate locality, a powerful form of prior information about receptive field structure. The key to our approach is a hierarchical receptive field model that flexibly adapts to localized structure in both spacetime and spatiotemporal frequency, using an inference method known as empirical Bayes. We refer to our method as automatic locality determination (ALD), and show that it can accurately recover various types of smooth, sparse, and localized receptive fields. We apply ALD to neural data from retinal ganglion cells and V1 simple cells, and find it achieves error rates several times lower than standard estimators. Thus, estimates of comparable accuracy can be achieved with substantially less data. Finally, we introduce a computationally efficient Markov Chain Monte Carlo (MCMC) algorithm for fully Bayesian inference under the ALD prior, yielding accurate Bayesian confidence intervals for small or noisy datasets. PMID:22046110
NIFTY: A versatile Python library for signal inference
NASA Astrophysics Data System (ADS)
Selig, Marco; Bell, Michael R.; Junklewitz, Henrik; Oppermann, Niels; Reinecke, Martin; Greiner, Maksim; Pachajoa, Carlos; Enßlin, Torsten A.
2013-02-01
NIFTY (Numerical Information Field TheorY) is a versatile library that enables the development of signal inference algorithms that operate regardless of the underlying spatial grid and its resolution. Its object-oriented framework is written in Python, although it accesses libraries written in Cython, C++, and C for efficiency. NIFTY offers a toolkit that abstracts discretized representations of continuous spaces, fields in these spaces, and operators acting on fields into classes. Thereby, the correct normalization of operations on fields is taken care of automatically. This allows for an abstract formulation and programming of inference algorithms, including those derived within information field theory. Thus, NIFTY permits rapid prototyping of algorithms in 1D and then the application of the developed code in higher-dimensional settings of real world problems. NIFTY operates on point sets, n-dimensional regular grids, spherical spaces, their harmonic counterparts, and product spaces constructed as combinations of those.
Causal inference from observational data.
Listl, Stefan; Jürges, Hendrik; Watt, Richard G
2016-10-01
Randomized controlled trials have long been considered the 'gold standard' for causal inference in clinical research. In the absence of randomized experiments, identification of reliable intervention points to improve oral health is often perceived as a challenge. But other fields of science, such as social science, have always been challenged by ethical constraints to conducting randomized controlled trials. Methods have been established to make causal inference using observational data, and these methods are becoming increasingly relevant in clinical medicine, health policy and public health research. This study provides an overview of state-of-the-art methods specifically designed for causal inference in observational data, including difference-in-differences (DiD) analyses, instrumental variables (IV), regression discontinuity designs (RDD) and fixed-effects panel data analysis. The described methods may be particularly useful in dental research, not least because of the increasing availability of routinely collected administrative data and electronic health records ('big data'). PMID:27111146
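Of the methods listed, difference-in-differences is the simplest to state. A toy sketch with invented group means (not data from any study):

```python
# Invented mean outcomes for treated/control groups before and after
# an intervention.
means = {("treated", "pre"): 10.0, ("treated", "post"): 16.0,
         ("control", "pre"): 9.0, ("control", "post"): 12.0}

# DiD: (treated post - treated pre) - (control post - control pre).
# The control group's trend stands in for the counterfactual trend
# the treated group would have followed without the intervention.
did = ((means[("treated", "post")] - means[("treated", "pre")])
       - (means[("control", "post")] - means[("control", "pre")]))
print(did)  # 3.0
```

The identifying assumption is parallel trends: absent treatment, both groups would have changed by the same amount, so the extra 3.0 units are attributed to the intervention.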
Multi-Agent Inference in Social Networks: A Finite Population Learning Approach
Tong, Xin; Zeng, Yao
2016-01-01
When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people’s incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make “good” inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows. PMID:27076691
NASA Astrophysics Data System (ADS)
Goodman, J. W.
This book is based on the thesis that some training in the area of statistical optics should be included as a standard part of any advanced optics curriculum. Random variables are discussed, taking into account definitions of probability and random variables, distribution functions and density functions, an extension to two or more random variables, statistical averages, transformations of random variables, sums of real random variables, Gaussian random variables, complex-valued random variables, and random phasor sums. Other subjects examined are related to random processes, some first-order properties of light waves, the coherence of optical waves, some problems involving high-order coherence, effects of partial coherence on imaging systems, imaging in the presence of randomly inhomogeneous media, and fundamental limits in photoelectric detection of light. Attention is given to deterministic versus statistical phenomena and models, the Fourier transform, and the fourth-order moment of the spectrum of a detected speckle image.
Eight challenges in phylodynamic inference
Frost, Simon D.W.; Pybus, Oliver G.; Gog, Julia R.; Viboud, Cecile; Bonhoeffer, Sebastian; Bedford, Trevor
2015-01-01
The field of phylodynamics, which attempts to enhance our understanding of infectious disease dynamics using pathogen phylogenies, has made great strides in the past decade. Basic epidemiological and evolutionary models are now well characterized with inferential frameworks in place. However, significant challenges remain in extending phylodynamic inference to more complex systems. These challenges include accounting for evolutionary complexities such as changing mutation rates, selection, reassortment, and recombination, as well as epidemiological complexities such as stochastic population dynamics, host population structure, and different patterns at the within-host and between-host scales. An additional challenge exists in making efficient inferences from an ever increasing corpus of sequence data. PMID:25843391
Inferring biotic interactions from proxies.
Morales-Castilla, Ignacio; Matias, Miguel G; Gravel, Dominique; Araújo, Miguel B
2015-06-01
Inferring biotic interactions from functional, phylogenetic and geographical proxies remains one great challenge in ecology. We propose a conceptual framework to infer the backbone of biotic interaction networks within regional species pools. First, interacting groups are identified to order links and remove forbidden interactions between species. Second, additional links are removed by examination of the geographical context in which species co-occur. Third, hypotheses are proposed to establish interaction probabilities between species. We illustrate the framework using published food-webs in terrestrial and marine systems. We conclude that preliminary descriptions of the web of life can be made by careful integration of data with theory. PMID:25922148
Linking numbers, spin, and statistics of solitons
NASA Technical Reports Server (NTRS)
Wilczek, F.; Zee, A.
1983-01-01
The spin and statistics of solitons in the (2 + 1)- and (3 + 1)-dimensional nonlinear sigma models is considered. For the (2 + 1)-dimensional case, there is the possibility of fractional spin and exotic statistics; for 3 + 1 dimensions, the usual spin-statistics relation is demonstrated. The linking-number interpretation of the Hopf invariant and the use of suspension considerably simplify the analysis.
Bayesian multimodel inference for dose-response studies
Link, W.A.; Albers, P.H.
2007-01-01
Statistical inference in dose-response studies is model-based: The analyst posits a mathematical model of the relation between exposure and response, estimates parameters of the model, and reports conclusions conditional on the model. Such analyses rarely include any accounting for the uncertainties associated with model selection. The Bayesian inferential system provides a convenient framework for model selection and multimodel inference. In this paper we briefly describe the Bayesian paradigm and Bayesian multimodel inference. We then present a family of models for multinomial dose-response data and apply Bayesian multimodel inferential methods to the analysis of data on the reproductive success of American kestrels (Falco sparverius) exposed to various sublethal dietary concentrations of methylmercury.
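The Bayesian multimodel step described above amounts to weighting each model by its posterior probability. A toy sketch, with invented marginal likelihoods and per-model predictions standing in for a real dose-response analysis:

```python
# Invented marginal likelihoods p(data | model) for two hypothetical
# dose-response models, and equal model priors.
marginal_lik = {"logistic": 0.012, "probit": 0.004}
prior = {"logistic": 0.5, "probit": 0.5}

# Posterior model probabilities via Bayes' rule.
evidence = sum(marginal_lik[m] * prior[m] for m in prior)
post = {m: marginal_lik[m] * prior[m] / evidence for m in prior}

# Model-averaged prediction of response probability at some dose:
# a weighted combination of the per-model estimates (also invented).
pred = {"logistic": 0.30, "probit": 0.26}
averaged = sum(post[m] * pred[m] for m in post)
print(post, averaged)
```

The averaged estimate propagates model-selection uncertainty into the final inference instead of conditioning on a single chosen model.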
Design-based and model-based inference in surveys of freshwater mollusks
Dorazio, R.M.
1999-01-01
Well-known concepts in statistical inference and sampling theory are used to develop recommendations for planning and analyzing the results of quantitative surveys of freshwater mollusks. Two methods of inference commonly used in survey sampling (design-based and model-based) are described and illustrated using examples relevant in surveys of freshwater mollusks. The particular objectives of a survey and the type of information observed in each unit of sampling can be used to help select the sampling design and the method of inference. For example, the mean density of a sparsely distributed population of mollusks can be estimated with higher precision by using model-based inference or by using design-based inference with adaptive cluster sampling than by using design-based inference with conventional sampling. More experience with quantitative surveys of natural assemblages of freshwater mollusks is needed to determine the actual benefits of different sampling designs and inferential procedures.
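The design-based notion of unbiasedness invoked above can be checked empirically on a toy population. The mollusk densities per quadrat below are invented; the sketch averages the simple-random-sampling estimator over many replicate surveys:

```python
import random

# Invented counts of mollusks in 20 quadrats of a sparsely
# distributed population.
population = [0, 0, 1, 0, 2, 0, 0, 5, 0, 1, 0, 0, 3, 0, 0, 0, 4, 0, 0, 0]
true_mean = sum(population) / len(population)

# Design-based estimator: sample mean under simple random sampling
# without replacement, averaged over many replicate surveys.
random.seed(3)
estimates = []
for _ in range(5000):
    sample = random.sample(population, 5)
    estimates.append(sum(sample) / len(sample))
avg_est = sum(estimates) / len(estimates)
print(true_mean, avg_est)  # design-unbiased: the two agree on average
```

Any single survey can be far off for a sparse population, which is why the abstract points to model-based inference or adaptive cluster sampling for higher precision.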
ERIC Educational Resources Information Center
Chicot, Katie; Holmes, Hilary
2012-01-01
The use, and misuse, of statistics is commonplace, yet in the printed format data representations can be either over simplified, supposedly for impact, or so complex as to lead to boredom, supposedly for completeness and accuracy. In this article the link to the video clip shows how dynamic visual representations can enliven and enhance the…
ERIC Educational Resources Information Center
Catley, Alan
2007-01-01
Following the announcement last year that there will be no more math coursework assessment at General Certificate of Secondary Education (GCSE), teachers will in the future be able to devote more time to preparing learners for formal examinations. One of the key things that the author has learned when teaching statistics is that it makes for far…
Using a Five-Step Procedure for Inferential Statistical Analyses
ERIC Educational Resources Information Center
Kamin, Lawrence F.
2010-01-01
Many statistics texts pose inferential statistical problems in a disjointed way. By using a simple five-step procedure as a template for statistical inference problems, the student can solve problems in an organized fashion. The problem and its solution will thus be a stand-by-itself organic whole and a single unit of thought and effort. The…
Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates
Eklund, Anders; Nichols, Thomas E.; Knutsson, Hans
2016-01-01
The most widely used task functional magnetic resonance imaging (fMRI) analyses use parametric statistical methods that depend on a variety of assumptions. In this work, we use real resting-state data and a total of 3 million random task group analyses to compute empirical familywise error rates for the fMRI software packages SPM, FSL, and AFNI, as well as a nonparametric permutation method. For a nominal familywise error rate of 5%, the parametric statistical methods are shown to be conservative for voxelwise inference and invalid for clusterwise inference. Our results suggest that the principal cause of the invalid cluster inferences is spatial autocorrelation functions that do not follow the assumed Gaussian shape. By comparison, the nonparametric permutation test is found to produce nominal results for voxelwise as well as clusterwise inference. These findings speak to the need of validating the statistical methods being used in the field of neuroimaging. PMID:27357684
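The nonparametric permutation test that the study above found to produce nominal error rates is easy to sketch. The two groups and their values below are invented; the test compares the observed mean difference against the distribution obtained by shuffling group labels:

```python
import random

# Invented measurements for two groups of five subjects each.
a = [2.1, 2.5, 2.3, 2.8, 2.6]
b = [1.4, 1.7, 1.6, 1.9, 1.5]
observed = sum(a) / len(a) - sum(b) / len(b)

# Permutation null: group labels are exchangeable, so reshuffle the
# pooled data and recompute the mean difference many times.
random.seed(1)
pool = a + b
n_perm = 10000
count = 0
for _ in range(n_perm):
    random.shuffle(pool)
    diff = sum(pool[:5]) / 5 - sum(pool[5:]) / 5
    if abs(diff) >= abs(observed):
        count += 1
p_value = (count + 1) / (n_perm + 1)  # add-one correction
print(p_value)
```

No Gaussian assumption about the data (or, in the fMRI case, about the spatial autocorrelation function) is needed; the null distribution is built from the data itself.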
Graffelman, Jan; Nelson, S.; Gogarten, S. M.; Weir, B. S.
2015-01-01
This paper addresses the issue of exact-test based statistical inference for Hardy-Weinberg equilibrium in the presence of missing genotype data. Missing genotypes often are discarded when markers are tested for Hardy-Weinberg equilibrium, which can lead to bias in the statistical inference about equilibrium. Single and multiple imputation can improve inference on equilibrium. We develop tests for equilibrium in the presence of missingness by using both inbreeding coefficients (or, equivalently, χ² statistics) and exact p-values. The analysis of a set of markers with a high missing rate from the GENEVA project on prematurity shows that exact inference on equilibrium can be altered considerably when missingness is taken into account. For markers with a high missing rate (>5%), we found that both single and multiple imputation tend to diminish evidence for Hardy-Weinberg disequilibrium. Depending on the imputation method used, 6-13% of the test results changed qualitatively at the 5% level. PMID:26377959
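The χ² statistic for Hardy-Weinberg equilibrium mentioned above can be sketched with invented genotype counts. This is the complete-data statistic only, not the exact test or the imputation procedure the paper develops:

```python
# Invented genotype counts at a biallelic marker.
n_AA, n_Aa, n_aa = 50, 40, 10
n = n_AA + n_Aa + n_aa

# Allele frequency of A, then Hardy-Weinberg expected genotype counts.
p = (2 * n_AA + n_Aa) / (2 * n)
expected = {"AA": n * p * p,
            "Aa": 2 * n * p * (1 - p),
            "aa": n * (1 - p) ** 2}
observed = {"AA": n_AA, "Aa": n_Aa, "aa": n_aa}

# Chi-square goodness-of-fit statistic (1 degree of freedom).
chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g]
           for g in observed)
# Equivalently, chi2 = n * f**2 where f is the inbreeding coefficient
# f = 1 - observed_heterozygosity / expected_heterozygosity.
print(chi2)
```

For small counts the χ² approximation breaks down, which is why the paper works with exact p-values instead.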
Sparse and Compositionally Robust Inference of Microbial Ecological Networks
Kurtz, Zachary D.; Müller, Christian L.; Miraldi, Emily R.; Littman, Dan R.; Blaser, Martin J.; Bonneau, Richard A.
2015-01-01
16S ribosomal RNA (rRNA) gene and other environmental sequencing techniques provide snapshots of microbial communities, revealing phylogeny and the abundances of microbial populations across diverse ecosystems. While changes in microbial community structure are demonstrably associated with certain environmental conditions (from metabolic and immunological health in mammals to ecological stability in soils and oceans), identification of underlying mechanisms requires new statistical tools, as these datasets present several technical challenges. First, the abundances of microbial operational taxonomic units (OTUs) from amplicon-based datasets are compositional. Counts are normalized to the total number of counts in the sample. Thus, microbial abundances are not independent, and traditional statistical metrics (e.g., correlation) for the detection of OTU-OTU relationships can lead to spurious results. Secondly, microbial sequencing-based studies typically measure hundreds of OTUs on only tens to hundreds of samples; thus, inference of OTU-OTU association networks is severely under-powered, and additional information (or assumptions) are required for accurate inference. Here, we present SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that addresses both of these issues. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological association network is sparse. To reconstruct the network, SPIEC-EASI relies on algorithms for sparse neighborhood and inverse covariance selection. To provide a synthetic benchmark in the absence of an experimentally validated gold-standard network, SPIEC-EASI is accompanied by a set of computational tools to generate OTU count data from a set of diverse underlying network topologies. SPIEC
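The compositional-data transformation underlying the first step of methods like SPIEC-EASI can be sketched independently of the package. The counts and pseudocount below are invented; the sketch applies a centered log-ratio (CLR) transform, whose output components sum to zero by construction:

```python
import math

# Invented OTU counts from one sample; a pseudocount avoids log(0).
counts = [120, 30, 0, 50]
pseudo = [c + 1 for c in counts]
total = sum(pseudo)
rel = [c / total for c in pseudo]   # compositional: sums to 1

# Centered log-ratio: log of each component over the geometric mean.
gm = math.exp(sum(math.log(x) for x in rel) / len(rel))
clr = [math.log(x / gm) for x in rel]
print(clr)  # components sum to ~0
```

Working in CLR coordinates removes the unit-sum constraint that makes naive correlations between relative abundances spurious, after which a sparse inverse-covariance estimator can be applied.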
Recent statistical methods for orientation data
NASA Technical Reports Server (NTRS)
Batschelet, E.
1972-01-01
The application of statistical methods to the study of animal orientation and navigation is discussed. The method employed is limited to the two-dimensional case. Various tests for determining the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations, and tables of data are developed to show the value of information obtained by statistical analysis.
Science Shorts: Observation versus Inference
ERIC Educational Resources Information Center
Leager, Craig R.
2008-01-01
When you observe something, how do you know for sure what you are seeing, feeling, smelling, or hearing? Asking students to think critically about their encounters with the natural world will help to strengthen their understanding and application of the science-process skills of observation and inference. In the following lesson, students make…
Sample Size and Correlational Inference
ERIC Educational Resources Information Center
Anderson, Richard B.; Doherty, Michael E.; Friedrich, Jeff C.
2008-01-01
In 4 studies, the authors examined the hypothesis that the structure of the informational environment makes small samples more informative than large ones for drawing inferences about population correlations. The specific purpose of the studies was to test predictions arising from the signal detection simulations of R. B. Anderson, M. E. Doherty,…
Word Learning as Bayesian Inference
ERIC Educational Resources Information Center
Xu, Fei; Tenenbaum, Joshua B.
2007-01-01
The authors present a Bayesian framework for understanding how adults and children learn the meanings of words. The theory explains how learners can generalize meaningfully from just one or a few positive examples of a novel word's referents, by making rational inductive inferences that integrate prior knowledge about plausible word meanings with…
The mechanisms of temporal inference
NASA Technical Reports Server (NTRS)
Fox, B. R.; Green, S. R.
1987-01-01
The properties of a temporal language are determined by its constituent elements: the temporal objects which it can represent, the attributes of those objects, the relationships between them, the axioms which define the default relationships, and the rules which define the statements that can be formulated. The methods of inference which can be applied to a temporal language are derived in part from a small number of axioms which define the meaning of equality and order and how those relationships can be propagated. More complex inferences involve detailed analysis of the stated relationships. Perhaps the most challenging area of temporal inference is reasoning over disjunctive temporal constraints. Simple forms of disjunction do not sufficiently increase the expressive power of a language while unrestricted use of disjunction makes the analysis NP-hard. In many cases a set of disjunctive constraints can be converted to disjunctive normal form and familiar methods of inference can be applied to the conjunctive sub-expressions. This process itself is NP-hard but it is made more tractable by careful expansion of a tree-structured search space.
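The propagation of order relationships mentioned above can be illustrated with a toy transitive-closure pass over "before" facts (my own minimal sketch, not the paper's inference system):

```python
def transitive_closure(before):
    """Propagate a set of (a, b) 'a before b' facts to all implied facts.

    Repeatedly applies transitivity (a before b, b before c => a before c)
    until no new facts are derived.
    """
    closure = set(before)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

facts = {("wake", "eat"), ("eat", "work")}
all_facts = transitive_closure(facts)
assert ("wake", "work") in all_facts   # derived by transitivity
```

Handling disjunctive constraints, as the abstract notes, is far harder: one would have to branch over the disjuncts (e.g., after conversion to disjunctive normal form) and run a closure like this on each conjunctive branch, which is where the NP-hardness enters.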
Perceptual Inference and Autistic Traits
ERIC Educational Resources Information Center
Skewes, Joshua C; Jegindø, Else-Marie; Gebauer, Line
2015-01-01
Autistic people are better at perceiving details. Major theories explain this in terms of bottom-up sensory mechanisms or in terms of top-down cognitive biases. Recently, it has become possible to link these theories within a common framework. This framework assumes that perception is implicit neural inference, combining sensory evidence with…
Improving Explanatory Inferences from Assessments
ERIC Educational Resources Information Center
Diakow, Ronli Phyllis
2013-01-01
This dissertation comprises three papers that propose, discuss, and illustrate models to make improved inferences about research questions regarding student achievement in education. Addressing the types of questions common in educational research today requires three different "extensions" to traditional educational assessment: (1)…
Measuring statistical evidence using relative belief.
Evans, Michael
2016-01-01
A fundamental concern of a theory of statistical inference is how one should measure statistical evidence. Certainly the words "statistical evidence," or perhaps just "evidence," are much used in statistical contexts. It is fair to say, however, that the precise characterization of this concept is somewhat elusive. Our goal here is to provide a definition of how to measure statistical evidence for any particular statistical problem. Since evidence is what causes beliefs to change, it is proposed to measure evidence by the amount beliefs change from a priori to a posteriori. As such, our definition involves prior beliefs and this raises issues of subjectivity versus objectivity in statistical analyses. This is dealt with through a principle requiring the falsifiability of any ingredients to a statistical analysis. These concerns lead to checking for prior-data conflict and measuring the a priori bias in a prior. PMID:26925207
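The proposal to measure evidence by the change from prior to posterior can be made concrete with a relative belief ratio RB(θ) = posterior(θ)/prior(θ); RB(θ) > 1 indicates evidence in favor of θ. A toy discrete example (the coin-bias setup is mine, not from the paper):

```python
def posterior(prior, likelihood):
    """Bayes update over a discrete parameter grid."""
    unnorm = {t: prior[t] * likelihood[t] for t in prior}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}

def relative_belief(prior, likelihood):
    """RB(theta) = posterior(theta) / prior(theta)."""
    post = posterior(prior, likelihood)
    return {t: post[t] / prior[t] for t in prior}

# Three candidate coin biases, uniform prior, data: 7 heads in 10 flips.
thetas = [0.3, 0.5, 0.7]
prior = {t: 1 / 3 for t in thetas}
lik = {t: t ** 7 * (1 - t) ** 3 for t in thetas}
rb = relative_belief(prior, lik)
assert rb[0.7] > 1 > rb[0.3]   # beliefs moved toward 0.7, away from 0.3
```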
NASA Technical Reports Server (NTRS)
Lee, Mun Wai
2015-01-01
Crew exercise is important during long-duration space flight not only for maintaining health and fitness but also for preventing adverse health problems, such as losses in muscle strength and bone density. Monitoring crew exercise via motion capture and kinematic analysis aids understanding of the effects of microgravity on exercise and helps ensure that exercise prescriptions are effective. Intelligent Automation, Inc., has developed ESPRIT to monitor exercise activities, detect body markers, extract image features, and recover three-dimensional (3D) kinematic body poses. The system relies on prior knowledge and modeling of the human body and on advanced statistical inference techniques to achieve robust and accurate motion capture. In Phase I, the company demonstrated motion capture of several exercises, including walking, curling, and dead lifting. Phase II efforts focused on enhancing algorithms and delivering an ESPRIT prototype for testing and demonstration.
Statistical Mechanics of Transcription-Factor Binding Site Discovery Using Hidden Markov Models
Mehta, Pankaj; Schwab, David J.; Sengupta, Anirvan M.
2011-01-01
Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the “inverse” statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it. PMID:22851788
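The likelihood computations that underlie HMM learning in this setting rest on the forward algorithm; a minimal two-state sketch (the background/binding-site parameters below are toy values of my own, not fitted TF energies):

```python
import math

def forward_loglik(obs, trans, emit, init):
    """Sequence log-likelihood under a 2-state HMM via the forward algorithm.

    States: 0 = background, 1 = inside a binding site. Per-step rescaling
    of the forward variables avoids underflow on long sequences.
    """
    alpha = [init[s] * emit[s][obs[0]] for s in (0, 1)]
    total = sum(alpha)
    loglik = math.log(total)
    alpha = [a / total for a in alpha]
    for sym in obs[1:]:
        alpha = [emit[s][sym] * sum(alpha[r] * trans[r][s] for r in (0, 1))
                 for s in (0, 1)]
        total = sum(alpha)
        loglik += math.log(total)
        alpha = [a / total for a in alpha]
    return loglik

# Toy parameters: background favors A/T, binding sites favor G/C.
init = [0.95, 0.05]
trans = [[0.95, 0.05], [0.10, 0.90]]
emit = [{"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
        {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}]
ll = forward_loglik("ATGCGCGA", trans, emit, init)
assert ll < 0.0   # a log-probability
```

Derivatives of this log-likelihood with respect to the emission and transition parameters are what enter the Fisher information analyzed in the paper.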
A Statistical Approach to Identifying Compact Objects in X-ray Binaries
NASA Astrophysics Data System (ADS)
Vrtilek, Saeqa D.
2013-04-01
A standard approach towards statistical inference in astronomy has been the application of Principal Components Analysis (PCA) to reduce dimensionality. However, for non-linear distributions this is not always an effective approach. A non-linear technique called "diffusion maps" (Freeman et al. 2009; Richards et al. 2009; Lee & Waterman 2010), a robust eigenmode-based framework, allows retention of the full "connectivity" of the data points. Through this approach we define the highly non-linear geometry of X-ray binaries in a color-color-intensity diagram in an efficient and statistically sound manner, providing a broadly applicable means of distinguishing between black holes and neutron stars in Galactic X-ray binaries.
Inferring processes underlying B-cell repertoire diversity.
Elhanati, Yuval; Sethna, Zachary; Marcou, Quentin; Callan, Curtis G; Mora, Thierry; Walczak, Aleksandra M
2015-09-01
We quantify the VDJ recombination and somatic hypermutation processes in human B cells using probabilistic inference methods on high-throughput DNA sequence repertoires of human B-cell receptor heavy chains. Our analysis captures the statistical properties of the naive repertoire, first after its initial generation via VDJ recombination and then after selection for functionality. We also infer statistical properties of the somatic hypermutation machinery (exclusive of subsequent effects of selection). Our main results are the following: the B-cell repertoire is substantially more diverse than T-cell repertoires, owing to longer junctional insertions; sequences that pass initial selection are distinguished by having a higher probability of being generated in a VDJ recombination event; somatic hypermutations have a non-uniform distribution along the V gene that is well explained by an independent site model for the sequence context around the hypermutation site. PMID:26194757
Relationship inference based on DNA mixtures.
Kaur, Navreet; Bouzga, Mariam M; Dørum, Guro; Egeland, Thore
2016-03-01
Today, a number of tools exist for solving kinship cases. But what happens when the information comes from a mixture? DNA mixtures are in general rarely seen in kinship cases, but in a case presented to the Norwegian Institute of Public Health, sample DNA was obtained after a rape case that resulted in an unwanted pregnancy and abortion. The only available DNA from the fetus came in the form of a mixture with the mother, and it was of interest to find the father of the fetus. The mother (the victim), however, refused to give her reference data, and so commonly used methods for paternity testing were no longer applicable. As this case illustrates, kinship cases involving mixtures and missing reference profiles do occur and make the use of existing methods rather inconvenient. We here present statistical methods that may handle general relationship inference based on DNA mixtures. The basic idea is that likelihood calculations for mixtures can be decomposed into a series of kinship problems. This formulation of the problem facilitates the use of kinship software. We present the freely available R package relMix, which extends the R version of Familias. Complicating factors like mutations, silent alleles, and θ-correction are then easily handled for quite general family relationships, and are included in the statistical methods we develop in this paper. The methods and their implementations are exemplified on the data from the rape case. PMID:26541994
Structured statistical models of inductive reasoning.
Kemp, Charles; Tenenbaum, Joshua B
2009-01-01
Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. This article presents a Bayesian framework that attempts to meet both goals and describes 4 applications of the framework: a taxonomic model, a spatial model, a threshold model, and a causal model. Each model makes probabilistic inferences about the extensions of novel properties, but the priors for the 4 models are defined over different kinds of structures that capture different relationships between the categories in a domain. The framework therefore shows how statistical inference can operate over structured background knowledge, and the authors argue that this interaction between structure and statistics is critical for explaining the power and flexibility of human reasoning. PMID:19159147
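The shared computational core of such models — scoring hypotheses against examples and averaging a novel item's membership over the posterior — can be sketched as follows (the "size principle" likelihood is a standard simplification, and the hypothesis space and animal categories are invented for illustration):

```python
def generalization_prob(examples, query, hypotheses, prior):
    """P(query has the property | examples), marginalizing over hypotheses.

    Likelihood uses the 'size principle': each example is assumed
    sampled uniformly from the hypothesis's extension, so smaller
    consistent hypotheses gain more posterior mass per example.
    """
    post = {}
    for h, ext in hypotheses.items():
        if all(e in ext for e in examples):
            post[h] = prior[h] * (1 / len(ext)) ** len(examples)
        else:
            post[h] = 0.0   # hypothesis inconsistent with the data
    z = sum(post.values())
    post = {h: p / z for h, p in post.items()}
    return sum(p for h, p in post.items() if query in hypotheses[h])

hypotheses = {
    "dalmatians": {"dalmatian"},
    "dogs": {"dalmatian", "terrier", "poodle"},
    "animals": {"dalmatian", "terrier", "poodle", "cat", "horse"},
}
prior = {h: 1 / 3 for h in hypotheses}
# Three dalmatian examples concentrate belief on the narrowest hypothesis.
p_cat = generalization_prob(["dalmatian"] * 3, "cat", hypotheses, prior)
p_terrier = generalization_prob(["dalmatian"] * 3, "terrier", hypotheses, prior)
assert p_cat < p_terrier < 1.0
```

The same computation, with priors defined over taxonomic, spatial, threshold, or causal structures, yields the four models the article describes.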
System Support for Forensic Inference
NASA Astrophysics Data System (ADS)
Gehani, Ashish; Kirchner, Florent; Shankar, Natarajan
Digital evidence is playing an increasingly important role in prosecuting crimes. The reasons are manifold: financially lucrative targets are now connected online, systems are so complex that vulnerabilities abound and strong digital identities are being adopted, making audit trails more useful. If the discoveries of forensic analysts are to hold up to scrutiny in court, they must meet the standard for scientific evidence. Software systems are currently developed without consideration of this fact. This paper argues for the development of a formal framework for constructing “digital artifacts” that can serve as proxies for physical evidence; a system so imbued would facilitate sound digital forensic inference. A case study involving a filesystem augmentation that provides transparent support for forensic inference is described.
1986-01-01
Official population data for the USSR are presented for 1985 and 1986. Part 1 (pp. 65-72) contains data on capitals of union republics and cities with over one million inhabitants, including population estimates for 1986 and vital statistics for 1985. Part 2 (p. 72) presents population estimates by sex and union republic, 1986. Part 3 (pp. 73-6) presents data on population growth, including birth, death, and natural increase rates, 1984-1985; seasonal distribution of births and deaths; birth order; age-specific birth rates in urban and rural areas and by union republic; marriages; age at marriage; and divorces. PMID:12178831
Denoising and dimensionality reduction of genomic data
NASA Astrophysics Data System (ADS)
Capobianco, Enrico
2005-05-01
Genomics represents a challenging research field for many quantitative scientists, and recently a vast variety of statistical techniques and machine learning algorithms have been proposed and inspired by cross-disciplinary work with computational and systems biologists. In genomic applications, the researcher deals with noisy and complex high-dimensional feature spaces: a wealth of genes whose expression levels are experimentally measured can often be observed for just a few time points, thus limiting the available samples. This unbalanced combination suggests that it might be hard for standard statistical inference techniques to come up with good general solutions, and likewise for machine learning algorithms to avoid heavy computational work. Thus, one naturally turns to two major aspects of the problem: sparsity and intrinsic dimensionality. These two aspects are studied in this paper, where for both denoising and dimensionality reduction a very efficient technique, i.e., Independent Component Analysis, is used. The numerical results are very promising and lead to a very good quality of gene feature selection, due to the signal separation power enabled by the decomposition technique. We investigate how the use of replicates can improve these results, and deal with noise through a stabilization strategy which combines the estimated components and extracts the most informative biological information from them. Exploiting the inherent level of sparsity is a key issue in genetic regulatory networks, where the connectivity matrix needs to account for the real links among genes and discard many redundancies. Most experimental evidence suggests that real gene-gene connections represent indeed a subset of what is usually mapped onto either a huge gene vector or a typically dense and highly structured network. Inferring gene network connectivity from the expression levels represents a challenging inverse problem that is at present stimulating key research in biomedical
Data-free inference of the joint distribution of uncertain model parameters.
Marzouk, Youssef M.; Adalsteinsson, Helgi; Berry, Robert Dan; Debusschere, Bert J.; Najm, Habib N.
2010-05-01
It is known that, in general, the correlation structure in the joint distribution of model parameters is critical to the uncertainty analysis of that model. Very often, however, studies in the literature only report nominal values for parameters inferred from data, along with confidence intervals for these parameters, but no details on the correlation or full joint distribution of these parameters. When neither posterior nor data are available, but only summary statistics such as nominal values and confidence intervals, a joint PDF must be chosen. Given the summary statistics it may not be reasonable nor necessary to assume the parameters are independent random variables. We demonstrate, using a Bayesian inference procedure, how to construct a posterior density for the parameters exhibiting self consistent correlations, in the absence of data, given (1) the fit-model, (2) nominal parameter values, (3) bounds on the parameters, and (4) a postulated statistical model, around the fit-model, for the missing data. Our approach ensures external Bayesian updating while marginalizing over possible data realizations. We then address the matching of given parameter bounds through the choice of hyperparameters, which are introduced in postulating the statistical model, but are not given nominal values. We discuss some possible approaches, including (1) inferring them in a separate Bayesian inference loop and (2) optimization. We also perform an empirical evaluation of the algorithm showing the posterior obtained with this data free inference compares well with the true posterior obtained from inference against the full data set.
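A drastically simplified sketch of the idea — drawing synthetic data realizations from the fit-model at nominal parameter values and refitting to obtain parameter samples with self-consistent correlations — for a toy straight-line model (entirely my own illustration, not the report's algorithm):

```python
import random

def fit_line(xs, ys):
    """Closed-form least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def data_free_samples(a0, b0, xs, sigma, n_draws, rng):
    """Parameter samples obtained by refitting synthetic datasets drawn
    from the fit-model at the nominal values, under a postulated
    Gaussian statistical model for the missing data."""
    samples = []
    for _ in range(n_draws):
        ys = [a0 + b0 * x + rng.gauss(0, sigma) for x in xs]
        samples.append(fit_line(xs, ys))
    return samples

rng = random.Random(0)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
samples = data_free_samples(1.0, 2.0, xs, sigma=0.5, n_draws=2000, rng=rng)
a_vals = [a for a, _ in samples]
b_vals = [b for _, b in samples]
ma, mb = sum(a_vals) / len(a_vals), sum(b_vals) / len(b_vals)
cov = sum((a - ma) * (b - mb) for a, b in samples) / len(samples)
# Intercept and slope come out anticorrelated — structure that would be
# lost if the two parameters were treated as independent.
assert cov < 0.0
```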
Self-enforcing Private Inference Control
NASA Astrophysics Data System (ADS)
Yang, Yanjiang; Li, Yingjiu; Weng, Jian; Zhou, Jianying; Bao, Feng
Private inference control enables simultaneous enforcement of inference control and protection of users' query privacy. Private inference control is a useful tool for database applications, especially as users today are increasingly concerned about individual privacy. However, protection of query privacy on top of inference control is a double-edged sword: without letting the database server know the content of user queries, users can easily launch DoS attacks. To assuage DoS attacks in private inference control, we propose the concept of self-enforcing private inference control, whose intuition is to force users to make only inference-free queries by enforcing inference control themselves; otherwise, a penalty will be inflicted upon the violating users.
NASA Astrophysics Data System (ADS)
Pesenson, Meyer; Pesenson, I. Z.; McCollum, B.
2009-05-01
The complexity of multitemporal/multispectral astronomical data sets together with the approaching petascale of such datasets and large astronomical surveys require automated or semi-automated methods for knowledge discovery. Traditional statistical methods of analysis may break down not only because of the amount of data, but mostly because of the increase of the dimensionality of data. Image fusion (combining information from multiple sensors in order to create a composite enhanced image) and dimension reduction (finding lower-dimensional representation of high-dimensional data) are effective approaches to “the curse of dimensionality,” thus facilitating automated feature selection, classification and data segmentation. Dimension reduction methods greatly increase computational efficiency of machine learning algorithms, improve statistical inference and together with image fusion enable effective scientific visualization (as opposed to mere illustrative visualization). The main approach of this work utilizes recent advances in multidimensional image processing, as well as representation of essential structure of a data set in terms of its fundamental eigenfunctions, which are used as an orthonormal basis for the data visualization and analysis. We consider multidimensional data sets and images as manifolds or combinatorial graphs and construct variational splines that minimize certain Sobolev norms. These splines allow us to reconstruct the eigenfunctions of the combinatorial Laplace operator by using only a small portion of the graph. We use the first two or three eigenfunctions for embedding large data sets into two- or three-dimensional Euclidean space. Such reduced data sets allow efficient data organization, retrieval, analysis and visualization. We demonstrate applications of the algorithms to test cases from the Spitzer Space Telescope. This work was carried out with funding from the National Geospatial-Intelligence Agency University Research Initiative
NASA Astrophysics Data System (ADS)
Dettmer, Jan; Molnar, Sheri; Steininger, Gavin; Dosso, Stan E.; Cassidy, John F.
2012-02-01
This paper applies a general trans-dimensional Bayesian inference methodology and hierarchical autoregressive data-error models to the inversion of microtremor array dispersion data for shear wave velocity (vs) structure. This approach accounts for the limited knowledge of the optimal earth model parametrization (e.g. the number of layers in the vs profile) and of the data-error statistics in the resulting vs parameter uncertainty estimates. The assumed earth model parametrization influences estimates of parameter values and uncertainties due to different parametrizations leading to different ranges of data predictions. The support of the data for a particular model is often non-unique and several parametrizations may be supported. A trans-dimensional formulation accounts for this non-uniqueness by including a model-indexing parameter as an unknown so that groups of models (identified by the indexing parameter) are considered in the results. The earth model is parametrized in terms of a partition model with interfaces given over a depth-range of interest. In this work, the number of interfaces (layers) in the partition model represents the trans-dimensional model indexing. In addition, serial data-error correlations are addressed by augmenting the geophysical forward model with a hierarchical autoregressive error model that can account for a wide range of error processes with a small number of parameters. Hence, the limited knowledge about the true statistical distribution of data errors is also accounted for in the earth model parameter estimates, resulting in more realistic uncertainties and parameter values. Hierarchical autoregressive error models do not rely on point estimates of the model vector to estimate data-error statistics, and have no requirement for computing the inverse or determinant of a data-error covariance matrix. This approach is particularly useful for trans-dimensional inverse problems, as point estimates may not be representative of the
Exponential family models and statistical genetics.
Palmgren, J
2000-02-01
This article describes the evolution of applied exponential family models, starting at 1972, the year of publication of the seminal papers on generalized linear models and on Cox regression, and leading to multivariate (i) marginal models and inference based on estimating equations and (ii) random effects models and Bayesian simulation-based posterior inference. By referring to recent work in genetic epidemiology, on semiparametric methods for linkage analysis and on transmission/disequilibrium tests for haplotype transmission this paper illustrates the potential for the recent advances in applied probability and statistics to contribute to new and unified tools for statistical genetics. Finally, it is emphasized that there is a need for well-defined postgraduate education paths in medical statistics in the year 2000 and thereafter. PMID:10826159
Inferring Phylogenetic Networks with Maximum Pseudolikelihood under Incomplete Lineage Sorting
Solís-Lemus, Claudia; Ané, Cécile
2016-01-01
Phylogenetic networks are necessary to represent the tree of life expanded by edges to represent events such as horizontal gene transfers, hybridizations or gene flow. Not all species follow the paradigm of vertical inheritance of their genetic material. While a great deal of research has flourished into the inference of phylogenetic trees, statistical methods to infer phylogenetic networks are still limited and under development. The main disadvantage of existing methods is a lack of scalability. Here, we present a statistical method to infer phylogenetic networks from multi-locus genetic data in a pseudolikelihood framework. Our model accounts for incomplete lineage sorting through the coalescent model, and for horizontal inheritance of genes through reticulation nodes in the network. Computation of the pseudolikelihood is fast and simple, and it avoids the burdensome calculation of the full likelihood which can be intractable with many species. Moreover, estimation at the quartet-level has the added computational benefit that it is easily parallelizable. Simulation studies comparing our method to a full likelihood approach show that our pseudolikelihood approach is much faster without compromising accuracy. We applied our method to reconstruct the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), which is characterized by widespread hybridizations. PMID:26950302
Quantum Inference on Bayesian Networks
NASA Astrophysics Data System (ADS)
Yoder, Theodore; Low, Guang Hao; Chuang, Isaac
2014-03-01
Because quantum physics is naturally probabilistic, it seems reasonable to expect physical systems to describe probabilities and their evolution in a natural fashion. Here, we use quantum computation to speed up sampling from a graphical probability model, the Bayesian network. A specialization of this sampling problem is approximate Bayesian inference, where the distribution on query variables is sampled given the values e of evidence variables. Inference is a key part of modern machine learning and artificial intelligence tasks, but is known to be NP-hard. Classically, a single unbiased sample is obtained from a Bayesian network on n variables with at most m parents per node in time O(nm P(e)^-1), depending critically on P(e), the probability the evidence might occur in the first place. However, by implementing a quantum version of rejection sampling, we obtain a square-root speedup, taking O(n 2^m P(e)^-1/2) time per sample. The speedup is the result of amplitude amplification, which is proving to be broadly applicable in sampling and machine learning tasks. In particular, we provide an explicit and efficient circuit construction that implements the algorithm without the need for oracle access.
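The classical rejection-sampling baseline that the quantum algorithm accelerates can be sketched on a toy two-node network (the network and its probabilities are invented for illustration); the fraction of retained trials estimates P(e), and the cost per accepted sample grows as P(e)^-1:

```python
import random

def rejection_sample(n_trials, rng):
    """Estimate P(rain | wet) on a toy 2-node Bayesian network by rejection
    sampling: sample every variable, keep only trials matching the
    evidence wet=True."""
    kept = rain_given_wet = 0
    for _ in range(n_trials):
        rain = rng.random() < 0.2
        p_wet = 0.9 if rain else 0.1
        wet = rng.random() < p_wet
        if wet:                      # evidence check rejects ~1 - P(e) of trials
            kept += 1
            rain_given_wet += rain
    return rain_given_wet / kept, kept / n_trials

rng = random.Random(42)
est, p_evidence = rejection_sample(200_000, rng)
# Exact values: P(wet) = 0.2*0.9 + 0.8*0.1 = 0.26, P(rain|wet) = 0.18/0.26
assert abs(est - 0.18 / 0.26) < 0.01
assert abs(p_evidence - 0.26) < 0.01
```

The quantum version replaces the accept/reject loop with amplitude amplification, boosting the acceptance probability from P(e) to roughly sqrt(P(e)) per sample, which is the square-root speedup claimed above.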
Causal inference with a quantitative exposure.
Zhang, Zhiwei; Zhou, Jie; Cao, Weihua; Zhang, Jun
2016-02-01
The current statistical literature on causal inference is mostly concerned with binary or categorical exposures, even though exposures of a quantitative nature are frequently encountered in epidemiologic research. In this article, we review the available methods for estimating the dose-response curve for a quantitative exposure, which include ordinary regression based on an outcome regression model, inverse propensity weighting and stratification based on a propensity function model, and an augmented inverse propensity weighting method that is doubly robust with respect to the two models. We note that an outcome regression model often imposes an implicit constraint on the dose-response curve, and propose a flexible modeling strategy that avoids constraining the dose-response curve. We also propose two new methods: a weighted regression method that combines ordinary regression with inverse propensity weighting and a stratified regression method that combines ordinary regression with stratification. The proposed methods are similar to the augmented inverse propensity weighting method in the sense of double robustness, but easier to implement and more generally applicable. The methods are illustrated with an obstetric example and compared in simulation studies. PMID:22729475
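A minimal sketch of inverse propensity weighting, simplified here to a binary exposure with known propensities (the article's setting is a quantitative exposure with an estimated propensity function, so this shows only the core weighting idea):

```python
import random

def ipw_dose_means(data, propensity):
    """Inverse-propensity-weighted (Hajek) outcome mean at each dose level.

    data: list of (dose, z, y); propensity(d, z) = P(dose = d | z).
    """
    totals, weights = {}, {}
    for d, z, y in data:
        w = 1.0 / propensity(d, z)
        totals[d] = totals.get(d, 0.0) + w * y
        weights[d] = weights.get(d, 0.0) + w
    return {d: totals[d] / weights[d] for d in totals}

def propensity(d, z):
    # Confounded assignment: high-z subjects favor the higher dose.
    p1 = 0.7 if z else 0.3
    return p1 if d == 1 else 1 - p1

rng = random.Random(1)
data = []
for _ in range(50_000):
    z = rng.random() < 0.5
    d = 1 if rng.random() < propensity(1, z) else 0
    # True dose effect is +2; z shifts the outcome by +3 regardless of dose.
    y = 2 * d + (3 if z else 0) + rng.gauss(0, 1)
    data.append((d, z, y))

means = ipw_dose_means(data, propensity)
# Weighting recovers the causal contrast E[Y(1)] - E[Y(0)] = 2 despite
# confounding (the naive difference in means would be biased upward).
assert abs((means[1] - means[0]) - 2.0) < 0.1
```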
Virtual reality and consciousness inference in dreaming.
Hobson, J Allan; Hong, Charles C-H; Friston, Karl J
2014-01-01
This article explores the notion that the brain is genetically endowed with an innate virtual reality generator that - through experience-dependent plasticity - becomes a generative or predictive model of the world. This model, which is most clearly revealed in rapid eye movement (REM) sleep dreaming, may provide the theater for conscious experience. Functional neuroimaging evidence for brain activations that are time-locked to rapid eye movements (REMs) endorses the view that waking consciousness emerges from REM sleep - and dreaming lays the foundations for waking perception. In this view, the brain is equipped with a virtual model of the world that generates predictions of its sensations. This model is continually updated and entrained by sensory prediction errors in wakefulness to ensure veridical perception, but not in dreaming. In contrast, dreaming plays an essential role in maintaining and enhancing the capacity to model the world by minimizing model complexity and thereby maximizing both statistical and thermodynamic efficiency. This perspective suggests that consciousness corresponds to the embodied process of inference, realized through the generation of virtual realities (in both sleep and wakefulness). In short, our premise or hypothesis is that the waking brain engages with the world to predict the causes of sensations, while in sleep the brain's generative model is actively refined so that it generates more efficient predictions during waking. We review the evidence in support of this hypothesis - evidence that grounds consciousness in biophysical computations whose neuronal and neurochemical infrastructure has been disclosed by sleep research. PMID:25346710
Bell's theorem, inference, and quantum transactions
NASA Astrophysics Data System (ADS)
Garrett, A. J. M.
1990-04-01
Bell's theorem is expounded as an analysis in Bayesian inference. Assuming the result of a spin measurement on a particle is governed by a causal variable internal (hidden, “local”) to the particle, one learns about it by making a spin measurement; thence about the internal variable of a second particle correlated with the first; and from there predicts the probabilistic result of spin measurements on the second particle. Such predictions are violated by experiment: locality/causality fails. The statistical nature of the observations rules out signalling, whether acausal, superluminal, or otherwise. Quantum mechanics is irrelevant to this reasoning, although its correct predictions of experiment imply that it has a nonlocal/acausal interpretation. Cramer's new transactional interpretation, which incorporates this feature by adapting the Wheeler-Feynman idea of advanced and retarded processes to the quantum laws, is advocated. It leads to an invaluable way of envisaging quantum processes. The usual paradoxes melt before this, and one, the “delayed choice” experiment, is chosen for detailed inspection. Nonlocality implies practical difficulties in influencing hidden variables, which provides a very plausible explanation for why they have not yet been found; from this standpoint, Bell's theorem reinforces arguments in favor of hidden variables.
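The locality assumption under test can be made concrete with the CHSH form of Bell's theorem: enumerating every deterministic local hidden-variable strategy shows the correlation combination S is bounded by 2, whereas quantum mechanics attains 2*sqrt(2) (a standard illustration, not drawn from the article):

```python
from itertools import product

# Deterministic local model: the hidden variable fixes outcomes (+/-1) for
# each of Alice's settings (A0, A1) and Bob's settings (B0, B1) separately.
# Any probabilistic hidden-variable model is a mixture of these, so by
# convexity it cannot exceed the deterministic maximum.
best = 0
for A0, A1, B0, B1 in product([-1, 1], repeat=4):
    S = A0 * B0 + A0 * B1 + A1 * B0 - A1 * B1
    best = max(best, abs(S))

assert best == 2               # CHSH bound for local hidden variables
assert 2 * 2 ** 0.5 > best     # quantum value 2*sqrt(2) exceeds it
```

The algebra behind the bound is visible in the loop: S = A0(B0 + B1) + A1(B0 - B1), and for outcomes of +/-1 one parenthesis is 0 while the other is +/-2.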
Functional network inference of the suprachiasmatic nucleus.
Abel, John H; Meeker, Kirsten; Granados-Fuentes, Daniel; St John, Peter C; Wang, Thomas J; Bales, Benjamin B; Doyle, Francis J; Herzog, Erik D; Petzold, Linda R
2016-04-19
In the mammalian suprachiasmatic nucleus (SCN), noisy cellular oscillators communicate within a neuronal network to generate precise system-wide circadian rhythms. Although the intracellular genetic oscillator and intercellular biochemical coupling mechanisms have been examined previously, the network topology driving synchronization of the SCN has not been elucidated. This network has been particularly challenging to probe, due to its oscillatory components and slow coupling timescale. In this work, we investigated the SCN network at a single-cell resolution through a chemically induced desynchronization. We then inferred functional connections in the SCN by applying the maximal information coefficient statistic to bioluminescence reporter data from individual neurons while they resynchronized their circadian cycling. Our results demonstrate that the functional network of circadian cells associated with resynchronization has small-world characteristics, with a node degree distribution that is exponential. We show that hubs of this small-world network are preferentially located in the central SCN, with sparsely connected shells surrounding these cores. Finally, we used two computational models of circadian neurons to validate our predictions of network structure. PMID:27044085
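The inference step described above can be sketched with a simpler stand-in for the maximal information coefficient: a histogram mutual-information score between pairs of discretized bioluminescence traces, thresholded into functional edges. The function names, bin count, and threshold below are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log2

def mutual_information(x, y, bins=4):
    """Histogram estimate of I(X;Y) in bits for two equal-length series."""
    def discretize(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0   # guard against constant series
        return [min(int((s - lo) / width), bins - 1) for s in v]
    xd, yd = discretize(x), discretize(y)
    n = len(xd)
    px, py, pxy = Counter(xd), Counter(yd), Counter(zip(xd, yd))
    return sum((c / n) * log2(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

def infer_edges(traces, threshold):
    """Keep a functional edge wherever the pairwise score clears a threshold."""
    ids = sorted(traces)
    return [(a, b) for k, a in enumerate(ids) for b in ids[k + 1:]
            if mutual_information(traces[a], traces[b]) >= threshold]
```

From the resulting edge list one can then read off node degrees and hub locations, as the study does for the central SCN.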
Virtual reality and consciousness inference in dreaming
Hobson, J. Allan; Hong, Charles C.-H.; Friston, Karl J.
2014-01-01
This article explores the notion that the brain is genetically endowed with an innate virtual reality generator that – through experience-dependent plasticity – becomes a generative or predictive model of the world. This model, which is most clearly revealed in rapid eye movement (REM) sleep dreaming, may provide the theater for conscious experience. Functional neuroimaging evidence for brain activations that are time-locked to rapid eye movements (REMs) endorses the view that waking consciousness emerges from REM sleep – and dreaming lays the foundations for waking perception. In this view, the brain is equipped with a virtual model of the world that generates predictions of its sensations. This model is continually updated and entrained by sensory prediction errors in wakefulness to ensure veridical perception, but not in dreaming. In contrast, dreaming plays an essential role in maintaining and enhancing the capacity to model the world by minimizing model complexity and thereby maximizing both statistical and thermodynamic efficiency. This perspective suggests that consciousness corresponds to the embodied process of inference, realized through the generation of virtual realities (in both sleep and wakefulness). In short, our premise or hypothesis is that the waking brain engages with the world to predict the causes of sensations, while in sleep the brain’s generative model is actively refined so that it generates more efficient predictions during waking. We review the evidence in support of this hypothesis – evidence that grounds consciousness in biophysical computations whose neuronal and neurochemical infrastructure has been disclosed by sleep research. PMID:25346710
Statistical Estimation of Orbital Debris Populations with a Spectrum of Object Size
NASA Technical Reports Server (NTRS)
Xu, Y. -l; Horstman, M.; Krisko, P. H.; Liou, J. -C; Matney, M.; Stansbery, E. G.; Stokely, C. L.; Whitlock, D.
2008-01-01
Orbital debris is a real concern for the safe operations of satellites. In general, the hazard of debris impact is a function of the size and spatial distributions of the debris populations. To describe and characterize the debris environment as reliably as possible, the current NASA Orbital Debris Engineering Model (ORDEM2000) is being upgraded to a new version based on new and better quality data. The data-driven ORDEM model covers a wide range of object sizes from 10 microns to greater than 1 meter. This paper reviews the statistical process for the estimation of the debris populations in the new ORDEM upgrade, and discusses the representation of the large-size (≥ 1 m and ≥ 10 cm) populations by SSN catalog objects and the validation of the statistical approach. It also presents results for the populations with sizes ≥ 3.3 cm, ≥ 1 cm, ≥ 100 micrometers, and ≥ 10 micrometers. The orbital debris populations used in the new version of ORDEM are inferred from data based upon appropriate reference (or benchmark) populations instead of the binning of the multi-dimensional orbital-element space. This paper describes all of the major steps used in the population-inference procedure for each size range. Detailed discussions of data analysis, parameter definition, the correlation between parameters and data, and uncertainty assessment are included.
Applying Statistical Process Control to Clinical Data: An Illustration.
ERIC Educational Resources Information Center
Pfadt, Al; And Others
1992-01-01
Principles of statistical process control are applied to a clinical setting through the use of control charts to detect changes, as part of treatment planning and clinical decision-making processes. The logic of control chart analysis is derived from principles of statistical inference. Sample charts offer examples of evaluating baselines and…
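The control-chart logic described can be sketched in a few lines: a baseline phase fixes the center line and its control limits, and later observations falling outside those limits signal a change. This is a minimal Shewhart-style individuals chart; the 3-sigma width and the sample values are illustrative assumptions, not the article's data.

```python
from statistics import mean, pstdev

def control_limits(baseline, k=3.0):
    """Center line and +/- k-sigma control limits from a baseline phase."""
    m, s = mean(baseline), pstdev(baseline)
    return m - k * s, m, m + k * s

def out_of_control(observations, lcl, ucl):
    """Indices of observations that fall outside the control limits."""
    return [i for i, x in enumerate(observations) if not lcl <= x <= ucl]
```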
The Empirical Nature and Statistical Treatment of Missing Data
ERIC Educational Resources Information Center
Tannenbaum, Christyn E.
2009-01-01
Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…
Statistical Methods in Cosmology
NASA Astrophysics Data System (ADS)
Verde, L.
2010-03-01
The advent of large data sets in cosmology has meant that in the past 10 or 20 years our knowledge and understanding of the Universe has changed not only quantitatively but also, and most importantly, qualitatively. Cosmologists rely on data in which a host of useful information is enclosed, but encoded in a non-trivial way. The challenges in extracting this information must be overcome to make the most of a large experimental effort. Even after having converged to a standard cosmological model (the LCDM model) we should keep in mind that this model is described by 10 or more physical parameters, and if we want to study deviations from it, the number of parameters is even larger. Dealing with such a high dimensional parameter space and finding parameter constraints is a challenge in itself. Cosmologists want to be able to compare and combine different data sets, both to test for possible disagreements (which could indicate new physics) and to improve parameter determinations. Finally, cosmologists in many cases want to find out, before actually doing an experiment, how much one would be able to learn from it. For all these reasons, sophisticated statistical techniques are being employed in cosmology, and it has become crucial to know some statistical background to understand recent literature in the field. I will introduce some statistical tools that any cosmologist should know about in order to be able to understand recently published results from the analysis of cosmological data sets. I will not present a complete and rigorous introduction to statistics, as there are several good books which are reported in the references. The reader should refer to those.
Bayesian Inference of Tumor Hypoxia
NASA Astrophysics Data System (ADS)
Gunawan, R.; Tenti, G.; Sivaloganathan, S.
2009-12-01
Tumor hypoxia is a state of oxygen deprivation in tumors. It has been associated with aggressive tumor phenotypes and with increased resistance to conventional cancer therapies. In this study, we report on the application of Bayesian sequential analysis in estimating the most probable value of tumor hypoxia quantification based on immunohistochemical assays of a biomarker. The `gold standard' of tumor hypoxia assessment is a direct measurement of pO2 in vivo by the Eppendorf polarographic electrode, which is an invasive technique restricted to accessible sites and living tissues. An attractive alternative is immunohistochemical staining to detect proteins expressed by cells during hypoxia. Carbonic anhydrase IX (CAIX) is an enzyme expressed on the cell membrane during hypoxia to balance the immediate extracellular microenvironment. CAIX is widely regarded as a surrogate marker of chronic hypoxia in various cancers. The study was conducted with two different experimental procedures. The first data set was a group of three patients with invasive cervical carcinomas, from which five biopsies were obtained. Each of the biopsies was fully sectioned and from each section the proportion of CAIX-positive cells was estimated. Measurements were made by image analysis of multiple deep sections cut through these biopsies, labeled for CAIX using both immunofluorescence and immunohistochemical techniques [1]. The second data set was a group of 24 patients, also with invasive cervical carcinomas, from which two biopsies were obtained. Bayesian parameter estimation was applied to obtain a reliable inference about the proportion of CAIX-positive cells within the carcinomas, based on the available biopsies. From the first data set, two to three biopsies were found to be sufficient to infer the overall CAIX percentage in the simple form: best estimate ± uncertainty. The second data set led to a similar result in 70% of the cases. In the remaining cases Bayes' theorem warned us
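The abstract does not spell out the model, but the standard conjugate sketch for a proportion is a Beta-Binomial sequential update, which reproduces the "best estimate ± uncertainty" form as each biopsy arrives. The flat prior and the counts below are illustrative assumptions, not the study's data.

```python
from math import sqrt

def beta_update(a, b, positives, total):
    """Conjugate update: Beta(a, b) prior plus a binomial biopsy count."""
    return a + positives, b + (total - positives)

def summarize(a, b):
    """Posterior mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

# Fold in biopsies sequentially, starting from a flat Beta(1, 1) prior.
a, b = 1.0, 1.0
for positives, total in [(30, 100), (24, 80), (33, 110)]:
    a, b = beta_update(a, b, positives, total)
estimate, uncertainty = summarize(a, b)
```

Watching `uncertainty` shrink across updates is exactly the criterion for deciding how many biopsies suffice.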
A Coalitional Game for Distributed Inference in Sensor Networks With Dependent Observations
NASA Astrophysics Data System (ADS)
He, Hao; Varshney, Pramod K.
2016-04-01
We consider the problem of collaborative inference in a sensor network with heterogeneous and statistically dependent sensor observations. Each sensor aims to maximize its inference performance by forming a coalition with other sensors and sharing information within the coalition. It is proved that the inference performance is a nondecreasing function of the coalition size. However, in an energy constrained network, the energy consumption of inter-sensor communication also increases with increasing coalition size, which discourages the formation of the grand coalition (the set of all sensors). In this paper, the formation of non-overlapping coalitions with statistically dependent sensors is investigated under a specific communication constraint. We apply a game theoretical approach to fully explore and utilize the information contained in the spatial dependence among sensors to maximize individual sensor performance. Before formulating the distributed inference problem as a coalition formation game, we first quantify the gain and loss in forming a coalition by introducing the concepts of diversity gain and redundancy loss for both estimation and detection problems. These definitions, enabled by the statistical theory of copulas, allow us to characterize the influence of statistical dependence among sensor observations on inference performance. An iterative algorithm based on merge-and-split operations is proposed for the solution and the stability of the proposed algorithm is analyzed. Numerical results are provided to demonstrate the superiority of our proposed game theoretical approach.
Simulated tornado debris tracks: implications for inferring corner flow structure
NASA Astrophysics Data System (ADS)
Zimmerman, Michael; Lewellen, David
2011-11-01
A large collection of three-dimensional large eddy simulations of tornadoes with fine debris has recently been performed as part of a longstanding effort at West Virginia University to understand tornado corner flow structure and dynamics. Debris removal and deposition are accounted for at the surface, in effect simulating the formation of tornado surface marks. The physical origins and properties of the most prominent marks will be presented, and the possibility of inferring tornado corner flow structure from real marks in the field will be discussed. This material is based upon work supported by the National Science Foundation under Grants No. 0635681 and AGS-1013154.
Teaching Classical Statistical Mechanics: A Simulation Approach.
ERIC Educational Resources Information Center
Sauer, G.
1981-01-01
Describes a one-dimensional model for an ideal gas to study development of disordered motion in Newtonian mechanics. A Monte Carlo procedure for simulation of the statistical ensemble of an ideal gas with fixed total energy is developed. Compares both approaches for a pseudoexperimental foundation of statistical mechanics. (Author/JN)
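A minimal stdlib sketch of such a Monte Carlo procedure, assuming random pairwise energy exchanges as the move (our illustrative choice, not necessarily the article's exact scheme): each move conserves the total energy, while the initially ordered single-particle energies disorder toward an exponential, Boltzmann-like distribution.

```python
import random

def mc_ideal_gas(n=100, total_energy=100.0, steps=5000, seed=0):
    """Sample a fixed-energy ensemble by random pairwise energy exchanges."""
    rng = random.Random(seed)
    energies = [total_energy / n] * n      # start fully ordered: equal shares
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j:
            continue
        pool = energies[i] + energies[j]   # each move conserves the pair's sum
        energies[i] = rng.uniform(0.0, pool)
        energies[j] = pool - energies[i]
    return energies
```

Histogramming the returned energies gives the pseudoexperimental foundation the article compares with the Newtonian simulation.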
Gene network inference by fusing data from diverse distributions
Žitnik, Marinka; Zupan, Blaž
2015-01-01
Motivation: Markov networks are undirected graphical models that are widely used to infer relations between genes from experimental data. Their state-of-the-art inference procedures assume the data arise from a Gaussian distribution. High-throughput omics data, such as that from next generation sequencing, often violates this assumption. Furthermore, when collected data arise from multiple related but otherwise nonidentical distributions, their underlying networks are likely to have common features. New principled statistical approaches are needed that can deal with different data distributions and jointly consider collections of datasets. Results: We present FuseNet, a Markov network formulation that infers networks from a collection of nonidentically distributed datasets. Our approach is computationally efficient and general: given any number of distributions from an exponential family, FuseNet represents model parameters through shared latent factors that define neighborhoods of network nodes. In a simulation study, we demonstrate good predictive performance of FuseNet in comparison to several popular graphical models. We show its effectiveness in an application to breast cancer RNA-sequencing and somatic mutation data, a novel application of graphical models. Fusion of datasets offers substantial gains relative to inference of separate networks for each dataset. Our results demonstrate that network inference methods for non-Gaussian data can help in accurate modeling of the data generated by emergent high-throughput technologies. Availability and implementation: Source code is at https://github.com/marinkaz/fusenet. Contact: blaz.zupan@fri.uni-lj.si Supplementary information: Supplementary information is available at Bioinformatics online. PMID:26072487
Inferring Networks From Collective Dynamics
NASA Astrophysics Data System (ADS)
Timme, Marc
How can we infer direct physical interactions between pairs of units from only knowing the units' time series? Here we present a dynamical systems' view on collective network dynamics, and propose the concept of a dynamics' space to reveal interaction networks from time series. We present two examples: one where the time series stem from standard ordinary differential equations, and a second, more abstract one, where the time series exhibit only partial information about the units' states. We apply the latter to neural circuit dynamics where the observables are spike timing data, i.e. only discrete, state-dependent outputs of the neurons. These results may help reveal network structure for systems where direct access to dynamics is simpler than to connectivity. This is work with Jose Casadiego, Srinivas Gorur Shandilya, Mor Nitzan, Hauke Haehne and Dimitra Maoutsa. Supported by grants of the BMBF (Future Compliant Power Grids - CoNDyNet) and by the Max Planck Society to MT.
Inferred properties of stellar granulation
Gray, D.F.; Toner, C.G.
1985-06-01
Apparent characteristics of stellar granulation in F and G main-sequence stars are inferred directly from observed spectral-line asymmetries and from comparisons of numerical simulations with the observations: (1) the apparent granulation velocity increases with effective temperature, (2) the dispersion of granule velocities about their mean velocity of rise increases with the apparent granulation velocity, (3) the mean velocity of rise of granules must be less than the total line broadening, (4) the apparent velocity difference between granules and dark lanes corresponds to the granulation velocity deduced from stellar line bisectors, (5) the dark lanes show velocities of fall approximately twice as large as the granule rise velocities, (6) the light contributed to the stellar flux by the granules is four to ten times more than the light from the dark lanes. Stellar rotation is predicted to produce distortions in the line bisectors which may give information on the absolute velocity displacements of the line bisectors. 37 references.
Structural inference for uncertain networks
NASA Astrophysics Data System (ADS)
Martin, Travis; Ball, Brian; Newman, M. E. J.
2016-01-01
In the study of networked systems such as biological, technological, and social networks the available data are often uncertain. Rather than knowing the structure of a network exactly, we know the connections between nodes only with a certain probability. In this paper we develop methods for the analysis of such uncertain data, focusing particularly on the problem of community detection. We give a principled maximum-likelihood method for inferring community structure and demonstrate how the results can be used to make improved estimates of the true structure of the network. Using computer-generated benchmark networks we demonstrate that our methods are able to reconstruct known communities more accurately than previous approaches based on data thresholding. We also give an example application to the detection of communities in a protein-protein interaction network.
Statistical Approach to Protein Quantification*
Gerster, Sarah; Kwon, Taejoon; Ludwig, Christina; Matondo, Mariette; Vogel, Christine; Marcotte, Edward M.; Aebersold, Ruedi; Bühlmann, Peter
2014-01-01
A major goal in proteomics is the comprehensive and accurate description of a proteome. This task includes not only the identification of proteins in a sample, but also the accurate quantification of their abundance. Although mass spectrometry typically provides information on peptide identity and abundance in a sample, it does not directly measure the concentration of the corresponding proteins. Specifically, most mass-spectrometry-based approaches (e.g. shotgun proteomics or selected reaction monitoring) allow one to quantify peptides using chromatographic peak intensities or spectral counting information. Ultimately, based on these measurements, one wants to infer the concentrations of the corresponding proteins. Inferring properties of the proteins based on experimental peptide evidence is often a complex problem because of the ambiguity of peptide assignments and different chemical properties of the peptides that affect the observed concentrations. We present SCAMPI, a novel generic and statistically sound framework for computing protein abundance scores based on quantified peptides. In contrast to most previous approaches, our model explicitly includes information from shared peptides to improve protein quantitation, especially in eukaryotes with many homologous sequences. The model accounts for uncertainty in the input data, leading to statistical prediction intervals for the protein scores. Furthermore, peptides with extreme abundances can be reassessed and classified as either regular data points or actual outliers. We used the proposed model with several datasets and compared its performance to that of other, previously used approaches for protein quantification in bottom-up mass spectrometry. PMID:24255132
Transdimensional inference in the geosciences.
Sambridge, M; Bodin, T; Gallagher, K; Tkalcic, H
2013-02-13
Seismologists construct images of the Earth's interior structure using observations, derived from seismograms, collected at the surface. A common approach to such inverse problems is to build a single 'best' Earth model, in some sense. This is despite the fact that the observations by themselves often do not require, or even allow, a single best-fit Earth model to exist. Interpretation of optimal models can be fraught with difficulties, particularly when formal uncertainty estimates become heavily dependent on the regularization imposed. Similar issues occur across the physical sciences with model construction in ill-posed problems. An alternative approach is to embrace the non-uniqueness directly and employ an inference process based on parameter space sampling. Instead of seeking a best model within an optimization framework, one seeks an ensemble of solutions and derives properties of that ensemble for inspection. While this idea has itself been employed for more than 30 years, it is now receiving increasing attention in the geosciences. Recently, it has been shown that transdimensional and hierarchical sampling methods have some considerable benefits for problems involving multiple parameter types, uncertain data errors and/or uncertain model parametrizations, as are common in seismology. Rather than being forced to make decisions on parametrization, the level of data noise and the weights between data types in advance, as is often the case in an optimization framework, the choice can be informed by the data themselves. Despite the relatively high computational burden involved, the number of areas where sampling methods are now feasible is growing rapidly. The intention of this article is to introduce concepts of transdimensional inference to a general readership and illustrate with particular seismological examples. A growing body of references provide necessary detail. PMID:23277604
Bayesian inference of local geomagnetic secular variation curves: application to archaeomagnetism
NASA Astrophysics Data System (ADS)
Lanos, Philippe
2014-05-01
The errors that occur at different stages of the archaeomagnetic calibration process are combined using Bayesian hierarchical modelling. The archaeomagnetic data obtained from archaeological structures such as hearths, kilns or sets of bricks and tiles exhibit considerable experimental errors and are generally more or less well dated by archaeological context, history or chronometric methods (14C, TL, dendrochronology, etc.). They can also be associated with stratigraphic observations which provide prior relative chronological information. The modelling we propose allows all these observations and errors to be linked together through appropriate prior probability densities. The model also includes penalized cubic splines for estimating the univariate, spherical or three-dimensional curves for the secular variation of the geomagnetic field (inclination, declination, intensity) over time at a local place. The mean smooth curve we obtain, with its posterior Bayesian envelope, adapts to the effects of variability in the density of reference points over time. Moreover, the hierarchical modelling also provides an efficient way to penalize outliers automatically. With this new posterior estimate of the curve, the Bayesian statistical framework then allows the calendar dates of undated archaeological features (such as kilns) to be estimated from one, two or three geomagnetic parameters (inclination, declination and/or intensity). Date estimates are presented in the same way as those that arise from radiocarbon dating. In order to illustrate the model and the inference method used, we present results based on recently published French, Bulgarian and Austrian datasets.
Statistics for characterizing data on the periphery
Theiler, James P; Hush, Donald R
2010-01-01
We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
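As a concrete baseline for the ellipsoidal models mentioned, the squared Mahalanobis distance from the sample mean and covariance scores how far out on the periphery a point lies. The sketch below is the textbook detector the paper's alternatives improve on, not the paper's own statistics, and is written for 2-D data so the covariance inverts in closed form.

```python
def mahalanobis_sq(point, data):
    """Squared Mahalanobis distance of a 2-D point from a 2-D sample."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / n
    syy = sum((y - my) ** 2 for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data) / n
    det = sxx * syy - sxy ** 2             # 2x2 covariance determinant
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] * inverse(covariance) * [dx dy]^T, expanded by hand
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
```

Samples with large distance are flagged as anomalous; the paper's argument is that the sample covariance can be replaced by estimators that model this periphery more efficiently.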
Statistical Mechanics of Prion Diseases
Slepoy, A.; Singh, R. R. P.; Pazmandi, F.; Kulkarni, R. V.; Cox, D. L.
2001-07-30
We present a two-dimensional, lattice based, protein-level statistical mechanical model for prion diseases (e.g., mad cow disease) with concomitant prion protein misfolding and aggregation. Our studies lead us to the hypothesis that the observed broad incubation time distribution in epidemiological data reflects fluctuation dominated growth seeded by a few nanometer scale aggregates, while much narrower incubation time distributions for inoculated lab animals arise from statistical self-averaging. We model "species barriers" to prion infection and assess a related treatment protocol.
Statistical Mechanics of Prion Diseases
NASA Astrophysics Data System (ADS)
Slepoy, A.; Singh, R. R.; Pázmándi, F.; Kulkarni, R. V.; Cox, D. L.
2001-07-01
We present a two-dimensional, lattice based, protein-level statistical mechanical model for prion diseases (e.g., mad cow disease) with concomitant prion protein misfolding and aggregation. Our studies lead us to the hypothesis that the observed broad incubation time distribution in epidemiological data reflects fluctuation dominated growth seeded by a few nanometer scale aggregates, while much narrower incubation time distributions for inoculated lab animals arise from statistical self-averaging. We model "species barriers" to prion infection and assess a related treatment protocol.
Statistical mechanics of prion diseases.
Slepoy, A; Singh, R R; Pázmándi, F; Kulkarni, R V; Cox, D L
2001-07-30
We present a two-dimensional, lattice based, protein-level statistical mechanical model for prion diseases (e.g., mad cow disease) with concomitant prion protein misfolding and aggregation. Our studies lead us to the hypothesis that the observed broad incubation time distribution in epidemiological data reflects fluctuation dominated growth seeded by a few nanometer scale aggregates, while much narrower incubation time distributions for inoculated lab animals arise from statistical self-averaging. We model "species barriers" to prion infection and assess a related treatment protocol. PMID:11497806
Vector statistics of LANDSAT imagery
NASA Technical Reports Server (NTRS)
Jayroe, R. R., Jr.; Underwood, D.
1977-01-01
A digitized multispectral image, such as LANDSAT data, is composed of numerous four dimensional vectors, which quantitatively describe the ground scene from which the data are acquired. The statistics of unique vectors that occur in LANDSAT imagery are studied to determine if that information can provide some guidance on reducing image processing costs. A second purpose of this report is to investigate how the vector statistics are changed by various types of image processing techniques and determine if that information can be useful in choosing one processing approach over another.
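The counting itself is straightforward to sketch: treat each pixel as a four-band vector and tally how often each unique vector occurs. The function name and toy pixel values below are illustrative, not from the report.

```python
from collections import Counter

def unique_vector_stats(pixels):
    """Tally unique 4-band pixel vectors in a multispectral image.

    `pixels` is an iterable of (band1, band2, band3, band4) tuples;
    returns the number of unique vectors and their frequency table.
    """
    counts = Counter(tuple(p) for p in pixels)
    return len(counts), counts
```

The ratio of unique vectors to total pixels is the kind of summary the report relates to image-processing cost, and recomputing it after a processing step shows how that step changes the vector statistics.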
Bayesian Nonparametric Inference – Why and How
Müller, Peter; Mitra, Riten
2013-01-01
We review inference under models with nonparametric Bayesian (BNP) priors. The discussion follows a set of examples for some common inference problems. The examples are chosen to highlight problems that are challenging for standard parametric inference. We discuss inference for density estimation, clustering, regression and for mixed effects models with random effects distributions. While we focus on arguing for the need for the flexibility of BNP models, we also review some of the more commonly used BNP models, thus hopefully answering a bit of both questions, why and how to use BNP. PMID:24368932
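One commonly used BNP prior for the clustering problems the review discusses is the Dirichlet process; a draw from its Chinese restaurant process representation shows how the number of clusters is inferred rather than fixed in advance. This is a standard textbook sampler, not code from the review.

```python
import random

def crp_partition(n, alpha, seed=0):
    """Draw one random partition of n items from a CRP(alpha) prior."""
    rng = random.Random(seed)
    sizes = []                         # current cluster sizes ("tables")
    labels = []
    for i in range(n):
        # join cluster k with prob size_k/(i + alpha), a new one with alpha/(i + alpha)
        r = rng.uniform(0.0, i + alpha)
        acc = 0.0
        for k, size in enumerate(sizes):
            acc += size
            if r < acc:
                sizes[k] += 1
                labels.append(k)
                break
        else:
            sizes.append(1)            # open a new cluster
            labels.append(len(sizes) - 1)
    return labels, sizes
```

Larger `alpha` yields more clusters on average, which is how the prior stays flexible where a fixed-k parametric model cannot.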
Inference engine using optical array logic
NASA Astrophysics Data System (ADS)
Iwata, Masaya; Tanida, Jun; Ichioka, Yoshiki
1990-07-01
An implementation method for an inference engine using optical array logic is presented. Optical array logic is a technique for parallel neighborhood operation using spatial coding and 2-D correlation. For efficient execution of inference in artificial intelligence problems, a large amount of data must be searched effectively. To meet this demand, a template matching technique is applied to the inference operation. By introducing a new function for data conversion, the inference operation can be implemented with optical array logic, which exploits the parallelism of optical techniques.
Visualization of group inference data in functional neuroimaging.
Gläscher, Jan
2009-01-01
While thresholded statistical parametric maps can convey an accurate account of the location and spatial extent of an effect in functional neuroimaging studies, their use is somewhat limited for characterizing more complex experimental effects, such as interactions in a factorial design. The resulting necessity of plotting the underlying data has long been recognized. Statistical Parametric Mapping (SPM) is a widely used software package for analyzing functional neuroimaging data that offers a variety of options for visualizing data from first level analyses. However, nowadays the thrust of the statistical inference lies at the second level, thus allowing for population inference. Unfortunately, the options for visualizing data from second level analyses are quite sparse. rfxplot is a new toolbox designed to alleviate this problem by providing a comprehensive array of options for plotting data from within second level analyses in SPM. These include graphs of average effect sizes (across subjects), averaged fitted responses and event-related blood oxygen level-dependent (BOLD) time courses. All data are retrieved from the underlying first level analyses, and voxel selection can be tailored to the maximum effect in each subject within a defined search volume. All plot configurations can be easily set up via a graphical user interface as well as non-interactively via a script. The large variety of plot options renders rfxplot suitable both for data exploration and for producing high-quality figures for publication. PMID:19140033
Gaussian quadrature inference for continuous-variable quantum key distribution
NASA Astrophysics Data System (ADS)
Gyongyosi, L.; Imre, S.
2016-05-01
We propose the Gaussian quadrature inference (GQI) method for multicarrier continuous-variable quantum key distribution (CVQKD). A multicarrier CVQKD protocol utilizes Gaussian subcarrier quantum continuous variables (CVs) for information transmission. The GQI framework provides a minimal error estimate of the quadratures of the CV quantum states from the discrete, measured noisy subcarrier variables. GQI utilizes the fundamentals of regularization theory and statistical information processing. We characterize GQI for multicarrier CVQKD, and define a method for the statistical modeling and processing of noisy Gaussian subcarrier quadratures. We demonstrate the results through the adaptive multicarrier quadrature division (AMQD) scheme. We introduce the terms statistical secret key rate and statistical private classical information, quantities derived purely from the statistical functions of GQI. We prove the secret key rate formulas for multiple access multicarrier CVQKD via the AMQD-MQA (multiuser quadrature allocation) scheme. The framework can be established in an arbitrary CVQKD protocol and measurement setting, and is implementable with standard low-complexity statistical functions, which is particularly convenient for an experimental CVQKD scenario.
Exotic statistics of leapfrogging vortex rings.
Niemi, Antti J
2005-04-01
The leapfrogging motion of vortex rings is a three-dimensional version of the motion that in two dimensions leads to exotic exchange statistics. The statistical phase factor can be computed using the hydrodynamical Euler equation, which suggests that three-dimensional exotic exchange statistics is a common property of vortex rings in a variety of quantum liquids and gases. Potential applications range from helium superfluids to Bose-Einstein condensed alkali gases, metallic hydrogen in its liquid phases, and maybe even nuclear matter in extreme conditions. PMID:15903923
Cosmetic Plastic Surgery Statistics
2014 Cosmetic Plastic Surgery Statistics Cosmetic Procedure Trends 2014 Plastic Surgery Statistics Report Please credit the AMERICAN SOCIETY OF PLASTIC SURGEONS when citing statistical data or using ...
Causal Inference and Explaining Away in a Spiking Network
Moreno-Bote, Rubén; Drugowitsch, Jan
2015-01-01
While the brain uses spiking neurons for communication, theoretical research on brain computations has mostly focused on non-spiking networks. The nature of spike-based algorithms that achieve complex computations, such as probabilistic inference about objects, is largely unknown. Here we demonstrate that a family of high-dimensional quadratic optimization problems with non-negativity constraints can be solved exactly and efficiently by a network of spiking neurons. The network naturally imposes the non-negativity of causal contributions that is fundamental to causal inference, and uses simple operations, such as linear synapses with realistic time constants, and neural spike generation and reset non-linearities. The network infers the set of most likely causes from an observation using explaining away, which is dynamically implemented by spike-based, tuned inhibition. The algorithm performs remarkably well even when the network intrinsically generates variable spike trains, the timing of spikes is scrambled by external sources of noise, or the network is mistuned. This type of network might underlie tasks such as odor identification and classification. PMID:26621426
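The optimization problem the network is said to solve can be sketched numerically. Below is a plain projected-gradient solver for the non-negative quadratic program, a hypothetical stand-in for the spiking dynamics rather than the paper's network; the rectification step enforces non-negativity, and the coupling through `W.T @ W` plays the role of the tuned inhibition that implements explaining away.

```python
import numpy as np

def infer_causes_nnls(W, observation, lr=0.01, steps=20_000):
    """Minimize ||W r - observation||^2 subject to r >= 0 by projected
    gradient descent; off-diagonal terms of W.T @ W couple (inhibit)
    competing causes, which is what produces explaining away."""
    r = np.zeros(W.shape[1])
    for _ in range(steps):
        grad = W.T @ (W @ r - observation)
        r = np.maximum(r - lr * grad, 0.0)  # rectification = non-negativity
    return r

# Two overlapping "odor" causes; the observation is generated by cause 0 only.
W = np.array([[1.0, 0.8],
              [0.2, 0.6]])
obs = W @ np.array([1.0, 0.0])
r = infer_causes_nnls(W, obs)
# Cause 0 is credited; the overlapping cause 1 is explained away.
```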
Halo detection via large-scale Bayesian inference
NASA Astrophysics Data System (ADS)
Merson, Alexander I.; Jasche, Jens; Abdalla, Filipe B.; Lahav, Ofer; Wandelt, Benjamin; Jones, D. Heath; Colless, Matthew
2016-08-01
We present a proof-of-concept of a novel and fully Bayesian methodology designed to detect haloes of different masses in cosmological observations subject to noise and systematic uncertainties. Our methodology combines the previously published Bayesian large-scale structure inference algorithm, HAmiltonian Density Estimation and Sampling algorithm (HADES), and a Bayesian chain rule (the Blackwell-Rao estimator), which we use to connect the inferred density field to the properties of dark matter haloes. To demonstrate the capability of our approach, we construct a realistic galaxy mock catalogue emulating the wide-area 6-degree Field Galaxy Survey, which has a median redshift of approximately 0.05. Application of HADES to the catalogue provides us with accurately inferred three-dimensional density fields and corresponding quantification of uncertainties inherent to any cosmological observation. We then use a cosmological simulation to relate the amplitude of the density field to the probability of detecting a halo with mass above a specified threshold. With this information, we can sum over the HADES density field realisations to construct maps of detection probabilities and demonstrate the validity of this approach within our mock scenario. We find that the probability of successful detection of haloes in the mock catalogue increases with the signal-to-noise ratio of the local galaxy observations. Our proposed methodology can easily be extended to account for more complex scientific questions and is a promising novel tool to analyse the cosmic large-scale structure in observations.
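The final step, summing over density-field realisations to build detection probability maps, is a simple Monte Carlo marginalisation. The sketch below assumes a hypothetical per-voxel calibration function (which in the paper comes from a cosmological simulation) and toy Gaussian posterior samples standing in for HADES output.

```python
import numpy as np

def detection_probability_map(density_samples, detect_prob_of_density):
    """Average P(halo above mass threshold | local density) over posterior
    density-field realisations, marginalising over density uncertainty."""
    probs = np.stack([detect_prob_of_density(s) for s in density_samples])
    return probs.mean(axis=0)

# Toy calibration: a logistic response in the local density amplitude.
calib = lambda rho: 1.0 / (1.0 + np.exp(-(rho - 2.0)))
rng = np.random.default_rng(1)
samples = [rng.normal(2.0, 0.5, size=(4, 4)) for _ in range(200)]
pmap = detection_probability_map(samples, calib)  # per-voxel probabilities
```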
Room geometry inference based on spherical microphone array eigenbeam processing.
Mabande, Edwin; Kowalczyk, Konrad; Sun, Haohai; Kellermann, Walter
2013-10-01
The knowledge of parameters characterizing an acoustic environment, such as the geometric information about a room, can be used to enhance the performance of several audio applications. In this paper, a novel method for three-dimensional room geometry inference based on robust and high-resolution beamforming techniques for spherical microphone arrays is presented. Unlike other approaches that are based on the measurement and processing of multiple room impulse responses, here, microphone array signal processing techniques for uncontrolled broadband acoustic signals are applied. First, the directions of arrival (DOAs) and time differences of arrival (TDOAs) of the direct signal and room reflections are estimated using high-resolution robust broadband beamforming techniques and cross-correlation analysis. In this context, the main challenges include the low reflected-signal to background-noise power ratio, the low energy of reflected signals relative to the direct signal, and their strong correlation with the direct signal and among each other. Second, the DOA and TDOA information is combined to infer the room geometry using geometric relations. The high accuracy of the proposed room geometry inference technique is confirmed by experimental evaluations based on both simulated and measured data for moderately reverberant rooms. PMID:24116416
Protein inference: A protein quantification perspective.
He, Zengyou; Huang, Ting; Liu, Xiaoqing; Zhu, Peijun; Teng, Ben; Deng, Shengchun
2016-08-01
In mass spectrometry-based shotgun proteomics, protein quantification and protein identification are two major computational problems. To quantify the protein abundance, a list of proteins must first be inferred from the raw data. Then the relative or absolute protein abundance is estimated with quantification methods, such as spectral counting. Until now, most researchers have been dealing with these two processes separately. In fact, the protein inference problem can be regarded as a special protein quantification problem in the sense that truly present proteins are those proteins whose abundance values are not zero. Some recently published papers have conceptually discussed this possibility. However, there is still a lack of rigorous experimental studies to test this hypothesis. In this paper, we investigate the feasibility of using protein quantification methods to solve the protein inference problem. Protein inference methods aim to determine whether each candidate protein is present in the sample or not. Protein quantification methods estimate the abundance value of each inferred protein. Naturally, the abundance value of an absent protein should be zero. Thus, we argue that the protein inference problem can be viewed as a special protein quantification problem in which one protein is considered to be present if its abundance is not zero. Based on this idea, our paper tries to use three simple protein quantification methods to solve the protein inference problem effectively. The experimental results on six data sets show that these three methods are competitive with previous protein inference algorithms. This demonstrates that it is plausible to model the protein inference problem as a special protein quantification task, which opens the door of devising more effective protein inference algorithms from a quantification perspective. The source codes of our methods are available at: http://code.google.com/p/protein-inference/. PMID:26935399
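The core idea, inference as quantification with a zero-abundance cutoff, can be sketched in a few lines. This is a deliberately simple illustration using shared spectral counting, not any of the three specific methods evaluated in the paper; the peptide-to-protein map and counts are made up.

```python
from collections import defaultdict

def infer_proteins_by_quantification(peptide_to_proteins, spectral_counts):
    """Estimate each candidate protein's abundance by spectral counting,
    splitting shared-peptide evidence equally among its proteins, then
    call a protein present iff its abundance is not zero."""
    abundance = defaultdict(float)
    for peptide, proteins in peptide_to_proteins.items():
        count = spectral_counts.get(peptide, 0)
        for protein in proteins:
            abundance[protein] += count / len(proteins)
    present = {p for p, a in abundance.items() if a > 0}
    return dict(abundance), present

peptides = {"PEP1": ["A"], "PEP2": ["A", "B"], "PEP3": ["C"]}
counts = {"PEP1": 4, "PEP2": 2, "PEP3": 0}
abundance, present = infer_proteins_by_quantification(peptides, counts)
# Protein C has zero abundance and is therefore inferred to be absent.
```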
Inferring Learners' Knowledge from Their Actions
ERIC Educational Resources Information Center
Rafferty, Anna N.; LaMar, Michelle M.; Griffiths, Thomas L.
2015-01-01
Watching another person take actions to complete a goal and making inferences about that person's knowledge is a relatively natural task for people. This ability can be especially important in educational settings, where the inferences can be used for assessment, diagnosing misconceptions, and providing informative feedback. In this paper, we…
The Impact of Disablers on Predictive Inference
ERIC Educational Resources Information Center
Cummins, Denise Dellarosa
2014-01-01
People consider alternative causes when deciding whether a cause is responsible for an effect (diagnostic inference) but appear to neglect them when deciding whether an effect will occur (predictive inference). Five experiments were conducted to test a 2-part explanation of this phenomenon: namely, (a) that people interpret standard predictive…
Causal Inferences during Text Comprehension and Production.
ERIC Educational Resources Information Center
Kemper, Susan
As comprehension failure results whenever readers are unable to infer missing causal connections, recent comprehension research has focused both on assessing the inferential complexity of texts and on investigating students' developing ability to infer causal relationships. Studies have demonstrated that texts rely on four types of causal…
Scalar Inferences in Autism Spectrum Disorders
ERIC Educational Resources Information Center
Chevallier, Coralie; Wilson, Deirdre; Happe, Francesca; Noveck, Ira
2010-01-01
On being told "John or Mary will come", one might infer that "not both" of them will come. Yet the semantics of "or" is compatible with a situation where both John and Mary come. Inferences of this type, which enrich the semantics of "or" from an "inclusive" to an "exclusive" interpretation, have been extensively studied in linguistic pragmatics.…
Genetic Network Inference Using Hierarchical Structure.
Kimura, Shuhei; Tokuhisa, Masato; Okada-Hatakeyama, Mariko
2016-01-01
Many methods for inferring genetic networks have been proposed, but the regulations they infer often include false positives. Several researchers have attempted to reduce these erroneous regulations by proposing the use of a priori knowledge about the properties of genetic networks such as their sparseness, scale-free structure, and so on. This study focuses on another piece of a priori knowledge, namely, that biochemical networks exhibit hierarchical structures. Based on this idea, we propose an inference approach that uses the hierarchical structure in a target genetic network. To obtain a reasonable hierarchical structure, the first step of the proposed approach is to infer multiple genetic networks from the observed gene expression data. We take this step using an existing method that combines a genetic network inference method with a bootstrap method. The next step is to extract a hierarchical structure from the inferred networks that is consistent with most of the networks. Third, we use the hierarchical structure obtained to assign confidence values to all candidate regulations. Numerical experiments are also performed to demonstrate the effectiveness of using the hierarchical structure in the genetic network inference. The improvement accomplished by the use of the hierarchical structure is small. However, the hierarchical structure could be used to improve the performance of many existing inference methods. PMID:26941653
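The consensus step, assigning each candidate regulation a confidence value from multiple bootstrap-inferred networks, can be sketched as below. This is a minimal rendition of that step only; the paper additionally extracts a hierarchical structure first, which is omitted here, and the example networks are made up.

```python
from collections import Counter
from itertools import chain

def edge_confidence(bootstrap_networks):
    """Given networks inferred from bootstrap resamples (each a set of
    directed (regulator, target) edges), score each candidate regulation
    by the fraction of networks that contain it."""
    n = len(bootstrap_networks)
    counts = Counter(chain.from_iterable(bootstrap_networks))
    return {edge: c / n for edge, c in counts.items()}

nets = [
    {("g1", "g2"), ("g2", "g3")},
    {("g1", "g2"), ("g3", "g1")},
    {("g1", "g2"), ("g2", "g3")},
]
conf = edge_confidence(nets)
# ("g1", "g2") appears in all three networks; ("g3", "g1") in only one,
# so thresholding on confidence would discard it as a likely false positive.
```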
Reinforcement learning or active inference?
Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J
2009-01-01
This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain. PMID:19641614
Active inference and epistemic value.
Friston, Karl; Rigoli, Francesco; Ognibene, Dimitri; Mathys, Christoph; Fitzgerald, Thomas; Pezzulo, Giovanni
2015-01-01
We offer a formal treatment of choice behavior based on the premise that agents minimize the expected free energy of future outcomes. Crucially, the negative free energy or quality of a policy can be decomposed into extrinsic and epistemic (or intrinsic) value. Minimizing expected free energy is therefore equivalent to maximizing extrinsic value or expected utility (defined in terms of prior preferences or goals), while maximizing information gain or intrinsic value (or reducing uncertainty about the causes of valuable outcomes). The resulting scheme resolves the exploration-exploitation dilemma: Epistemic value is maximized until there is no further information gain, after which exploitation is assured through maximization of extrinsic value. This is formally consistent with the Infomax principle, generalizing formulations of active vision based upon salience (Bayesian surprise) and optimal decisions based on expected utility and risk-sensitive (Kullback-Leibler) control. Furthermore, as with previous active inference formulations of discrete (Markovian) problems, ad hoc softmax parameters become the expected (Bayes-optimal) precision of beliefs about, or confidence in, policies. This article focuses on the basic theory, illustrating the ideas with simulations. A key aspect of these simulations is the similarity between precision updates and dopaminergic discharges observed in conditioning paradigms. PMID:25689102
Orozco-terWengel, Pablo; Corander, Jukka; Schlötterer, Christian
2011-01-01
Over the past decades, the use of molecular markers has revolutionized biology and led to the foundation of a new research discipline—phylogeography. Of particular interest has been the inference of population structure and biogeography. While initial studies focused on mtDNA as a molecular marker, it has become apparent that selection and genealogical lineage sorting could lead to erroneous inferences. As it is not clear to what extent these forces affect a given marker, it has become common practice to use the combined evidence from a set of molecular markers as an attempt to recover the signals that approximate the true underlying demography. Typically, the number of markers used is determined by either budget constraints or by statistical power required to recognize significant population differentiation. Using microsatellite markers from Drosophila and humans, we show that even large numbers of loci (>50) can frequently result in statistically well-supported, but incorrect inference of population structure using the software BAPS. Most importantly, genomic features, such as chromosomal location, variability of the markers, or recombination rate, cannot explain this observation. Instead, it can be attributed to sampling variation among loci with different realizations of the stochastic lineage sorting. This phenomenon is particularly pronounced for low levels of population differentiation. Our results have important implications for ongoing studies of population differentiation, as we unambiguously demonstrate that statistical significance of population structure inferred from a random set of genetic markers cannot necessarily be taken as evidence for a reliable demographic inference. PMID:21244537
Inference-based constraint satisfaction supports explanation
Sqalli, M.H.; Freuder, E.C.
1996-12-31
Constraint satisfaction problems are typically solved using search, augmented by general purpose consistency inference methods. This paper proposes a paradigm shift in which inference is used as the primary problem solving method, and attention is focused on special purpose, domain specific inference methods. While we expect this approach to have computational advantages, we emphasize here the advantages of a solution method that is more congenial to human thought processes. Specifically we use inference-based constraint satisfaction to support explanations of the problem solving behavior that are considerably more meaningful than a trace of a search process would be. Logic puzzles are used as a case study. Inference-based constraint satisfaction proves surprisingly powerful and easily extensible in this domain. Problems drawn from commercial logic puzzle booklets are used for evaluation. Explanations are produced that compare well with the explanations provided by these booklets.
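The inference-first, explanation-producing style described above can be sketched on a toy logic puzzle. The sketch below is a hypothetical illustration, not the paper's system: it solves a small CSP purely by applying two inference rules (eliminate excluded values; propagate singleton assignments) and logs each deduction as a human-readable explanation of the kind a search trace cannot provide.

```python
def solve_by_inference(domains, exclusions):
    """Solve a small logic-puzzle CSP with no search: apply clue-based
    eliminations, then repeatedly fix singleton domains and remove the
    fixed value elsewhere, recording every deduction as an explanation."""
    domains = {v: set(d) for v, d in domains.items()}
    explanation = []
    for var, val, reason in exclusions:
        domains[var].discard(val)
        explanation.append(f"{var} cannot be {val}: {reason}")
    changed = True
    while changed:
        changed = False
        for var, dom in domains.items():
            if len(dom) == 1:
                (val,) = dom
                for other, odom in domains.items():
                    if other != var and val in odom:
                        odom.discard(val)
                        explanation.append(
                            f"{other} cannot be {val}: already assigned to {var}")
                        changed = True
    return domains, explanation

# Who owns which pet? The clues rule options out directly.
doms = {"Alice": {"cat", "dog", "fish"},
        "Bob": {"cat", "dog", "fish"},
        "Cara": {"cat", "dog", "fish"}}
sol, why = solve_by_inference(doms, [
    ("Alice", "cat", "clue 1"), ("Alice", "dog", "clue 2"),
    ("Bob", "fish", "clue 3"), ("Bob", "cat", "clue 4"),
])
```

Each entry in `why` justifies one elimination, which is the sense in which inference-based solving "supports explanation".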
Inference of protein diffusion probed via fluorescence correlation spectroscopy
NASA Astrophysics Data System (ADS)
Tsekouras, Konstantinos
2015-03-01
Fluctuations are an inherent part of single molecule or few particle biophysical data sets. Traditionally, "noise" fluctuations have been viewed as a nuisance, to be eliminated or minimized. Here we look at how statistical inference methods - that take explicit advantage of fluctuations - have allowed us to draw an unexpected picture of single molecule diffusional dynamics. Our focus is on the diffusion of proteins probed using fluorescence correlation spectroscopy (FCS). First, we discuss how - in collaboration with the Bustamante and Marqusee labs at UC Berkeley - we determined using FCS data that individual enzymes are perturbed by self-generated catalytic heat (Riedel et al, Nature, 2014). Using the tools of inference, we found how distributions of enzyme diffusion coefficients shift in the presence of substrate, revealing that enzymes performing highly exothermic reactions dissipate heat by transiently accelerating their center of mass following a catalytic reaction. Next, when molecules diffuse in the cell nucleus they often appear to diffuse anomalously. We analyze FCS data - in collaboration with Rich Day at the IU Med School - to propose a simple model for transcription factor binding-unbinding in the nucleus to show that it may give rise to apparent anomalous diffusion. Here inference methods extract entire binding affinity distributions for the diffusing transcription factors, allowing us to precisely characterize their interactions with different components of the nuclear environment. From this analysis, we draw key mechanistic insight that goes beyond what is possible by simply fitting data to "anomalous diffusion" models.
Bayesian Inference of Reticulate Phylogenies under the Multispecies Network Coalescent
Wen, Dingqiao; Yu, Yun; Nakhleh, Luay
2016-01-01
The multispecies coalescent (MSC) is a statistical framework that models how gene genealogies grow within the branches of a species tree. The field of computational phylogenetics has witnessed an explosion in the development of methods for species tree inference under MSC, owing mainly to the accumulating evidence of incomplete lineage sorting in phylogenomic analyses. However, the evolutionary history of a set of genomes, or species, could be reticulate due to the occurrence of evolutionary processes such as hybridization or horizontal gene transfer. We report on a novel method for Bayesian inference of genome and species phylogenies under the multispecies network coalescent (MSNC). This framework models gene evolution within the branches of a phylogenetic network, thus incorporating reticulate evolutionary processes, such as hybridization, in addition to incomplete lineage sorting. As phylogenetic networks with different numbers of reticulation events correspond to points of different dimensions in the space of models, we devise a reversible-jump Markov chain Monte Carlo (RJMCMC) technique for sampling the posterior distribution of phylogenetic networks under MSNC. We implemented the methods in the publicly available, open-source software package PhyloNet and studied their performance on simulated and biological data. The work extends the reach of Bayesian inference to phylogenetic networks and enables new evolutionary analyses that account for reticulation. PMID:27144273
Computational approaches to protein inference in shotgun proteomics.
Li, Yong Fuga; Radivojac, Predrag
2012-01-01
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area. PMID:23176300
Learning Probabilistic Inference through Spike-Timing-Dependent Plasticity
Pecevski, Dejan
2016-01-01
Numerous experimental data show that the brain is able to extract information from complex, uncertain, and often ambiguous experiences. Furthermore, it can use such learnt information for decision making through probabilistic inference. Several models have been proposed that aim at explaining how probabilistic inference could be performed by networks of neurons in the brain. We propose here a model that can also explain how such a neural network could acquire the necessary information from examples. We show that spike-timing-dependent plasticity in combination with intrinsic plasticity generates in ensembles of pyramidal cells with lateral inhibition a fundamental building block for that: probabilistic associations between neurons that represent through their firing current values of random variables. Furthermore, by combining such adaptive network motifs in a recursive manner the resulting network is enabled to extract statistical information from complex input streams, and to build an internal model for the distribution p* that generates the examples it receives. This holds even if p* contains higher-order moments. The analysis of this learning process is supported by a rigorous theoretical foundation. Furthermore, we show that the network can use the learnt internal model immediately for prediction, decision making, and other types of probabilistic inference. PMID:27419214
Inference of neuronal network spike dynamics and topology from calcium imaging data
Lütcke, Henry; Gerhard, Felipe; Zenke, Friedemann; Gerstner, Wulfram; Helmchen, Fritjof
2013-01-01
Two-photon calcium imaging enables functional analysis of neuronal circuits by inferring action potential (AP) occurrence (“spike trains”) from cellular fluorescence signals. It remains unclear how experimental parameters such as signal-to-noise ratio (SNR) and acquisition rate affect spike inference and whether additional information about network structure can be extracted. Here we present a simulation framework for quantitatively assessing how well spike dynamics and network topology can be inferred from noisy calcium imaging data. For simulated AP-evoked calcium transients in neocortical pyramidal cells, we analyzed the quality of spike inference as a function of SNR and data acquisition rate using a recently introduced peeling algorithm. Given experimentally attainable values of SNR and acquisition rate, neural spike trains could be reconstructed accurately and with up to millisecond precision. We then applied statistical neuronal network models to explore how remaining uncertainties in spike inference affect estimates of network connectivity and topological features of network organization. We define the experimental conditions suitable for inferring whether the network has a scale-free structure and determine how well hub neurons can be identified. Our findings provide a benchmark for future calcium imaging studies that aim to reliably infer neuronal network properties. PMID:24399936
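The peeling idea, greedily locating the best template match, subtracting a single-AP transient, and repeating, can be sketched as follows. This is a bare-bones illustration under assumed parameters (exponential transient shape, made-up noise level and threshold), not the published peeling algorithm.

```python
import numpy as np

def peel_spikes(trace, template, threshold):
    """Greedy 'peeling' spike inference from a calcium fluorescence trace:
    locate the strongest match to a single-AP transient template, subtract
    the template there, and repeat until no candidate crosses threshold."""
    residual = trace.astype(float).copy()
    spikes = []
    while True:
        # correlate residual with the template to find the next event onset
        scores = np.correlate(residual, template, mode="valid")
        t = int(np.argmax(scores))
        if residual[t] < threshold:
            break
        spikes.append(t)
        # subtract one transient, clipped at the end of the trace
        residual[t:t + len(template)] -= template[:len(residual) - t]
    return sorted(spikes), residual

# Simulate two AP-evoked transients (instant rise, exponential decay) + noise.
rng = np.random.default_rng(2)
template = np.exp(-np.arange(40) / 10.0)
trace = rng.normal(0, 0.02, 200)
for t0 in (50, 120):
    trace[t0:t0 + 40] += template
spikes, _ = peel_spikes(trace, template, threshold=0.5)
```

With the simulated signal-to-noise ratio above, both events are recovered at (or within a sample or two of) their true onsets.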
Bayesian Hierarchical Modeling for Signaling Pathway Inference from Single Cell Interventional Data
Luo, Ruiyan; Zhao, Hongyu
2011-01-01
Recent technological advances have made it possible to simultaneously measure multiple protein activities at the single cell level. With such data collected under different stimulatory or inhibitory conditions, it is possible to infer the causal relationships among proteins from single cell interventional data. In this article we propose a Bayesian hierarchical modeling framework to infer the signaling pathway based on the posterior distributions of parameters in the model. Under this framework, we consider network sparsity and model the existence of an association between two proteins both at the overall level across all experiments and at each individual experimental level. This allows us to infer the pairs of proteins that are associated with each other and their causal relationships. We also explicitly consider both intrinsic noise and measurement error. Markov chain Monte Carlo is implemented for statistical inference. We demonstrate that this hierarchical modeling can effectively pool information from different interventional experiments through simulation studies and real data analysis. PMID:22162986
Inference for the physical sciences
Jones, Nick S.; Maccarone, Thomas J.
2013-01-01
There is a disconnect between developments in modern data analysis and some parts of the physical sciences in which they could find ready use. This introduction, and this issue, provides resources to help experimental researchers access modern data analysis tools and exposure for analysts to extant challenges in physical science. We include a table of resources connecting statistical and physical disciplines and point to appropriate books, journals, videos and articles. We conclude by highlighting the relevance of each of the articles in the associated issue. PMID:23277613