Statistical Physics of High Dimensional Inference
NASA Astrophysics Data System (ADS)
Advani, Madhu; Ganguli, Surya
To model modern large-scale datasets, we need efficient algorithms to infer a set of P unknown model parameters from N noisy measurements. What are fundamental limits on the accuracy of parameter inference, given limited measurements, signal-to-noise ratios, prior information, and computational tractability requirements? How can we combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density α = N/P → ∞. However, modern high-dimensional inference problems, in fields ranging from bio-informatics to economics, occur at finite α. We formulate and analyze high-dimensional inference analytically by applying the replica and cavity methods of statistical physics where data serves as quenched disorder and inferred parameters play the role of thermal degrees of freedom. Our analysis reveals that widely cherished Bayesian inference algorithms such as maximum likelihood and maximum a posteriori are suboptimal in the modern setting, and yields new tractable, optimal algorithms to replace them as well as novel bounds on the achievable accuracy of a large class of high-dimensional inference algorithms. Thanks to Stanford Graduate Fellowship and Mind Brain Computation IGERT grant for support.
High-dimensional statistical inference: From vector to matrix
NASA Astrophysics Data System (ADS)
Zhang, Anru
Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications. It has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. In this thesis, we consider several problems including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary yet leads to sharp results. It is shown that, in compressed sensing, $\delta_k^A < 1/3$, $\delta_k^A + \theta_{k,k}^A < 1$, or $\delta_{tk}^A < \sqrt{(t-1)/t}$ for any given constant $t \ge 4/3$ guarantee the exact recovery of all $k$-sparse signals in the noiseless case through constrained $\ell_1$ minimization, and similarly in affine rank minimization $\delta_r^M < 1/3$, $\delta_r^M + \theta_{r,r}^M < 1$, or $\delta_{tr}^M < \sqrt{(t-1)/t}$ ensure the exact reconstruction of all matrices with rank at most $r$ in the noiseless case via constrained nuclear norm minimization. Moreover, for any $\epsilon > 0$, $\delta_k^A < 1/3 + \epsilon$, $\delta_k^A + \theta_{k,k}^A < 1 + \epsilon$, or $\delta_{tk}^A < \sqrt{(t-1)/t} + \epsilon$ are not sufficient to guarantee the exact recovery of all $k$-sparse signals for large $k$. A similar result also holds for matrix recovery. In addition, the conditions $\delta_k^A < 1/3$, $\delta_k^A + \theta_{k,k}^A < 1$, $\delta_{tk}^A < \sqrt{(t-1)/t}$ and $\delta_r^M < 1/3$, $\delta_r^M + \theta_{r,r}^M < 1$, $\delta_{tr}^M < \sqrt{(t-1)/t}$ are also shown to be sufficient respectively for stable recovery of approximately sparse signals and low-rank matrices in the noisy case.
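The constrained ℓ1 minimization named in the abstract above can be sketched as a linear program. This is only an illustrative sketch, not the thesis's own code: the Gaussian measurement matrix, the problem sizes, and the helper name `basis_pursuit` are assumptions, and the example does not compute or verify any restricted isometry constant.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to A x = y as a linear program."""
    n_dim = A.shape[1]
    # Split x = u - v with u, v >= 0, so that ||x||_1 = sum(u) + sum(v).
    cost = np.ones(2 * n_dim)
    A_eq = np.hstack([A, -A])
    result = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    u, v = result.x[:n_dim], result.x[n_dim:]
    return u - v

# Hypothetical demo: a k-sparse signal observed through a random Gaussian matrix.
rng = np.random.default_rng(0)
n_meas, n_dim, k = 40, 100, 5
A = rng.standard_normal((n_meas, n_dim)) / np.sqrt(n_meas)
x_true = np.zeros(n_dim)
support = rng.choice(n_dim, size=k, replace=False)
x_true[support] = rng.standard_normal(k)
y = A @ x_true                      # noiseless measurements
x_hat = basis_pursuit(A, y)
recovery_error = np.linalg.norm(x_hat - x_true)
```

The returned vector always satisfies the measurement constraint and has ℓ1 norm no larger than that of the true signal; whether it coincides with the true signal depends on the matrix, which is what conditions of the restricted isometry type are meant to guarantee.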
Challenges and approaches to statistical design and inference in high-dimensional investigations.
Gadbury, Gary L; Garrett, Karen A; Allison, David B
2009-01-01
Advances in modern technologies have facilitated high-dimensional experiments (HDEs) that generate tremendous amounts of genomic, proteomic, and other "omic" data. HDEs involving whole-genome sequences and polymorphisms, expression levels of genes, protein abundance measurements, and combinations thereof have become a vanguard for new analytic approaches to the analysis of HDE data. Such situations demand creative approaches to the processes of statistical inference, estimation, prediction, classification, and study design. The novel and challenging biological questions asked from HDE data have resulted in many specialized analytic techniques being developed. This chapter discusses some of the unique statistical challenges facing investigators studying high-dimensional biology and describes some approaches being developed by statistical scientists. We have included some focus on the increasing interest in questions involving testing multiple propositions simultaneously, appropriate inferential indicators for the types of questions biologists are interested in, and the need for replication of results across independent studies, investigators, and settings. A key consideration inherent throughout is the challenge in providing methods that a statistician judges to be sound and a biologist finds informative.
Statistical learning and selective inference
Taylor, Jonathan; Tibshirani, Robert J.
2015-01-01
We describe the problem of “selective inference.” This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have “cherry-picked”—searched for the strongest associations—means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis. PMID:26100887
Statistical Inference at Work: Statistical Process Control as an Example
ERIC Educational Resources Information Center
Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia
2008-01-01
To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…
Computational statistics using the Bayesian Inference Engine
NASA Astrophysics Data System (ADS)
Weinberg, Martin D.
2013-09-01
This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible, object-oriented framework that implements every aspect of Bayesian inference. Because the BIE provides a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.
The Reasoning behind Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Bakker, Arthur; Ben-Zvi, Dani
2011-01-01
Informal statistical inference (ISI) has been a frequent focus of recent research in statistics education. Considering the role that context plays in developing ISI calls into question the need to be more explicit about the reasoning that underpins ISI. This paper uses educational literature on informal statistical inference and philosophical…
Predict! Teaching Statistics Using Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie
2013-01-01
Statistics is one of the most widely used topics for everyday life in the school mathematics curriculum. Unfortunately, the statistics taught in schools focuses on calculations and procedures before students have a chance to see it as a useful and powerful tool. Researchers have found that a dominant view of statistics is as an assortment of tools…
Local and Global Thinking in Statistical Inference
ERIC Educational Resources Information Center
Pratt, Dave; Johnston-Wilder, Peter; Ainley, Janet; Mason, John
2008-01-01
In this reflective paper, we explore students' local and global thinking about informal statistical inference through our observations of 10- to 11-year-olds, challenged to infer the unknown configuration of a virtual die, but able to use the die to generate as much data as they felt necessary. We report how they tended to focus on local changes…
Ranald Macdonald and statistical inference.
Smith, Philip T
2009-05-01
Ranald Roderick Macdonald (1945-2007) was an important contributor to mathematical psychology in the UK, as a referee and action editor for British Journal of Mathematical and Statistical Psychology and as a participant and organizer at the British Psychological Society's Mathematics, statistics and computing section meetings. This appreciation argues that his most important contribution was to the foundations of significance testing, where his concern about what information was relevant in interpreting the results of significance tests led him to be a persuasive advocate for the 'Weak Fisherian' form of hypothesis testing. PMID:19351454
Investigating Mathematics Teachers' Thoughts of Statistical Inference
ERIC Educational Resources Information Center
Yang, Kai-Lin
2012-01-01
Research on statistical cognition and application suggests that statistical inference concepts are commonly misunderstood by students and even misinterpreted by researchers. Although some research has been done on students' misunderstanding or misconceptions of confidence intervals (CIs), few studies explore either students' or mathematics…
Inference and the introductory statistics course
NASA Astrophysics Data System (ADS)
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-10-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its hypothetical probabilistic reasoning process is examined in some depth. We argue that the revolution in the teaching of inference must begin. We also discuss some perplexing issues, problematic areas and some new insights into language conundrums associated with introducing the logic of inference through randomization methods.
Thermodynamics of statistical inference by cells.
Lang, Alex H; Fisher, Charles K; Mora, Thierry; Mehta, Pankaj
2014-10-01
The deep connection between thermodynamics, computation, and information is now well established both theoretically and experimentally. Here, we extend these ideas to show that thermodynamics also places fundamental constraints on statistical estimation and learning. To do so, we investigate the constraints placed by (nonequilibrium) thermodynamics on the ability of biochemical signaling networks to estimate the concentration of an external signal. We show that accuracy is limited by energy consumption, suggesting that there are fundamental thermodynamic constraints on statistical inference.
Statistical Mechanics of Optimal Convex Inference in High Dimensions
NASA Astrophysics Data System (ADS)
Advani, Madhu; Ganguli, Surya
2016-07-01
A fundamental problem in modern high-dimensional data analysis involves efficiently inferring a set of P unknown model parameters governing the relationship between the inputs and outputs of N noisy measurements. Various methods have been proposed to regress the outputs against the inputs to recover the P parameters. What are fundamental limits on the accuracy of regression, given finite signal-to-noise ratios, limited measurements, prior information, and computational tractability requirements? How can we optimally combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density α = N/P → ∞. However, these classical results are not relevant to modern high-dimensional inference problems, which instead occur at finite α. We employ replica theory to answer these questions for a class of inference algorithms, known in the statistics literature as M-estimators. These algorithms attempt to recover the P model parameters by solving an optimization problem involving minimizing the sum of a loss function that penalizes deviations between the data and model predictions, and a regularizer that leverages prior information about model parameters. Widely cherished algorithms like maximum likelihood (ML) and maximum a posteriori (MAP) inference arise as special cases of M-estimators. Our analysis uncovers fundamental limits on the inference accuracy of a subclass of M-estimators corresponding to computationally tractable convex optimization problems. These limits generalize classical statistical theorems like the Cramér-Rao bound to the high-dimensional setting with prior information. We further discover the optimal M-estimator for log-concave signal and noise distributions; we demonstrate that it can achieve our high-dimensional limits on inference accuracy, while ML and MAP cannot. Intriguingly, in high dimensions, these optimal algorithms become computationally simpler than
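To make the "loss plus regularizer" form of an M-estimator concrete, here is a minimal sketch for the simplest convex case: squared loss with a quadratic regularizer, which has a closed-form minimizer. It is not the optimal M-estimator derived in the paper. The Gaussian design, the noise and prior scales, and the function name `m_estimate_quadratic` are all assumptions made for illustration.

```python
import numpy as np

def m_estimate_quadratic(X, y, lam):
    """Minimize sum_i (y_i - x_i . w)^2 / 2 + lam * ||w||^2 / 2 in closed form."""
    n_params = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_params), X.T @ y)

# Hypothetical setup at finite measurement density alpha = N / P = 2.
rng = np.random.default_rng(1)
N, P = 200, 100
sigma_noise, sigma_prior = 1.0, 1.0
X = rng.standard_normal((N, P)) / np.sqrt(P)
w_true = sigma_prior * rng.standard_normal(P)
y = X @ w_true + sigma_noise * rng.standard_normal(N)

# ML under Gaussian noise: squared loss, no regularizer.
w_ml = m_estimate_quadratic(X, y, lam=0.0)
# MAP under Gaussian noise and a Gaussian prior: the regularizer weight is
# the ratio of noise variance to prior variance.
lam_map = sigma_noise**2 / sigma_prior**2
w_map = m_estimate_quadratic(X, y, lam=lam_map)

mse_ml = np.mean((w_ml - w_true) ** 2)
mse_map = np.mean((w_map - w_true) ** 2)
```

Swapping in a different convex loss or regularizer changes the estimator but not this overall structure; for non-quadratic choices the minimizer generally has to be found numerically.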
Likelihood-Free Inference in High-Dimensional Models.
Kousathanas, Athanasios; Leuenberger, Christoph; Helfer, Jonas; Quinodoz, Mathieu; Foll, Matthieu; Wegmann, Daniel
2016-06-01
Methods that bypass analytical evaluations of the likelihood function have become an indispensable tool for statistical inference in many fields of science. These so-called likelihood-free methods rely on accepting and rejecting simulations based on summary statistics, which limits them to low-dimensional models for which the value of the likelihood is large enough to result in manageable acceptance rates. To get around these issues, we introduce a novel, likelihood-free Markov chain Monte Carlo (MCMC) method combining two key innovations: updating only one parameter per iteration and accepting or rejecting this update based on subsets of statistics approximately sufficient for this parameter. This increases acceptance rates dramatically, rendering this approach suitable even for models of very high dimensionality. We further derive that for linear models, a one-dimensional combination of statistics per parameter is sufficient and can be found empirically with simulations. Finally, we demonstrate that our method readily scales to models of very high dimensionality, using toy models as well as by jointly inferring the effective population size, the distribution of fitness effects (DFE) of segregating mutations, and selection coefficients for each locus from data of a recent experiment on the evolution of drug resistance in influenza. PMID:27052569
The renormalization group via statistical inference
NASA Astrophysics Data System (ADS)
Bény, Cédric; Osborne, Tobias J.
2015-08-01
In physics, one attempts to infer the rules governing a system given only the results of imperfect measurements. Hence, microscopic theories may be effectively indistinguishable experimentally. We develop an operationally motivated procedure to identify the corresponding equivalence classes of states, and argue that the renormalization group (RG) arises from the inherent ambiguities associated with the classes: one encounters flow parameters as, e.g., a regulator, a scale, or a measure of precision, which specify representatives in a given equivalence class. This provides a unifying framework and reveals the role played by information in renormalization. We validate this idea by showing that it justifies the use of low-momenta n-point functions as statistically relevant observables around a Gaussian hypothesis. These results enable the calculation of distinguishability in quantum field theory. Our methods also provide a way to extend renormalization techniques to effective models which are not based on the usual quantum-field formalism, and elucidate the relationships between various types of RG.
Statistical inference for serial dilution assay data.
Lee, M L; Whitmore, G A
1999-12-01
Serial dilution assays are widely employed for estimating substance concentrations and minimum inhibitory concentrations. The Poisson-Bernoulli model for such assays is appropriate for count data but not for continuous measurements that are encountered in applications involving substance concentrations. This paper presents practical inference methods based on a log-normal model and illustrates these methods using a case application involving bacterial toxins.
Combining statistical inference and decisions in ecology
Williams, Perry J.; Hooten, Mevin B.
2016-01-01
Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation, and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem.
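As a small illustration of Bayesian point estimation under a loss function, the sketch below picks the action that minimizes the Monte Carlo estimate of posterior expected loss. The gamma-distributed "posterior draws", the candidate grid, and the function name `bayes_action` are hypothetical and are not taken from the article.

```python
import numpy as np

def bayes_action(posterior_draws, candidates, loss):
    """Return the candidate that minimizes Monte Carlo posterior expected loss."""
    expected = [np.mean(loss(posterior_draws, a)) for a in candidates]
    return candidates[int(np.argmin(expected))]

rng = np.random.default_rng(2)
# Hypothetical right-skewed posterior, represented by samples.
draws = rng.gamma(shape=2.0, scale=1.0, size=5000)
grid = np.linspace(0.0, 10.0, 1001)          # candidate point estimates

def squared_loss(theta, a):
    return (theta - a) ** 2

def absolute_loss(theta, a):
    return np.abs(theta - a)

a_squared = bayes_action(draws, grid, squared_loss)    # close to the posterior mean
a_absolute = bayes_action(draws, grid, absolute_loss)  # close to the posterior median
```

The two losses lead to different point estimates from the same posterior, which is the sense in which the choice of loss function is central to the decision problem.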
An argument for mechanism-based statistical inference in cancer
Ochs, Michael; Price, Nathan D.; Tomasetti, Cristian; Younes, Laurent
2015-01-01
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning bio-markers, metabolism, cell signaling, network inference and tumorigenesis. PMID:25381197
Nuclear Forensic Inferences Using Iterative Multidimensional Statistics
Robel, M; Kristo, M J; Heller, M A
2009-06-09
Nuclear forensics involves the analysis of interdicted nuclear material for specific material characteristics (referred to as 'signatures') that imply specific geographical locations, production processes, culprit intentions, etc. Predictive signatures rely on expert knowledge of physics, chemistry, and engineering to develop inferences from these material characteristics. Comparative signatures, on the other hand, rely on comparison of the material characteristics of the interdicted sample (the 'questioned sample' in FBI parlance) with those of a set of known samples. In the ideal case, the set of known samples would be a comprehensive nuclear forensics database, a database which does not currently exist. In fact, our ability to analyze interdicted samples and produce an extensive list of precise materials characteristics far exceeds our ability to interpret the results. Therefore, as we seek to develop the extensive databases necessary for nuclear forensics, we must also develop the methods necessary to produce the necessary inferences from comparison of our analytical results with these large, multidimensional sets of data. In the work reported here, we used a large, multidimensional dataset of results from quality control analyses of uranium ore concentrate (UOC, sometimes called 'yellowcake'). We have found that traditional multidimensional techniques, such as principal components analysis (PCA), are especially useful for understanding such datasets and drawing relevant conclusions. In particular, we have developed an iterative partial least squares-discriminant analysis (PLS-DA) procedure that has proven especially adept at identifying the production location of unknown UOC samples. By removing classes which fell far outside the initial decision boundary, and then rebuilding the PLS-DA model, we have consistently produced better and more definitive attributions than with a single pass classification approach. Performance of the iterative PLS-DA method
Quantum statistical inference for density estimation
Silver, R.N.; Martz, H.F.; Wallstrom, T.
1993-11-01
A new penalized likelihood method for non-parametric density estimation is proposed, which is based on a mathematical analogy to quantum statistical physics. The mathematical procedure for density estimation is related to maximum entropy methods for inverse problems; the penalty function is a convex information divergence enforcing global smoothing toward default models, positivity, extensivity and normalization. The novel feature is the replacement of classical entropy by quantum entropy, so that local smoothing may be enforced by constraints on the expectation values of differential operators. Although the hyperparameters, covariance, and linear response to perturbations can be estimated by a variety of statistical methods, we develop the Bayesian interpretation. The linear response of the MAP estimate is proportional to the covariance. The hyperparameters are estimated by type-II maximum likelihood. The method is demonstrated on standard data sets.
Simultaneous Statistical Inference for Epigenetic Data
Schildknecht, Konstantin; Olek, Sven; Dickhaus, Thorsten
2015-01-01
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology. PMID:25965389
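The permutation idea behind such tests can be sketched in its simplest form: a univariate two-sample test on the difference in means. This is deliberately much simpler than the multivariate multiple permutation tests and relative-effect hypotheses described above; the function name `permutation_p_value` and the toy data are assumptions.

```python
import numpy as np

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = abs(x.mean() - y.mean())
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)   # random relabeling of the groups
        stat = abs(shuffled[:len(x)].mean() - shuffled[len(x):].mean())
        exceed += int(stat >= observed)
    # Counting the observed labeling itself keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)

# Hypothetical toy data: two clearly separated groups.
group_a = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
group_b = group_a + 100.0
p_separated = permutation_p_value(group_a, group_b)
p_identical = permutation_p_value(group_a, group_a.copy())
```

For many loci tested at once, per-locus p-values like these would still need a multiple testing correction of the kind the abstract discusses.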
Introducing Statistical Inference to Biology Students through Bootstrapping and Randomization
ERIC Educational Resources Information Center
Lock, Robin H.; Lock, Patti Frazer
2008-01-01
Bootstrap methods and randomization tests are increasingly being used as alternatives to standard statistical procedures in biology. They also serve as an effective introduction to the key ideas of statistical inference in introductory courses for biology students. We discuss the use of such simulation based procedures in an integrated curriculum…
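A percentile bootstrap confidence interval is one of the simulation-based procedures this kind of course might start from. The sketch below is an assumption about how such an exercise could look, not material from the article; `bootstrap_ci` and the toy sample are hypothetical.

```python
import numpy as np

def bootstrap_ci(sample, stat=np.mean, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = sample.size
    # Each row holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    stats = np.array([stat(sample[row]) for row in idx])
    tail = (1.0 - level) / 2.0
    return np.quantile(stats, [tail, 1.0 - tail])

sample = np.arange(1.0, 21.0)        # hypothetical measurements 1, 2, ..., 20
ci_low, ci_high = bootstrap_ci(sample)
```

Passing `stat=np.median` or another function gives an interval for a different statistic without changing the resampling logic.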
LOWER LEVEL INFERENCE CONTROL IN STATISTICAL DATABASE SYSTEMS
Lipton, D.L.; Wong, H.K.T.
1984-02-01
An inference is the process of transforming unclassified data values into confidential data values. Most previous research in inference control has studied the use of statistical aggregates to deduce individual records. However, several other types of inference are also possible. Unknown functional dependencies may be apparent to users who have 'expert' knowledge about the characteristics of a population. Some correlations between attributes may be concluded from 'commonly-known' facts about the world. To counter these threats, security managers should use random sampling of databases of similar populations, as well as expert systems. 'Expert' users of the DATABASE SYSTEM may form inferences from the variable performance of the user interface. Users may observe on-line turn-around time, accounting statistics, the error message received, and the point at which an interactive protocol sequence fails. One may obtain information about the frequency distributions of attribute values, and the validity of data object names from this information. At the back-end of a database system, improved software engineering practices will reduce opportunities to bypass functional units of the database system. The term 'DATA OBJECT' should be expanded to incorporate these data object types which generate new classes of threats. The security of DATABASES and DATABASE SYSTEMS must be recognized as separate but related problems. Thus, by increased awareness of lower level inferences, system security managers may effectively nullify the threat posed by lower level inferences.
The Philosophical Foundations of Prescriptive Statements and Statistical Inference
ERIC Educational Resources Information Center
Sun, Shuyan; Pan, Wei
2011-01-01
From the perspectives of the philosophy of science and statistical inference, we discuss the challenges of making prescriptive statements in quantitative research articles. We first consider the prescriptive nature of educational research and argue that prescriptive statements are a necessity in educational research. The logic of deduction,…
Statistical inference in behavior analysis: Experimental control is better
Perone, Michael
1999-01-01
Statistical inference promises automatic, objective, reliable assessments of data, independent of the skills or biases of the investigator, whereas the single-subject methods favored by behavior analysts often are said to rely too much on the investigator's subjective impressions, particularly in the visual analysis of data. In fact, conventional statistical methods are difficult to apply correctly, even by experts, and the underlying logic of null-hypothesis testing has drawn criticism since its inception. By comparison, single-subject methods foster direct, continuous interaction between investigator and subject and development of strong forms of experimental control that obviate the need for statistical inference. Treatment effects are demonstrated in experimental designs that incorporate replication within and between subjects, and the visual analysis of data is adequate when integrated into such designs. Thus, single-subject methods are ideal for shaping—and maintaining—the kind of experimental practices that will ensure the continued success of behavior analysis. PMID:22478328
Statistical detection of EEG synchrony using empirical bayesian inference.
Singh, Archana K; Asoh, Hideki; Takeda, Yuji; Phillips, Steven
2015-01-01
There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high-dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimension. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that locFDR can effectively control false positives without compromising on the power of PLV synchrony inference. Applying locFDR to the experimental data yielded more significant discoveries than our previously proposed methods, whereas the standard FDR method failed to detect any. PMID:25822617
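For readers unfamiliar with the quantity being tested, a phase-locking value can be sketched as follows. This assumes one common definition (the magnitude of the mean unit phasor of the phase difference, here averaged over time, with phases from the Hilbert transform); the abstract does not state the authors' exact definition, and the synthetic sine waves are hypothetical stand-ins for band-passed EEG.

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(sig_a, sig_b):
    """PLV: magnitude of the mean unit phasor of the phase difference."""
    phase_a = np.angle(hilbert(sig_a))   # instantaneous phase of each signal
    phase_b = np.angle(hilbert(sig_b))
    return np.abs(np.mean(np.exp(1j * (phase_a - phase_b))))

# Hypothetical narrow-band signals sampled for one second.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
ref = np.sin(2 * np.pi * 10 * t)
locked = np.sin(2 * np.pi * 10 * t + 0.5)   # same frequency, constant phase lag
drifting = np.sin(2 * np.pi * 13 * t)       # different frequency, phase drifts

plv_locked = phase_locking_value(ref, locked)
plv_drifting = phase_locking_value(ref, drifting)
```

A value near 1 indicates a stable phase relationship and a value near 0 indicates none; computing this for every channel pair, frequency, and time window is what produces the large family of hypotheses the abstract refers to.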
Inference in high-dimensional parameter space.
O'Hare, Anthony
2015-11-01
Model parameter inference has become increasingly popular in recent years in the field of computational epidemiology, especially for models with a large number of parameters. Techniques such as Approximate Bayesian Computation (ABC) or maximum/partial likelihoods are commonly used to infer parameters in phenomenological models that best describe some set of data. These techniques rely on efficient exploration of the underlying parameter space, which is difficult in high dimensions, especially if there are correlations between the parameters in the model that may not be known a priori. The aim of this article is to demonstrate the use of the recently invented Adaptive Metropolis algorithm for exploring parameter space in a practical way through the use of a simple epidemiological model. PMID:26176624
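As a rough illustration of the Adaptive Metropolis idea mentioned above, the sketch below follows the commonly cited scheme in which the Gaussian proposal covariance is re-estimated from the chain history and scaled by 2.4^2/d. It is not the article's code: the function name `adaptive_metropolis`, the initial covariance, the adaptation start and the toy correlated Gaussian target are all assumptions made for the example.

```python
import numpy as np

def adaptive_metropolis(log_target, x0, n_steps=5000, adapt_start=500,
                        eps=1e-6, seed=0):
    """Random-walk Metropolis whose proposal covariance adapts to the chain."""
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    d = x0.size
    s_d = 2.4 ** 2 / d                      # standard scaling factor
    chain = np.empty((n_steps + 1, d))
    chain[0] = x0
    logp = log_target(chain[0])
    cov = 0.1 * np.eye(d)                   # fixed proposal before adaptation
    for t in range(1, n_steps + 1):
        if t > adapt_start:
            # Empirical covariance of the history, regularised by eps * I.
            cov = s_d * np.cov(chain[:t].T) + s_d * eps * np.eye(d)
        proposal = rng.multivariate_normal(chain[t - 1], cov)
        logp_new = log_target(proposal)
        if np.log(rng.uniform()) < logp_new - logp:
            chain[t], logp = proposal, logp_new   # accept
        else:
            chain[t] = chain[t - 1]               # reject, stay put
    return chain

# Toy target: zero-mean bivariate Gaussian with correlation 0.9.
prec = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
log_gauss = lambda x: -0.5 * x @ prec @ x
samples = adaptive_metropolis(log_gauss, [0.0, 0.0])[1000:]
print(samples.mean(axis=0), np.corrcoef(samples.T)[0, 1])
```

Recomputing the covariance from scratch at every step keeps the sketch short; a recursive update of the running mean and covariance would be the usual choice for long chains.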
Statistical inference for extinction rates based on last sightings.
Nakamura, Miguel; Del Monte-Luna, Pablo; Lluch-Belda, Daniel; Lluch-Cota, Salvador E
2013-09-21
Rates of extinction can be estimated from sighting records and are assumed to be implicitly constant by many data analysis methods. However, historical sightings are scarce. Frequently, the only information available for inferring extinction is the date of the last sighting. In this study, we developed a probabilistic model and a corresponding statistical inference procedure based on last sightings. We applied this procedure to data on recent marine extirpations and extinctions, seeking to test the null hypothesis of a constant extinction rate. We found that over the past 500 years extirpations in the ocean have been increasing but at an uncertain rate, whereas a constant rate of global marine extinctions is statistically plausible. The small sample sizes of marine extinction records generate such high uncertainty that different combinations of model inputs can yield different outputs that fit the observed data equally well. Thus, current marine extinction trends may be idiosyncratic.
Two dimensional unstable scar statistics.
Warne, Larry Kevin; Jorgenson, Roy Eberhardt; Kotulski, Joseph Daniel; Lee, Kelvin S. H. (ITT Industries/AES Los Angeles, CA)
2006-12-01
This report examines the localization of time harmonic high frequency modal fields in two dimensional cavities along periodic paths between opposing sides of the cavity. The cases where these orbits lead to unstable localized modes are known as scars. This paper examines the enhancements for these unstable orbits when the opposing mirrors are both convex and concave. In the latter case the construction includes the treatment of interior foci.
Indirect Fourier transform in the context of statistical inference.
Muthig, Michael; Prévost, Sylvain; Orglmeister, Reinhold; Gradzielski, Michael
2016-09-01
Inferring structural information from the intensity of a small-angle scattering (SAS) experiment is an ill-posed inverse problem. Thus, the determination of a solution is in general non-trivial. In this work, the indirect Fourier transform (IFT), which determines the pair distance distribution function from the intensity and hence yields structural information, is discussed within two different statistical inference approaches, namely a frequentist one and a Bayesian one, in order to determine a solution objectively. From the frequentist approach the cross-validation method is obtained as a good practical objective function for selecting an IFT solution. Moreover, modern machine learning methods are employed to suppress oscillatory behaviour of the solution, hence extracting only meaningful features of the solution. By comparing the results yielded by the different methods presented here, the reliability of the outcome can be improved and thus the approach should enable more reliable information to be deduced from SAS experiments. PMID:27580204
Statistical Inference for Big Data Problems in Molecular Biophysics
Ramanathan, Arvind; Savol, Andrej; Burger, Virginia; Quinn, Shannon; Agarwal, Pratul K; Chennubhotla, Chakra
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, increasingly complex reaching petabyte scales presently, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.
Statistical limitations in functional neuroimaging. II. Signal detection and statistical inference.
Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P
1999-01-01
The field of functional neuroimaging (FNI) methodology has developed into a mature but evolving area of knowledge and its applications have been extensive. A general problem in the analysis of FNI data is finding a signal embedded in noise. This is sometimes called signal detection. Signal detection theory focuses in general on issues relating to the optimization of conditions for separating the signal from noise. When methods from probability theory and mathematical statistics are directly applied in this procedure it is also called statistical inference. In this paper we briefly discuss some aspects of signal detection theory relevant to FNI and, in addition, some common approaches to statistical inference used in FNI. Low-pass filtering in relation to functional-anatomical variability and some effects of filtering on signal detection of interest to FNI are discussed. Also, some general aspects of hypothesis testing and statistical inference are discussed. This includes the need for characterizing the signal in data when the null hypothesis is rejected, the problem of multiple comparisons that is central to FNI data analysis, omnibus tests and some issues related to statistical power in the context of FNI. In turn, random field, scale space, non-parametric and Monte Carlo approaches are reviewed, representing the most common approaches to statistical inference used in FNI. Complementary to these issues an overview and discussion of non-inferential descriptive methods, common statistical models and the problem of model selection is given in a companion paper. In general, model selection is an important prelude to subsequent statistical inference. The emphasis in both papers is on the assumptions and inherent limitations of the methods presented. Most of the methods described here generally serve their purposes well when the inherent assumptions and limitations are taken into account. Significant differences in results between different methods are most apparent in
NASA Astrophysics Data System (ADS)
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and almost never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail, for models of this kind, because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient with growing number of data points. Hamiltonian Monte Carlo algorithms allow us to translate this inference problem to the problem of simulating the dynamics of a statistical mechanics system and give us access to most sophisticated methods
NASA Astrophysics Data System (ADS)
Vali Ahmadi, Mohammad; Doostparast, Mahdi; Ahmadi, Jafar
2015-04-01
In manufacturing industries, the lifetime of an item is usually characterised by a random variable X and considered to be satisfactory if X exceeds a given lower lifetime limit L. The probability of a satisfactory item is then ηL := P(X ≥ L), called conforming rate. In industrial companies, however, the lifetime performance index, proposed by Montgomery and denoted by CL, is widely used as a process capability index instead of the conforming rate. Assuming a parametric model for the random variable X, we show that there is a connection between the conforming rate and the lifetime performance index. Consequently, the statistical inferences about ηL and CL are equivalent. Hence, we restrict ourselves to statistical inference for CL based on generalised order statistics, which contains several ordered data models such as usual order statistics, progressively Type-II censored data and records. Various point and interval estimators for the parameter CL are obtained and optimal critical regions for the hypothesis testing problems concerning CL are proposed. Finally, two real data-sets on the lifetimes of insulating fluid and ball bearings, due to Nelson (1982) and Caroni (2002), respectively, and a simulated sample are analysed.
Statistical inference of regulatory networks for circadian regulation.
Aderhold, Andrej; Husmeier, Dirk; Grzegorczyk, Marco
2014-06-01
We assess the accuracy of various state-of-the-art statistics and machine learning methods for reconstructing gene and protein regulatory networks in the context of circadian regulation. Our study draws on the increasing availability of gene expression and protein concentration time series for key circadian clock components in Arabidopsis thaliana. In addition, gene expression and protein concentration time series are simulated from a recently published regulatory network of the circadian clock in A. thaliana, in which protein and gene interactions are described by a Markov jump process based on Michaelis-Menten kinetics. We closely follow recent experimental protocols, including the entrainment of seedlings to different light-dark cycles and the knock-out of various key regulatory genes. Our study provides relative network reconstruction accuracy scores for a critical comparative performance evaluation, and sheds light on a series of highly relevant questions: it quantifies the influence of systematically missing values related to unknown protein concentrations and mRNA transcription rates, it investigates the dependence of the performance on the network topology and the degree of recurrency, it provides deeper insight into when and why non-linear methods fail to outperform linear ones, it offers improved guidelines on parameter settings in different inference procedures, and it suggests new hypotheses about the structure of the central circadian gene regulatory network in A. thaliana. PMID:24864301
Physics of epigenetic landscapes and statistical inference by cells
NASA Astrophysics Data System (ADS)
Lang, Alex H.
Biology is currently in the midst of a revolution. Great technological advances have led to unprecedented quantitative data at the whole genome level. However, new techniques are needed to deal with this deluge of high-dimensional data. Therefore, statistical physics has the potential to help develop systems biology level models that can incorporate complex data. Additionally, physicists have made great strides in understanding non-equilibrium thermodynamics. However, the consequences of these advances have yet to be fully incorporated into biology. There are three specific problems that I address in my dissertation. First, a common metaphor for describing development is a rugged "epigenetic landscape" where cell fates are represented as attracting valleys resulting from a complex regulatory network. I introduce a framework for explicitly constructing epigenetic landscapes that combines genomic data with techniques from spin-glass physics. The model reproduces known reprogramming protocols and identifies candidate transcription factors for reprogramming to novel cell fates, suggesting epigenetic landscapes are a powerful paradigm for understanding cellular identity. Second, I examine the dynamics of cellular reprogramming. By reanalyzing all available time-series data, I show that gene expression dynamics during reprogramming follow a simple one-dimensional reaction coordinate that is independent of both the time and details of experimental protocol used. I show that such a reaction coordinate emerges naturally from epigenetic landscape models of cell identity where cellular reprogramming is viewed as a "barrier-crossing" between the starting and ending cell fates. Overall, the analysis and model suggest that gene expression dynamics during reprogramming follow a canonical trajectory consistent with the idea of an "optimal path" in gene expression space for reprogramming. Third, an important task of cells is to perform complex computations in response to
Statistical challenges of high-dimensional data
Johnstone, Iain M.; Titterington, D. Michael
2009-01-01
Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this Theme Issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements. We describe other ways of identifying low-dimensional subspaces of the data space that contain all useful information. The topic of classification is then reviewed along with the problem of identifying, from within a very large set, the variables that help to classify observations. Brief mention is made of the visualization of high-dimensional data and ways to handle computational problems in Bayesian analysis are described. At appropriate points, reference is made to the other papers in the issue. PMID:19805443
Mechanical Stress Inference for Two Dimensional Cell Arrays
Chiou, Kevin K.; Hufnagel, Lars; Shraiman, Boris I.
2012-01-01
Many morphogenetic processes involve mechanical rearrangements of epithelial tissues that are driven by precisely regulated cytoskeletal forces and cell adhesion. The mechanical state of the cell and intercellular adhesion are not only the targets of regulation, but are themselves the likely signals that coordinate developmental process. Yet, because it is difficult to directly measure mechanical stress in vivo on sub-cellular scale, little is understood about the role of mechanics in development. Here we present an alternative approach which takes advantage of the recent progress in live imaging of morphogenetic processes and uses computational analysis of high resolution images of epithelial tissues to infer relative magnitude of forces acting within and between cells. We model intracellular stress in terms of bulk pressure and interfacial tension, allowing these parameters to vary from cell to cell and from interface to interface. Assuming that epithelial cell layers are close to mechanical equilibrium, we use the observed geometry of the two dimensional cell array to infer interfacial tensions and intracellular pressures. Here we present the mathematical formulation of the proposed Mechanical Inverse method and apply it to the analysis of epithelial cell layers observed at the onset of ventral furrow formation in the Drosophila embryo and in the process of hair-cell determination in the avian cochlea. The analysis reveals mechanical anisotropy in the former process and mechanical heterogeneity, correlated with cell differentiation, in the latter process. The proposed method opens a way for quantitative and detailed experimental tests of models of cell and tissue mechanics. PMID:22615550
Building Intuitions about Statistical Inference Based on Resampling
ERIC Educational Resources Information Center
Watson, Jane; Chance, Beth
2012-01-01
Formal inference, which makes theoretical assumptions about distributions and applies hypothesis testing procedures with null and alternative hypotheses, is notoriously difficult for tertiary students to master. The debate about whether this content should appear in Years 11 and 12 of the "Australian Curriculum: Mathematics" has gone on for…
Statistical Inference in the Learning of Novel Phonetic Categories
ERIC Educational Resources Information Center
Zhao, Yuan
2010-01-01
Learning a phonetic category (or any linguistic category) requires integrating different sources of information. A crucial unsolved problem for phonetic learning is how this integration occurs: how can we update our previous knowledge about a phonetic category as we hear new exemplars of the category? One model of learning is Bayesian Inference,…
ERIC Educational Resources Information Center
Larwin, Karen H.; Larwin, David A.
2011-01-01
Bootstrapping methods and random distribution methods are increasingly recommended as better approaches for teaching students about statistical inference in introductory-level statistics courses. The authors examined the effect of teaching undergraduate business statistics students using random distribution and bootstrapping simulations. It is the…
Statistical Inferences from Formaldehyde DNA-Protein Cross-Link Data
Physiologically-based pharmacokinetic (PBPK) modeling has reached considerable sophistication in its application in the pharmacological and environmental health areas. Yet, mature methodologies for making statistical inferences have not been routinely incorporated in these applic...
Statistical Inference Models for Image Datasets with Systematic Variations
Kim, Won Hwa; Bendlin, Barbara B.; Chung, Moo K.; Johnson, Sterling C.; Singh, Vikas
2016-01-01
Statistical analysis of longitudinal or cross sectional brain imaging data to identify effects of neurodegenerative diseases is a fundamental task in various studies in neuroscience. However, when there are systematic variations in the images due to parameter changes such as changes in the scanner protocol, hardware changes, or when combining data from multi-site studies, the statistical analysis becomes problematic. Motivated by this scenario, the goal of this paper is to develop a unified statistical solution to the problem of systematic variations in statistical image analysis. Based in part on recent literature in harmonic analysis on diffusion maps, we propose an algorithm which compares operators that are resilient to the systematic variations. These operators are derived from the empirical measurements of the image data and provide an efficient surrogate to capturing the actual changes across images. We also establish a connection between our method to the design of wavelets in non-Euclidean space. To evaluate the proposed ideas, we present various experimental results on detecting changes in simulations as well as show how the method offers improved statistical power in the analysis of real longitudinal PIB-PET imaging data acquired from participants at risk for Alzheimer’s disease (AD). PMID:26989336
Contrasting Diversity Values: Statistical Inferences Based on Overlapping Confidence Intervals
MacGregor-Fors, Ian; Payton, Mark E.
2013-01-01
Ecologists often contrast diversity (species richness and abundances) using tests for comparing means or indices. However, many popular software applications do not support performing standard inferential statistics for estimates of species richness and/or density. In this study we simulated the behavior of asymmetric log-normal confidence intervals and determined an interval level that mimics statistical tests with P(α) = 0.05 when confidence intervals from two distributions do not overlap. Our results show that 84% confidence intervals robustly mimic 0.05 statistical tests for asymmetric confidence intervals, as has been demonstrated for symmetric ones in the past. Finally, we provide detailed user-guides for calculating 84% confidence intervals in two of the most robust and highly-used freeware related to diversity measurements for wildlife (i.e., EstimateS, Distance). PMID:23437239
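The 84% figure quoted above can be checked by hand in the simplest setting. The sketch below covers only the symmetric normal case with two independent estimates, which the abstract cites as the earlier result; it does not reproduce the asymmetric log-normal simulation. Two intervals fail to overlap when the difference exceeds z_ci * (se1 + se2), whereas a two-sided test rejects when it exceeds z_test * sqrt(se1^2 + se2^2); equating the two gives the matching interval level. The function name `matching_ci_level` and the `se_ratio` parameter are assumptions of this example.

```python
from math import sqrt
from statistics import NormalDist

def matching_ci_level(alpha=0.05, se_ratio=1.0):
    """Confidence level at which 'intervals do not overlap' matches a
    two-sided z-test of size alpha, for standard errors in ratio se_ratio."""
    std_normal = NormalDist()
    z_test = std_normal.inv_cdf(1 - alpha / 2)
    # Non-overlap threshold z_ci * (se1 + se2) equals test threshold
    # z_test * sqrt(se1**2 + se2**2); solve for z_ci with se2 = se_ratio * se1.
    z_ci = z_test * sqrt(1 + se_ratio ** 2) / (1 + se_ratio)
    return 2 * std_normal.cdf(z_ci) - 1

level = matching_ci_level()
print(f"{100 * level:.1f}% intervals mimic a 0.05 test for equal standard errors")
```

With equal standard errors the matching level comes out a little above 83%, consistent with the rounded 84% recommendation; unequal standard errors push the matching level upward.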
Statistical inference in behavior analysis: Friend or foe?
Baron, Alan
1999-01-01
Behavior analysts are undecided about the proper role to be played by inferential statistics in behavioral research. The traditional view, as expressed in Sidman's Tactics of Scientific Research (1960), was that inferential statistics has no place within a science that focuses on the steady-state behavior of individual organisms. Despite this admonition, there have been steady inroads of statistical techniques into behavior analysis since then, as evidenced by publications in the Journal of the Experimental Analysis of Behavior. The issues raised by these developments were considered at a panel held at the 24th annual convention of the Association for Behavior Analysis, Orlando, Florida (May, 1998). The proceedings are reported in this and the following articles. PMID:22478323
Statistical Inference and Simulation with StatKey
ERIC Educational Resources Information Center
Quinn, Anne
2016-01-01
While looking for an inexpensive technology package to help students in statistics classes, the author found StatKey, a free Web-based app. Not only is StatKey useful for students' year-end projects, but it is also valuable for helping students learn fundamental content such as the central limit theorem. Using StatKey, students can engage in…
Statistical inference and forensic evidence: evaluating a bullet lead match.
Kaasa, Suzanne O; Peterson, Tiamoyo; Morris, Erin K; Thompson, William C
2007-10-01
This experiment tested the ability of undergraduate mock jurors (N=295) to draw appropriate conclusions from statistical data on the diagnostic value of forensic evidence. Jurors read a summary of a homicide trial in which the key evidence was a bullet lead "match" that was either highly diagnostic, non-diagnostic, or of unknown diagnostic value. There was also a control condition in which the forensic "match" was not presented. The results indicate that jurors as a group used the statistics appropriately to distinguish diagnostic from non-diagnostic forensic evidence, giving considerable weight to the former and little or no weight to the latter. However, this effect was attributable to responses of a subset of jurors who expressed confidence in their ability to use statistical data. Jurors who lacked confidence in their statistical ability failed to distinguish highly diagnostic from non-diagnostic forensic evidence; they gave no weight to the forensic evidence regardless of its diagnostic value. Confident jurors also gave more weight to evidence of unknown diagnostic value. Theoretical and legal implications are discussed.
Technology Focus: Using Technology to Explore Statistical Inference
ERIC Educational Resources Information Center
Garofalo, Joe; Juersivich, Nicole
2007-01-01
There is much research that documents what many teachers know, that students struggle with many concepts in probability and statistics. This article presents two sample activities the authors use to help preservice teachers develop ideas about how they can use technology to promote their students' ability to understand mathematics and connect…
Statistical, connectionist, and fuzzy inference techniques for image classification
NASA Astrophysics Data System (ADS)
Israel, Steven A.; Kasabov, Nikola K.
1997-07-01
A spectral classification comparison was performed using four different classifiers, the parametric maximum likelihood classifier and three nonparametric classifiers: neural networks, fuzzy rules, and fuzzy neural networks. The input image data is a System Pour l'Observation de la Terre (SPOT) satellite image of Otago Harbour near Dunedin, New Zealand. The SPOT image data contains three spectral bands in the green, red, and visible infrared portions of the electromagnetic spectrum. The specific area contains intertidal vegetation species above and below the waterline. Of specific interest is eelgrass (Zostera novazelandica), which is a biotic indicator of environmental health. The mixed covertypes observed in an in situ field survey are difficult to classify because of subjectivity and water's preferential absorption of the visible infrared spectrum. In this analysis, each of the classifiers was applied to the data in two different testing procedures. In the first test procedure, the reference data was divided into training and test by area. Although this is an efficient data handling technique, the classifier is not presented with all of the subtle microclimate variations. In the second test procedure, the same reference areas were amalgamated and randomly sorted into training and test data. The amalgamation and sorting were performed external to the analysis software. For the first testing procedure, the highest testing accuracy was obtained through the use of fuzzy inferences at 89%. In the second testing procedure, the maximum likelihood classifier and the fuzzy neural networks provided the best results. Although the testing accuracies for the maximum likelihood classifier and the fuzzy neural networks were similar, the latter algorithm has additional features, such as rules extraction, explanation, and fine tuning of individual classes.
Statistical Inference of Biometrical Genetic Model With Cultural Transmission.
Guo, Xiaobo; Ji, Tian; Wang, Xueqin; Zhang, Heping; Zhong, Shouqiang
2013-01-01
Twin and family studies establish the foundation for studying the genetic, environmental and cultural transmission effects for phenotypes. In this work, we make use of the well established statistical methods and theory for mixed models to assess cultural transmission in twin and family studies. Specifically, we address two critical yet poorly understood issues: the model identifiability in assessing cultural transmission for twin and family data and the biases in the estimates when sub-models are used. We apply our models and theory to two real data sets. A simulation is conducted to verify the bias in the estimates of genetic effects when the working model is a sub-model. PMID:24660046
Trans-dimensional Bayesian inference for large sequential data sets
NASA Astrophysics Data System (ADS)
Mandolesi, E.; Dettmer, J.; Dosso, S. E.; Holland, C. W.
2015-12-01
This work develops a sequential Monte Carlo method to infer seismic parameters of layered seabeds from large sequential reflection-coefficient data sets. The approach provides parameter estimates and uncertainties along survey tracks with the goal to aid in the detection of unexploded ordnance in shallow water. The sequential data are acquired by a moving platform with source and receiver array towed close to the seabed. This geometry requires consideration of spherical reflection coefficients, computed efficiently by massively parallel implementation of the Sommerfeld integral via Levin integration on a graphics processing unit. The seabed is parametrized with a trans-dimensional model to account for changes in the environment (i.e. changes in layering) along the track. The method combines advanced Markov chain Monte Carlo methods (annealing) with particle filtering (resampling). Since data from closely-spaced source transmissions (pings) often sample similar environments, the solution from one ping can be utilized to efficiently estimate the posterior for data from subsequent pings. Since reflection-coefficient data are highly informative, the likelihood function can be extremely peaked, resulting in little overlap between posteriors of adjacent pings. This is addressed by adding bridging distributions (via annealed importance sampling) between pings for more efficient transitions. The approach assumes the environment to be changing slowly enough to justify the local 1D parametrization. However, bridging allows rapid changes between pings to be addressed and we demonstrate the method to be stable in such situations. Results are in terms of trans-D parameter estimates and uncertainties along the track. The algorithm is examined for realistic simulated data along a track and applied to a dataset collected by an autonomous underwater vehicle on the Malta Plateau, Mediterranean Sea. [Work supported by the SERDP, DoD.]
ERIC Educational Resources Information Center
Denbleyker, John Nickolas
2012-01-01
The shortcomings of the proportion above cut (PAC) statistic used so prominently in the educational landscape render it a very problematic measure for making correct inferences with student test data. The limitations of PAC-based statistics are more pronounced with cross-test comparisons due to their dependency on cut-score locations. A better…
Circumpulsar Asteroids: Inferences from Nulling Statistics and High Energy Correlations
NASA Astrophysics Data System (ADS)
Shannon, Ryan; Cordes, J. M.
2006-12-01
We have proposed that some classes of radio pulsar variability are associated with the entry of neutral asteroidal material into the pulsar magnetosphere. The region surrounding neutron stars is polluted with supernova fall-back material, which collapses and condenses into an asteroid-bearing disk that is stable for millions of years. Over time, collisional and radiative processes cause the asteroids to migrate inward until they are heated to the point of ionization. For older and cooler pulsars, asteroids ionize within the large magnetospheres and inject a sufficient amount of charged particles to alter the electrodynamics of the gap regions and modulate emission processes. This extrinsic model unifies many observed phenomena of variability that occur on time scales that are disparate with the much shorter time scales associated with pulsars and their magnetospheres. One such type of variability is nulling, in which certain pulsars exhibit episodes of quiescence that for some objects may be as short as a few pulse periods, but, for others, is longer than days. Here, in the context of this model, we examine the nulling phenomenon. We analyze the relationship between in-falling material and the statistics of nulling. In addition, as motivation for further high energy observations, we consider the relationship between the nulling and other magnetospheric processes.
Statistical inference from capture data on closed animal populations
Otis, David L.; Burnham, Kenneth P.; White, Gary C.; Anderson, David R.
1978-01-01
The estimation of animal abundance is an important problem in both the theoretical and applied biological sciences. Serious work to develop estimation methods began during the 1950s, with a few attempts before that time. The literature on estimation methods has increased tremendously during the past 25 years (Cormack 1968, Seber 1973). However, in large part, the problem remains unsolved. Past efforts toward comprehensive and systematic estimation of density (D) or population size (N) have been inadequate, in general. While more than 200 papers have been published on the subject, one is generally left without a unified approach to the estimation of abundance of an animal population. This situation is unfortunate because a number of pressing research problems require such information. In addition, a wide array of environmental assessment studies and biological inventory programs require the estimation of animal abundance. These needs have been further emphasized by the requirement for the preparation of Environmental Impact Statements imposed by the National Environmental Protection Act in 1970. This publication treats inference procedures for certain types of capture data on closed animal populations. This includes multiple capture-recapture studies (variously called capture-mark-recapture, mark-recapture, or tag-recapture studies) involving livetrapping techniques and removal studies involving kill traps or at least temporary removal of captured individuals during the study. Animals do not necessarily need to be physically trapped; visual sightings of marked animals and electrofishing studies also produce data suitable for the methods described in this monograph. To provide a frame of reference for what follows, we give an example of a capture-recapture experiment to estimate population size of small animals using live traps. The general field experiment is similar for all capture-recapture studies (a removal study is, of course, slightly different). A typical
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is not much. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials.
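The minimum p-value procedure described in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the two candidate statistics (absolute difference in means and in medians), the function names, and the default of 499 random relabelings are all choices made here for the example. The key step is that the minimum p-value is itself referred to its own permutation distribution, which is what keeps the type I error rate at its nominal level.

```python
import random
from statistics import mean, median


def diff_in_means(a, b):
    # Candidate test statistic 1 (assumed for illustration).
    return abs(mean(a) - mean(b))


def diff_in_medians(a, b):
    # Candidate test statistic 2 (assumed for illustration).
    return abs(median(a) - median(b))


def min_p_permutation_test(group_a, group_b, stats, n_perm=499, seed=0):
    """Return (observed minimum p-value, permutation-adjusted p-value)."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    # Row 0 holds the statistics for the observed labels; every other row
    # holds the statistics for one random relabeling of the pooled data.
    rows = [[s(group_a, group_b) for s in stats]]
    for _ in range(n_perm):
        rng.shuffle(pooled)
        rows.append([s(pooled[:n_a], pooled[n_a:]) for s in stats])

    def p_values(row):
        # Per-statistic p-value of a row relative to all rows.
        return [
            sum(other[j] >= row[j] for other in rows) / len(rows)
            for j in range(len(stats))
        ]

    # Minimum p-value for the observed data and for every relabeling.
    min_ps = [min(p_values(row)) for row in rows]
    observed_min_p = min_ps[0]

    # Refer the observed minimum to the distribution of minima.
    adjusted_p = sum(m <= observed_min_p for m in min_ps) / len(min_ps)
    return observed_min_p, adjusted_p


if __name__ == "__main__":
    treated = [11.2, 12.5, 13.1, 14.8, 15.0, 16.3, 17.7, 18.4]
    control = [1.4, 2.2, 3.9, 4.1, 5.6, 6.0, 7.3, 8.8]
    print(min_p_permutation_test(treated, control,
                                 [diff_in_means, diff_in_medians]))
```

The adjusted value is never smaller than the raw minimum p-value, which reflects the price paid for choosing the best of several statistics.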
Social Inferences from Faces: Ambient Images Generate a Three-Dimensional Model
ERIC Educational Resources Information Center
Sutherland, Clare A. M.; Oldmeadow, Julian A.; Santos, Isabel M.; Towler, John; Burt, D. Michael; Young, Andrew W.
2013-01-01
Three experiments are presented that investigate the two-dimensional valence/trustworthiness by dominance model of social inferences from faces (Oosterhof & Todorov, 2008). Experiment 1 used image averaging and morphing techniques to demonstrate that consistent facial cues subserve a range of social inferences, even in a highly variable sample of…
NASA Astrophysics Data System (ADS)
Lawrence, C.; Lin, L.; Lisiecki, L. E.; Khider, D.
2014-12-01
The broad goal of this presentation is to demonstrate the utility of probabilistic generative models to capture investigators' knowledge of geological processes and proxy data to draw statistical inferences about unobserved paleoclimatological events. We illustrate how this approach forces investigators to be explicit about their assumptions, and about how probability theory yields results that are a mathematical consequence of these assumptions and the data. We illustrate these ideas with the HMM-Match model that infers common times of sediment deposition in two records and the uncertainty in these inferences in the form of confidence bands. HMM-Match models the sedimentation processes that led to proxy data measured in marine sediment cores. This Bayesian model has three components: 1) a generative probabilistic model that proceeds from the underlying geophysical and geochemical events, specifically the sedimentation events, to the generation of the proxy data (Sedimentation ---> Proxy Data); 2) a recursive algorithm that reverses the logic of the model to yield inference about the unobserved sedimentation events and the associated alignment of the records based on proxy data (Proxy Data ---> Sedimentation (Alignment)); 3) an expectation maximization algorithm for estimating two unknown parameters. We applied HMM-Match to align 35 Late Pleistocene records to a global benthic d18O stack and found that the mean width of 95% confidence intervals varies between 3-23 kyr depending on the resolution and noisiness of the core's d18O signal. Confidence bands within individual cores also vary greatly, ranging from ~0 to >40 kyr. Results from this algorithm will allow researchers to examine the robustness of their conclusions with respect to alignment uncertainty. Figure 1 shows the confidence bands for one low resolution record.
Bayesian Inference of High-Dimensional Dynamical Ocean Models
NASA Astrophysics Data System (ADS)
Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.
2015-12-01
This presentation addresses a holistic set of challenges in high-dimension ocean Bayesian nonlinear estimation: i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); ii) assimilate data using Bayes' law with these pdfs; iii) predict the future data that optimally reduce uncertainties; and iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.
Young Children's Use of Statistical Sampling Evidence to Infer the Subjectivity of Preferences
ERIC Educational Resources Information Center
Ma, Lili; Xu, Fei
2011-01-01
A crucial task in social interaction involves understanding subjective mental states. Here we report two experiments with toddlers exploring whether they can use statistical evidence to infer the subjective nature of preferences. We found that 2-year-olds were likely to interpret another person's nonrandom sampling behavior as a cue for a…
Elucidating the Foundations of Statistical Inference with 2 x 2 Tables
Choi, Leena; Blume, Jeffrey D.; Dupont, William D.
2015-01-01
To many, the foundations of statistical inference are cryptic and irrelevant to routine statistical practice. The analysis of 2 x 2 contingency tables, omnipresent in the scientific literature, is a case in point. Fisher's exact test is routinely used even though it has been fraught with controversy for over 70 years. The problem, not widely acknowledged, is that several different p-values can be associated with a single table, making scientific inference inconsistent. The root cause of this controversy lies in the table's origins and the manner in which nuisance parameters are eliminated. However, fundamental statistical principles (e.g., sufficiency, ancillarity, conditionality, and likelihood) can shed light on the controversy and guide our approach in using this test. In this paper, we use these fundamental principles to show how much information is lost when the table's origins are ignored and when various approaches are used to eliminate unknown nuisance parameters. We present novel likelihood contours to aid in the visualization of information loss and show that the information loss is often virtually non-existent. We find that problems arising from the discreteness of the sample space are exacerbated by p-value-based inference. Accordingly, methods that are less sensitive to this discreteness - likelihood ratios, posterior probabilities and mid-p-values - lead to more consistent inferences. PMID:25849515
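To make the point about several p-values for one table concrete, here is a minimal sketch that computes three conditional p-values for a single 2 x 2 table from the hypergeometric distribution: the one-sided Fisher p-value, the corresponding mid-p-value, and a two-sided p-value. The function names and the "sum of probabilities no larger than the observed one" rule for the two-sided value are assumptions made for this illustration; the paper's likelihood contours are not reproduced.

```python
from math import comb


def hypergeom_pmf(k, row1, row2, col1):
    # P(top-left cell = k) with all margins of the table held fixed.
    return comb(row1, k) * comb(row2, col1 - k) / comb(row1 + row2, col1)


def fisher_p_values(a, b, c, d):
    """Conditional p-values for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    pmf = {k: hypergeom_pmf(k, row1, row2, col1) for k in range(lo, hi + 1)}
    p_obs = pmf[a]

    # One-sided: tables at least as extreme in the "larger a" direction.
    one_sided = sum(p for k, p in pmf.items() if k >= a)
    # Mid-p: count only half of the observed table's probability.
    mid_p = one_sided - 0.5 * p_obs
    # Two-sided: all tables no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    two_sided = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9))
    return {"one_sided": one_sided, "mid_p": mid_p, "two_sided": two_sided}


if __name__ == "__main__":
    for name, value in fisher_p_values(3, 1, 1, 3).items():
        print(f"{name}: {value:.4f}")
```

For the table [[3, 1], [1, 3]] the three values are 17/70, 9/70 and 34/70, so the same data yield p-values of roughly 0.24, 0.13 and 0.49 depending on the convention.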
Inferring the connectivity of coupled oscillators from time-series statistical similarity analysis.
Tirabassi, Giulio; Sevilla-Escoboza, Ricardo; Buldú, Javier M; Masoller, Cristina
2015-06-04
A system composed of interacting dynamical elements can be represented by a network, where the nodes represent the elements that constitute the system, and the links account for their interactions, which arise due to a variety of mechanisms, and which are often unknown. A popular method for inferring the system connectivity (i.e., the set of links among pairs of nodes) is by performing a statistical similarity analysis of the time-series collected from the dynamics of the nodes. Here, by considering two systems of coupled oscillators (Kuramoto phase oscillators and Rössler chaotic electronic oscillators) with known and controllable coupling conditions, we aim at testing the performance of this inference method, by using linear and non-linear statistical similarity measures. We find that, under adequate conditions, the network links can be perfectly inferred, i.e., no mistakes are made regarding the presence or absence of links. These conditions for perfect inference require: i) an appropriate choice of the observed variable to be analysed, ii) an appropriate interaction strength, and iii) an adequate thresholding of the similarity matrix. For the dynamical units considered here we find that the linear statistical similarity measure performs, in general, better than the non-linear ones.
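The inference step described above (compute a similarity matrix from the node time series, then threshold it) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' setup: the series are Gaussian noise rather than Kuramoto or Rössler oscillators, the similarity measure is the absolute Pearson correlation, and `toy_series`, `infer_links` and the threshold values are names and numbers chosen here.

```python
import random
from math import sqrt


def pearson(x, y):
    # Linear similarity measure between two equally long series.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def infer_links(series, threshold):
    """Return the set of node pairs whose |correlation| exceeds threshold."""
    links = set()
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            if abs(pearson(series[i], series[j])) > threshold:
                links.add((i, j))
    return links


def toy_series(n_samples=500, seed=1):
    # Hypothetical 3-node system: node 1 follows node 0, node 2 is independent.
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(n_samples)]
    x1 = [v + 0.3 * rng.gauss(0, 1) for v in x0]
    x2 = [rng.gauss(0, 1) for _ in range(n_samples)]
    return [x0, x1, x2]


if __name__ == "__main__":
    data = toy_series()
    for thr in (0.0, 0.5, 0.999):
        print(thr, sorted(infer_links(data, thr)))
```

Running it with a very low, a moderate and a very high threshold shows why the abstract lists adequate thresholding as a condition for perfect inference: too low a threshold adds spurious links, too high a threshold drops the true one.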
Inferring the statistical interpretation of quantum mechanics from the classical limit
Gottfried
2000-06-01
It is widely believed that the statistical interpretation of quantum mechanics cannot be inferred from the Schrödinger equation itself, and must be stated as an additional independent axiom. Here I propose that the situation is not so stark. For systems that have both continuous and discrete degrees of freedom (such as coordinates and spin respectively), the statistical interpretation for the discrete variables is implied by requiring that the system's gross motion can be classically described under circumstances specified by the Schrödinger equation. However, this is not a full-fledged derivation of the statistical interpretation because it does not apply to the continuous variables of classical mechanics.
ERIC Educational Resources Information Center
Sotos, Ana Elisa Castro; Vanhoof, Stijn; Van den Noortgate, Wim; Onghena, Patrick
2007-01-01
A solid understanding of "inferential statistics" is of major importance for designing and interpreting empirical results in any scientific discipline. However, students are prone to many misconceptions regarding this topic. This article structurally summarizes and describes these misconceptions by presenting a systematic review of publications…
High-Dimensional Statistical Learning: Roots, Justifications, and Potential Machineries
Zollanvari, Amin
2015-01-01
High-dimensional data generally refer to data in which the number of variables is larger than the sample size. Analyzing such datasets poses great challenges for classical statistical learning because the finite-sample performance of methods developed within classical statistical learning does not live up to classical asymptotic premises in which the sample size unboundedly grows for a fixed dimensionality of observations. Much work has been done in developing mathematical–statistical techniques for analyzing high-dimensional data. Despite remarkable progress in this field, many practitioners still utilize classical methods for analyzing such datasets. This state of affairs can be attributed, in part, to a lack of knowledge and, in part, to the ready-to-use computational and statistical software packages that are well developed for classical techniques. Moreover, many scientists working in a specific field of high-dimensional statistical learning are either not aware of other existing machineries in the field or are not willing to try them out. The primary goal in this work is to bring together various machineries of high-dimensional analysis, give an overview of the important results, and present the operating conditions upon which they are grounded. When appropriate, readers are referred to relevant review articles for more information on a specific subject. PMID:27081307
Advances and challenges in the attribution of climate impacts using statistical inference
NASA Astrophysics Data System (ADS)
Hsiang, S. M.
2015-12-01
We discuss recent advances, challenges, and debates in the use of statistical models to infer and attribute climate impacts, such as distinguishing effects of "climate" vs. "weather," accounting for simultaneous environmental changes along multiple dimensions, evaluating multiple sources of uncertainty, accounting for adaptation, and simulating counterfactual economic or social trajectories. We relate these ideas to recent findings linking temperature to economic productivity/violence and tropical cyclones to economic growth.
Hupé, Jean-Michel
2015-01-01
Published studies using functional and structural MRI include many errors in the way data are analyzed and conclusions reported. This was observed when working on a comprehensive review of the neural bases of synesthesia, but these errors are probably endemic to neuroimaging studies. All studies reviewed had based their conclusions on Null Hypothesis Significance Tests (NHST). NHST have nevertheless been criticized since their inception because they are more appropriate for taking decisions related to a Null hypothesis (like in manufacturing) than for making inferences about behavioral and neuronal processes. Here I focus on a few key problems of NHST related to brain imaging techniques, and explain why or when we should not rely on "significance" tests. I also observed that, often, the ill-posed logic of NHST was not even correctly applied, and describe what I identified as common mistakes or at least problematic practices in published papers, in light of what could be considered as the very basics of statistical inference. MRI statistics also involve much more complex issues than standard statistical inference. Analysis pipelines vary a lot between studies, even for those using the same software, and there is no consensus on which pipeline is the best. I propose a synthetic view of the logic behind the possible methodological choices, and warn against the usage and interpretation of two statistical methods popular in brain imaging studies, the false discovery rate (FDR) procedure and permutation tests. I suggest that current models for the analysis of brain imaging data suffer from serious limitations and call for a revision taking into account the "new statistics" (confidence intervals) logic. PMID:25745383
Inference in infinite-dimensional inverse problems - Discretization and duality
NASA Technical Reports Server (NTRS)
Stark, Philip B.
1992-01-01
Many techniques for solving inverse problems involve approximating the unknown model, a function, by a finite-dimensional 'discretization' or parametric representation. The uncertainty in the computed solution is sometimes taken to be the uncertainty within the parametrization; this can result in unwarranted confidence. The theory of conjugate duality can overcome the limitations of discretization within the 'strict bounds' formalism, a technique for constructing confidence intervals for functionals of the unknown model incorporating certain types of prior information. The usual computational approach to strict bounds approximates the 'primal' problem in a way that the resulting confidence intervals are at most long enough to have the nominal coverage probability. There is another approach based on 'dual' optimization problems that gives confidence intervals with at least the nominal coverage probability. The pair of intervals derived by the two approaches bracket a correct confidence interval. The theory is illustrated with gravimetric, seismic, geomagnetic, and helioseismic problems and a numerical example in seismology.
Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor
2013-01-01
In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missings are typically lowered with respect to inbreeding coefficients estimated by discarding the missings. Accounting for missings by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
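For reference, the basic quantities discussed in the abstract above (the test for Hardy-Weinberg proportions and the inbreeding coefficient) can be computed for one biallelic marker as in the sketch below. It uses the standard one-degree-of-freedom chi-square test on complete genotype counts only; the multiple-imputation step the authors propose is not reproduced, and the function name and return format are assumptions of this example.

```python
from math import erfc, sqrt


def hardy_weinberg_test(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg proportions at a biallelic marker.

    Returns (chi2, p_value, f), where f is the inbreeding coefficient
    1 - observed heterozygotes / expected heterozygotes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("marker is monomorphic; the test is undefined")

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))

    # Positive f indicates a lack of heterozygotes.
    f = 1.0 - n_ab / expected[1]
    return chi2, p_value, f


if __name__ == "__main__":
    print(hardy_weinberg_test(25, 50, 25))  # exactly in HW proportions
    print(hardy_weinberg_test(40, 20, 40))  # strong heterozygote deficit
```

In the second call the heterozygote deficit gives f = 0.6 and chi2 = n * f^2 = 36, which is the kind of deviation the abstract attributes mostly to a lack of heterozygotes.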
Ma, Shujie; Carroll, Raymond J.; Liang, Hua; Xu, Shizhong
2015-01-01
In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica 16 (2006) 1423–1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we propose a groupwise penalization based procedure to distinguish significant covariates for the “large p small n” setting. The procedure is shown to be consistent for model structure identification. Further, we construct simultaneous confidence bands for the coefficient functions in the selected model based on a refined two-step spline estimator. We also discuss how to choose the tuning parameters. To estimate the standard deviation of the functional estimator, we adopt the smoothed bootstrap method. We conduct simulation experiments to evaluate the numerical performance of the proposed methods and analyze an obesity data set from a genome-wide association study as an illustration. PMID:26412908
Local dependence in random graph models: characterization, properties and statistical inference
Schweinberger, Michael; Handcock, Mark S.
2015-01-01
Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with ‘ground truth’. PMID:26560142
Statistical entropy of charged two-dimensional black holes
NASA Astrophysics Data System (ADS)
Teo, Edward
1998-06-01
The statistical entropy of a five-dimensional black hole in Type II string theory was recently derived by showing that it is U-dual to the three-dimensional Bañados-Teitelboim-Zanelli black hole, and using Carlip's method to count the microstates of the latter. This is valid even for the non-extremal case, unlike the derivation which relies on D-brane techniques. In this letter, I shall exploit the U-duality that exists between the five-dimensional black hole and the two-dimensional charged black hole of McGuigan, Nappi and Yost, to microscopically compute the entropy of the latter. It is shown that this result agrees with previous calculations using thermodynamic arguments.
A Comprehensive Statistical Model for Cell Signaling and Protein Activity Inference
Yörük, Erdem; Ochs, Michael F.; Geman, Donald; Younes, Laurent
2010-01-01
Protein signaling networks play a central role in transcriptional regulation and the etiology of many diseases. Statistical methods, particularly Bayesian networks, have been widely used to model cell signaling, mostly for model organisms and with focus on uncovering connectivity rather than inferring aberrations. Extensions to mammalian systems have not yielded compelling results, due likely to greatly increased complexity and limited proteomic measurements in vivo. In this study, we propose a comprehensive statistical model that is anchored to a predefined core topology, has a limited complexity due to parameter sharing and uses microarray data of mRNA transcripts as the only observable components of signaling. Specifically, we account for cell heterogeneity and a multi-level process, representing signaling as a Bayesian network at the cell level, modeling measurements as ensemble averages at the tissue level and incorporating patient-to-patient differences at the population level. Motivated by the goal of identifying individual protein abnormalities as potential therapeutical targets, we applied our method to the RAS-RAF network using a breast cancer study with 118 patients. We demonstrated rigorous statistical inference, established reproducibility through simulations and the ability to recover receptor status from available microarray data. PMID:20855924
Xue, Zhong; Shen, Dinggang; Davatzikos, Christos
2006-10-01
This paper proposes a 3D statistical model aiming at effectively capturing statistics of high-dimensional deformation fields and then uses this prior knowledge to constrain 3D image warping. The conventional statistical shape model methods, such as the active shape model (ASM), have been very successful in modeling shape variability. However, their accuracy and effectiveness typically drop dramatically in high-dimensionality problems involving relatively small training datasets, which is customary in 3D and 4D medical imaging applications. The proposed statistical model of deformation (SMD) uses wavelet-based decompositions coupled with PCA in each wavelet band, in order to more accurately estimate the pdf of high-dimensional deformation fields, when a relatively small number of training samples are available. SMD is further used as statistical prior to regularize the deformation field in an SMD-constrained deformable registration framework. As a result, more robust registration results are obtained relative to using generic smoothness constraints on deformation fields, such as Laplacian-based regularization. In experiments, we first illustrate the performance of SMD in representing the variability of deformation fields and then evaluate the performance of the SMD-constrained registration, via comparing a hierarchical volumetric image registration algorithm, HAMMER, with its SMD-constrained version, referred to as SMD+HAMMER. This SMD-constrained deformable registration framework can potentially incorporate various registration algorithms to improve robustness and stability via statistical shape constraints.
Statistically Inferring Protein-Protein Associations with Affinity Isolation LC-MS/MS Assays
Sharp, Julia L.; Anderson, Kevin K.; Daly, Don S.; Pelletier, Dale A; Hurst, Gregory B; Cannon, Bill; Auberry, Deanna L; Schmoyer, Denise D; McDonald, W Hayes; White, Amanda M.; Hooker, Brian; Victry, Kristin D; Buchanan, Michelle V; Kerry, Vladimir; Wiley, Steven
2007-01-01
Affinity isolation of protein complexes followed by protein identification by LC-MS/MS is an increasingly popular approach for mapping protein interactions. However, systematic and random assay errors from multiple sources must be considered to confidently infer authentic protein-protein interactions. To address this issue, we developed a general, robust statistical method for inferring authentic interactions from protein prey-by-bait frequency tables using a binomial-based likelihood ratio test (LRT) coupled with Bayes Odds estimation. We then applied our LRT-Bayes algorithm experimentally using data from protein complexes isolated from Rhodopseudomonas palustris. Our algorithm, in conjunction with the experimental protocol, inferred with high confidence authentic interacting proteins from abundant, stable complexes, but few or no authentic interactions for lower-abundance complexes. We conclude that the experimental protocol including the LRT-Bayes algorithm produces results with high confidence but moderate sensitivity. We also found that Monte Carlo simulation is a feasible tool for checking modeling assumptions, estimating parameters, and evaluating the significance of results in protein association studies.
Statistically Inferring Protein-Protein Associations with Affinity Isolation LC-MS/MS Assays
Sharp, Julia L.; Anderson, Kevin K.; Hurst, Gregory B; Daly, Don S.; Pelletier, Dale A; Cannon, Bill; Auberry, Deanna L; Schmoyer, Denise D; McDonald, W Hayes; White, Amanda M.; Hooker, Brian; Victry, Kristin D; Buchanan, Michelle V; Kerry, Vladimir; Wiley, Steven; Doktycz, Mitchel John
2007-01-01
Affinity isolation of protein complexes followed by protein identification by LC-MS/MS is an increasingly popular approach for mapping protein interactions. However, systematic and random assay errors from multiple sources must be considered to confidently infer authentic protein-protein interactions. To address this issue, we developed a general, robust statistical method for inferring authentic interactions from protein prey-by-bait frequency tables using a binomial-based likelihood ratio test (LRT) coupled with Bayes' Odds estimation. We then applied our LRT-Bayes' algorithm experimentally using data from protein complexes isolated from Rhodopseudomonas palustris. Our algorithm, in conjunction with the experimental protocol, inferred with high confidence authentic interacting proteins from abundant, stable complexes, but few or no authentic interactions for lower-abundance complexes. We conclude that the experimental protocol including the LRT-Bayes' algorithm produces results with high confidence but moderate sensitivity. We also found that Monte Carlo simulation is a feasible tool for checking modeling assumptions, estimating parameters, and evaluating the significance of results in protein association studies.
Statistically Inferring Protein-Protein Associations with Affinity Isolation LC-MS/MS Assays
Sharp, Julia L.; Anderson, Kevin K.; Hurst, G. B.; Daly, Don S.; Pelletier, Dale A.; Cannon, William R.; Auberry, Deanna L.; Schmoyer, Denise D.; McDonald, W. Hayes; White, Amanda M.; Hooker, Brian S.; Victry, Kristin D.; Buchanan, M. V.; Kery, Vladimir; Wiley, H. S.
2007-09-30
Affinity isolation of protein complexes followed by protein identification by LC-MS/MS is an increasingly popular approach for mapping protein interactions. However, systematic and random assay errors from multiple sources must be considered to confidently infer authentic protein-protein interactions. To address this issue, we developed a general, robust statistical method for inferring authentic interactions from protein prey-by-bait frequency tables using a binomial-based likelihood ratio test (LRT) coupled with Bayes’ Odds estimation. We then applied our LRT-Bayes’ algorithm experimentally using data from protein complexes isolated from Rhodopseudomonas palustris. Our algorithm, in conjunction with the experimental protocol, inferred with high confidence authentic interacting proteins from abundant, stable complexes, but few or no authentic interactions for lower-abundance complexes. The algorithm can discriminate against a background of prey proteins that are detected in association with a large number of baits as an artifact of the measurement. We conclude that the experimental protocol including the LRT-Bayes’ algorithm produces results with high confidence but moderate sensitivity. We also found that Monte Carlo simulation is a feasible tool for checking modeling assumptions, estimating parameters, and evaluating the significance of results in protein association studies.
Inferring biological tasks using Pareto analysis of high-dimensional data.
Hart, Yuval; Sheftel, Hila; Hausser, Jean; Szekely, Pablo; Ben-Moshe, Noa Bossel; Korem, Yael; Tendler, Avichai; Mayo, Avraham E; Alon, Uri
2015-03-01
We present the Pareto task inference method (ParTI; http://www.weizmann.ac.il/mcb/UriAlon/download/ParTI) for inferring biological tasks from high-dimensional biological data. Data are described as a polytope, and features maximally enriched closest to the vertices (or archetypes) allow identification of the tasks the vertices represent. We demonstrate that human breast tumors and mouse tissues are well described by tetrahedrons in gene expression space, with specific tumor types and biological functions enriched at each of the vertices, suggesting four key tasks. PMID:25622107
Approximation of epidemic models by diffusion processes and their statistical inference.
Guy, Romain; Larédo, Catherine; Vergu, Elisabeta
2015-02-01
Multidimensional continuous-time Markov jump processes [Formula: see text] on [Formula: see text] form a usual set-up for modeling [Formula: see text]-like epidemics. However, when facing incomplete epidemic data, inference based on [Formula: see text] is not easy to be achieved. Here, we start building a new framework for the estimation of key parameters of epidemic models based on statistics of diffusion processes approximating [Formula: see text]. First, previous results on the approximation of density-dependent [Formula: see text]-like models by diffusion processes with small diffusion coefficient [Formula: see text], where [Formula: see text] is the population size, are generalized to non-autonomous systems. Second, our previous inference results on discretely observed diffusion processes with small diffusion coefficient are extended to time-dependent diffusions. Consistent and asymptotically Gaussian estimates are obtained for a fixed number [Formula: see text] of observations, which corresponds to the epidemic context, and for [Formula: see text]. A correction term, which yields better estimates non asymptotically, is also included. Finally, performances and robustness of our estimators with respect to various parameters such as [Formula: see text] (the basic reproduction number), [Formula: see text], [Formula: see text] are investigated on simulations. Two models, [Formula: see text] and [Formula: see text], corresponding to single and recurrent outbreaks, respectively, are used to simulate data. The findings indicate that our estimators have good asymptotic properties and behave noticeably well for realistic numbers of observations and population sizes. This study lays the foundations of a generic inference method currently under extension to incompletely observed epidemic data. Indeed, contrary to the majority of current inference techniques for partially observed processes, which necessitates computer intensive simulations, our method being mostly an
A statistical method for lung tumor segmentation uncertainty in PET images based on user inference.
Zheng, Chaojie; Wang, Xiuying; Feng, Dagan
2015-01-01
PET has been widely accepted as an effective imaging modality for lung tumor diagnosis and treatment. However, standard criteria for delineating tumor boundaries from PET have yet to be developed, largely due to the relatively low quality of PET images, uncertain tumor boundary definition, and the variety of tumor characteristics. In this paper, we propose a statistical solution to segmentation uncertainty on the basis of user inference. We first define the uncertainty segmentation band on the basis of a segmentation probability map constructed with the Random Walks (RW) algorithm; then, based on the extracted features of the user inference, we use Principal Component Analysis (PCA) to formulate the statistical model for labeling the uncertainty band. We validated our method on 10 lung PET-CT phantom studies from the public RIDER collections [1] and 16 clinical PET studies where tumors were manually delineated by two experienced radiologists. The methods were validated using the Dice similarity coefficient (DSC) to measure the spatial volume overlap. Our method achieved an average DSC of 0.878 ± 0.078 on phantom studies and 0.835 ± 0.039 on clinical studies. PMID:26736741
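The Dice similarity coefficient used for validation above is the standard overlap measure DSC = 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that formula only, not the authors' pipeline; the function name `dice_coefficient`, the flat 0/1 mask representation, and the toy masks are assumptions made for illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks.

    Masks are flat sequences of 0/1 voxel labels (an assumed representation).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Hypothetical toy masks, e.g. an automatic and a manual delineation.
automatic = [0, 1, 1, 1, 0, 0]
manual = [0, 0, 1, 1, 1, 0]
print(dice_coefficient(automatic, manual))
```

A DSC of 1 means the two delineations coincide exactly, and 0 means they share no voxels.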
n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator
2012-09-12
nSIGHTS (n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator) is a comprehensive well test analysis software package. It provides a user-interface, a well test analysis model and many tools to analyze both field and simulated data. The well test analysis model simulates a single-phase, one-dimensional, radial/non-radial flow regime, with a borehole at the center of the modeled flow system. nSIGHTS solves the radially symmetric n-dimensional forward flow problem using a solver based on a graph-theoretic approach. The results of the forward simulation are pressure, and flow rate, given all the input parameters. The parameter estimation portion of nSIGHTS uses a perturbation-based approach to interpret the best-fit well and reservoir parameters, given an observed dataset of pressure and flow rate.
Coulson, Melissa; Healey, Michelle; Fidler, Fiona; Cumming, Geoff
2010-01-01
A statistically significant result and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST) or confidence intervals (CIs). Authors of articles published in psychology, behavioral neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs, respondents who mentioned NHST were 60% likely to conclude, unjustifiably, that the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, that the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform also requires that researchers interpret CIs without recourse to NHST. PMID:21607077
NASA Technical Reports Server (NTRS)
Lerner, Jeffrey A.; Jedlovec, Gary J.; Atkinson, Robert J.
1998-01-01
Ever since the first satellite image loops from the 6.3 micron water vapor channel on the METEOSAT-1 in 1978, there have been numerous efforts (many to a great degree of success) to relate the water vapor radiance patterns to familiar atmospheric dynamic quantities. The realization of these efforts is becoming evident with the merging of satellite derived winds into predictive models (Velden et al., 1997; Swadley and Goerss, 1989). Another parameter that has been quantified from satellite water vapor channel measurements is upper tropospheric relative humidity (UTH) (e.g., Soden and Bretherton, 1996; Schmetz and Turpeinen, 1988). These humidity measurements, in turn, can be used to quantify upper tropospheric water vapor and its transport to more accurately diagnose climate changes (Lerner et al., 1998; Schmetz et al. 1995a) and quantify radiative processes in the upper troposphere. Also apparent in water vapor imagery animations are regions of subsiding and ascending air flow. Indeed, a component of the translated motions we observe are due to vertical velocities. The few attempts at exploiting this information have been met with a fair degree of success. Picon and Desbois (1990) statistically related Meteosat monthly mean water vapor radiances to six standard pressure levels of the European Centre for Medium Range Weather Forecast (ECMWF) model vertical velocities and found correlation coefficients of about 0.50 or less. This paper presents some preliminary results of viewing climatological satellite water vapor data in a different fashion. Specifically, we attempt to infer the three dimensional flow characteristics of the mid- to upper troposphere as portrayed by GOES VAS during the warm ENSO event (1987) and a subsequent cold period in 1998.
Inferences on weather extremes and weather-related disasters: a review of statistical methods
NASA Astrophysics Data System (ADS)
Visser, H.; Petersen, A. C.
2011-09-01
The study of weather extremes and their impacts, such as weather-related disasters, plays an important role in climate-change research. Due to the great societal consequences of extremes - historically, now and in the future - the peer-reviewed literature on this theme has been growing enormously since the 1980s. Data sources have a wide origin, from century-long climate reconstructions from tree rings to short databases with disaster statistics and human impacts (30 to 60 yr). In scanning the peer-reviewed literature on weather extremes and impacts thereof we noticed that many different methods are used to make inferences. However, discussions on methods are rare. Such discussions are important since a particular methodological choice might substantially influence the inferences made. A calculation of a return period of once in 500 yr, based on a normal distribution will deviate from that based on a Gumbel distribution. And the particular choice between a linear or a flexible trend model might influence inferences as well. In this article we give a concise overview of statistical methods applied in the field of weather extremes and weather-related disasters. Methods have been evaluated as to stationarity assumptions, the choice for specific probability density functions (PDFs) and the availability of uncertainty information. As for stationarity we found that good testing is essential. Inferences on extremes may be wrong if data are assumed stationary while they are not. The same holds for the block-stationarity assumption. As for PDF choices we found that often more than one PDF shape fits to the same data. From a simulation study we conclude that both the generalized extreme value (GEV) distribution and the log-normal PDF fit very well to a variety of indicators. The application of the normal and Gumbel distributions is more limited. As for uncertainty it is advised to test conclusions on extremes for assumptions underlying the modeling approach. Finally, we
Inferences on weather extremes and weather-related disasters: a review of statistical methods
NASA Astrophysics Data System (ADS)
Visser, H.; Petersen, A. C.
2012-02-01
The study of weather extremes and their impacts, such as weather-related disasters, plays an important role in research of climate change. Due to the great societal consequences of extremes - historically, now and in the future - the peer-reviewed literature on this theme has been growing enormously since the 1980s. Data sources have a wide origin, from century-long climate reconstructions from tree rings to relatively short (30 to 60 yr) databases with disaster statistics and human impacts. When scanning peer-reviewed literature on weather extremes and its impacts, it is noticeable that many different methods are used to make inferences. However, discussions on these methods are rare. Such discussions are important since a particular methodological choice might substantially influence the inferences made. A calculation of a return period of once in 500 yr, based on a normal distribution will deviate from that based on a Gumbel distribution. And the particular choice between a linear or a flexible trend model might influence inferences as well. In this article, a concise overview of statistical methods applied in the field of weather extremes and weather-related disasters is given. Methods have been evaluated as to stationarity assumptions, the choice for specific probability density functions (PDFs) and the availability of uncertainty information. As for stationarity assumptions, the outcome was that good testing is essential. Inferences on extremes may be wrong if data are assumed stationary while they are not. The same holds for the block-stationarity assumption. As for PDF choices it was found that often more than one PDF shape fits to the same data. From a simulation study the conclusion can be drawn that both the generalized extreme value (GEV) distribution and the log-normal PDF fit very well to a variety of indicators. The application of the normal and Gumbel distributions is more limited. As for uncertainty, it is advisable to test conclusions on extremes
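The abstract's remark that a 500-yr return level depends on the assumed distribution can be checked with a small calculation. The sketch below is illustrative only and is not taken from the review: it matches a normal and a Gumbel distribution to the same mean and standard deviation (method of moments) and evaluates the quantile with non-exceedance probability 1 - 1/T. The function names and the numbers (mean 30, standard deviation 3 for some annual maximum) are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def normal_return_level(mean, sd, return_period):
    """Level exceeded on average once per `return_period` years under a normal model."""
    p = 1.0 - 1.0 / return_period
    return NormalDist(mean, sd).inv_cdf(p)


def gumbel_return_level(mean, sd, return_period):
    """Same return level under a Gumbel model with matching mean and sd."""
    beta = sd * math.sqrt(6.0) / math.pi   # scale parameter from the standard deviation
    mu = mean - EULER_GAMMA * beta         # location parameter from the mean
    p = 1.0 - 1.0 / return_period
    return mu - beta * math.log(-math.log(p))


# Hypothetical annual-maximum statistics, chosen only for illustration.
mean, sd, period = 30.0, 3.0, 500
print("normal:", normal_return_level(mean, sd, period))
print("Gumbel:", gumbel_return_level(mean, sd, period))
```

Because the Gumbel distribution has a heavier upper tail, its 500-yr level lies above the normal one for the same mean and standard deviation, which is the kind of methodological sensitivity the review describes.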
Inference of reaction rate parameters based on summary statistics from experiments
Khalil, Mohammad; Chowdhary, Kamaljit Singh; Safta, Cosmin; Sargsyan, Khachik; Najm, Habib N.
2016-10-15
Here, we present the results of an application of Bayesian inference and maximum entropy methods for the estimation of the joint probability density for the Arrhenius rate parameters of the rate coefficient of the H2/O2-mechanism chain branching reaction H + O2 → OH + O. Available published data is in the form of summary statistics in terms of nominal values and error bars of the rate coefficient of this reaction at a number of temperature values obtained from shock-tube experiments. Our approach relies on generating data, in this case OH concentration profiles, consistent with the given summary statistics, using Approximate Bayesian Computation methods and a Markov Chain Monte Carlo procedure. The approach permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. To expedite this process, we construct efficient surrogates for the OH concentration using a combination of Padé and polynomial approximants. These surrogate models adequately represent forward model observables and their dependence on input parameters and are computationally efficient to allow their use in the Bayesian inference procedure. We also utilize Gauss-Hermite quadrature with Gaussian proposal probability density functions for moment computation resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong non-linearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. The technique also accounts for nuisance parameters in the form of Arrhenius parameters of other rate coefficients with prescribed uncertainty. The resulting pooled parameter probability density function is propagated through stoichiometric hydrogen-air auto-ignition computations to illustrate
On statistical inference in time series analysis of the evolution of road safety.
Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora
2013-11-01
Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations to the applicability of standard methods of statistical inference, which leads to an under or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident-occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether they are linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. PMID:23260716
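The under- or overestimation of the standard error mentioned above can be made concrete for the simplest serially dependent case. The sketch below is an illustration under stated assumptions, not the paper's analysis: it assumes a stationary AR(1) disturbance with lag-1 autocorrelation `phi` and known marginal variance, and computes the exact ratio between the true standard error of the sample mean and the naive independent-observations value sigma/sqrt(n). The function name `se_inflation_factor` is hypothetical.

```python
import math


def se_inflation_factor(phi, n):
    """Ratio of the true SE of the sample mean to the naive iid SE.

    Assumes a stationary AR(1) series of length n with autocorrelation phi**k
    at lag k, so Var(mean) = (sigma^2 / n) * (1 + 2 * sum_k (1 - k/n) * phi**k).
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1 for stationarity")
    total = 1.0 + 2.0 * sum((1.0 - k / n) * phi ** k for k in range(1, n))
    return math.sqrt(total)


# Example: 40 annual observations with various degrees of serial dependence.
for phi in (-0.5, 0.0, 0.5, 0.8):
    print(phi, round(se_inflation_factor(phi, 40), 3))
```

A factor above 1 (positive autocorrelation) means the naive standard error is too small and significance is overstated; a factor below 1 (negative autocorrelation) means it is too large. For long series the factor approaches sqrt((1 + phi) / (1 - phi)).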
Three-dimensional statistical model for gingival contour reconstruction.
Wu, Ting; Liao, Wenhe; Dai, Ning
2012-04-01
Optimal gingival contours around restored teeth and implants are of critical importance for restorative success and esthetics. This paper describes a novel computer-aided methodology for building a 3-D statistical model of gingival contours from a 3-D scan dental dataset and reconstructing missing gingival contours in partially edentulous patients. The gingival boundaries were first obtained from the 3-D dental model through a discrete curvature analysis and shortest path searching algorithm. Based on the gingival shape differential characteristics, the boundaries were demarcated to construct the gingival contour of each individual tooth. Through B-spline curve approximation to each gingival contour, the control points of the B-spline curves are used as the shape vector for training the model. Statistical analysis results demonstrate that the method can give a simple but compact model that effectively captures the most important variations in arch width and shape as well as gingival morphology and position. Within this statistical model, the morphologically plausible missing contours can be inferred based on a nonlinear optimization fitting from the global similarity transformation, the model shape deformation and a Mahalanobis prior. The reconstruction performance is evaluated through large simulated experimental data and a real patient case, which demonstrates the effectiveness of this approach.
Statistical mechanics of shell models for two-dimensional turbulence
NASA Astrophysics Data System (ADS)
Aurell, E.; Boffetta, G.; Crisanti, A.; Frick, P.; Paladin, G.; Vulpiani, A.
1994-12-01
We study shell models that conserve the analogs of energy and enstrophy and hence are designed to mimic fluid turbulence in two dimensions (2D). The main result is that the observed state is well described as a formal statistical equilibrium, closely analogous to the approach to two-dimensional ideal hydrodynamics of Onsager [Nuovo Cimento Suppl. 6, 279 (1949)], Hopf [J. Rat. Mech. Anal. 1, 87 (1952)], and Lee [Q. Appl. Math. 10, 69 (1952)]. In the presence of forcing and dissipation we observe a forward flux of enstrophy and a backward flux of energy. These fluxes can be understood as mean diffusive drifts from a source to two sinks in a system which is close to local equilibrium with Lagrange multipliers ("shell temperatures") changing slowly with scale. This is clear evidence that the simplest shell models are not adequate to reproduce the main features of two-dimensional turbulence. The dimensional predictions on the power spectra from a supposed forward cascade of enstrophy and from one branch of the formal statistical equilibrium coincide in these shell models in contrast to the corresponding predictions for the Navier-Stokes and Euler equations in 2D. This coincidence has previously led to the mistaken conclusion that shell models exhibit a forward cascade of enstrophy. We also study the dynamical properties of the models and the growth of perturbations.
Specificity and timescales of cortical adaptation as inferences about natural movie statistics
Snow, Michoel; Coen-Cagli, Ruben; Schwartz, Odelia
2016-01-01
Adaptation is a phenomenological umbrella term under which a variety of temporal contextual effects are grouped. Previous models have shown that some aspects of visual adaptation reflect optimal processing of dynamic visual inputs, suggesting that adaptation should be tuned to the properties of natural visual inputs. However, the link between natural dynamic inputs and adaptation is poorly understood. Here, we extend a previously developed Bayesian modeling framework for spatial contextual effects to the temporal domain. The model learns temporal statistical regularities of natural movies and links these statistics to adaptation in primary visual cortex via divisive normalization, a ubiquitous neural computation. In particular, the model divisively normalizes the present visual input by the past visual inputs only to the degree that these are inferred to be statistically dependent. We show that this flexible form of normalization reproduces classical findings on how brief adaptation affects neuronal selectivity. Furthermore, prior knowledge acquired by the Bayesian model from natural movies can be modified by prolonged exposure to novel visual stimuli. We show that this updating can explain classical results on contrast adaptation. We also simulate the recent finding that adaptation maintains population homeostasis, namely, a balanced level of activity across a population of neurons with different orientation preferences. Consistent with previous disparate observations, our work further clarifies the influence of stimulus-specific and neuronal-specific normalization signals in adaptation. PMID:27699416
McDonald, L.L.; Erickson, W.P.; Strickland, M.D.
1995-12-31
The objective of the Coastal Habitat Injury Assessment study was to document and quantify injury to biota of the shallow subtidal, intertidal, and supratidal zones throughout the shoreline affected by oil or cleanup activity associated with the Exxon Valdez oil spill. The results of these studies were to be used to support the Trustee's Type B Natural Resource Damage Assessment under the Comprehensive Environmental Response, Compensation, and Liability Act of 1980 (CERCLA). A probability based stratified random sample of shoreline segments was selected with probability proportional to size from each of 15 strata (5 habitat types crossed with 3 levels of potential oil impact) based on those data available in July, 1989. Three study regions were used: Prince William Sound, Cook Inlet/Kenai Peninsula, and Kodiak/Alaska Peninsula. A Geographic Information System was utilized to combine oiling and habitat data and to select the probability sample of study sites. Quasi-experiments were conducted where randomly selected oiled sites were compared to matched reference sites. Two levels of statistical inferences, philosophical bases, and limitations are discussed and illustrated with example data from the resulting studies. 25 refs., 4 figs., 1 tab.
Palstra, Friso P; Heyer, Evelyne; Austerlitz, Frédéric
2015-06-01
The demographic history of modern humans constitutes a combination of expansions, colonizations, contractions, and remigrations. The advent of large scale genetic data combined with statistically refined methods facilitates inference of this complex history. Here we study the demographic history of two genetically admixed ethnic groups in Central Asia, an area characterized by high levels of genetic diversity and a history of recurrent immigration. Using Approximate Bayesian Computation, we infer that the timing of admixture markedly differs between the two groups. Admixture in the traditionally agricultural Tajiks could be dated back to the onset of the Neolithic transition in the region, whereas admixture in Kyrgyz is more recent, and may have involved the westward movement of Turkic peoples. These results are confirmed by a coalescent method that fits an isolation-with-migration model to the genetic data, with both Central Asian groups having received gene flow from the extremities of Eurasia. Interestingly, our analyses also uncover signatures of gene flow from Eastern to Western Eurasia during Paleolithic times. In conclusion, the high genetic diversity currently observed in these two Central Asian peoples most likely reflects the effects of recurrent immigration that likely started before historical times. Conversely, conquests during historical times may have had a relatively limited genetic impact. These results emphasize the need for a better understanding of the genetic consequences of transmission of culture and technological innovations, as well as those of invasions and conquests.
Statistical Downscaling in Multi-dimensional Wave Climate Forecast
NASA Astrophysics Data System (ADS)
Camus, P.; Méndez, F. J.; Medina, R.; Losada, I. J.; Cofiño, A. S.; Gutiérrez, J. M.
2009-04-01
Wave climate at a particular site is defined by the statistical distribution of sea state parameters, such as significant wave height, mean wave period, mean wave direction, wind velocity, wind direction and storm surge. Nowadays, long-term time series of these parameters are available from reanalysis databases obtained by numerical models. The Self-Organizing Map (SOM) technique is applied to characterize multi-dimensional wave climate, obtaining the relevant "wave types" spanning the historical variability. This technique summarizes multi-dimension of wave climate in terms of a set of clusters projected in low-dimensional lattice with a spatial organization, providing Probability Density Functions (PDFs) on the lattice. On the other hand, wind and storm surge depend on instantaneous local large-scale sea level pressure (SLP) fields while waves depend on the recent history of these fields (say, 1 to 5 days). Thus, these variables are associated with large-scale atmospheric circulation patterns. In this work, a nearest-neighbors analog method is used to predict monthly multi-dimensional wave climate. This method establishes relationships between the large-scale atmospheric circulation patterns from numerical models (SLP fields as predictors) with local wave databases of observations (monthly wave climate SOM PDFs as predictand) to set up statistical models. A wave reanalysis database, developed by Puertos del Estado (Ministerio de Fomento), is considered as historical time series of local variables. The simultaneous SLP fields calculated by NCEP atmospheric reanalysis are used as predictors. Several applications with different size of sea level pressure grid and with different temporal domain resolution are compared to obtain the optimal statistical model that better represents the monthly wave climate at a particular site. In this work we examine the potential skill of this downscaling approach considering perfect-model conditions, but we will also analyze the
One-dimensional statistical parametric mapping in Python.
Pataky, Todd C
2012-01-01
Statistical parametric mapping (SPM) is a topological methodology for detecting field changes in smooth n-dimensional continua. Many classes of biomechanical data are smooth and contained within discrete bounds and as such are well suited to SPM analyses. The current paper accompanies release of 'SPM1D', a free and open-source Python package for conducting SPM analyses on a set of registered 1D curves. Three example applications are presented: (i) kinematics, (ii) ground reaction forces and (iii) contact pressure distribution in probabilistic finite element modelling. In addition to offering a high-level interface to a variety of common statistical tests like t tests, regression and ANOVA, SPM1D also emphasises fundamental concepts of SPM theory through stand-alone example scripts. Source code and documentation are available at: www.tpataky.net/spm1d/.
Validi, AbdoulAhad
2014-03-01
This study introduces a non-intrusive approach in the context of low-rank separated representation to construct a surrogate of high-dimensional stochastic functions, e.g., PDEs/ODEs, in order to decrease the computational cost of Markov Chain Monte Carlo simulations in Bayesian inference. The surrogate model is constructed via a regularized alternative least-square regression with Tikhonov regularization using a roughening matrix computing the gradient of the solution, in conjunction with a perturbation-based error indicator to detect optimal model complexities. The model approximates a vector of a continuous solution at discrete values of a physical variable. The required number of random realizations to achieve a successful approximation linearly depends on the function dimensionality. The computational cost of the model construction is quadratic in the number of random inputs, which potentially tackles the curse of dimensionality in high-dimensional stochastic functions. Furthermore, this vector-valued separated representation-based model, in comparison to the available scalar-valued case, leads to a significant reduction in the cost of approximation by an order of magnitude equal to the vector size. The performance of the method is studied through its application to three numerical examples including a 41-dimensional elliptic PDE and a 21-dimensional cavity flow.
Sex, lies, and statistics: inferences from the child sexual abuse accommodation syndrome.
Weiss, Kenneth J; Curcio Alexander, Julia
2013-01-01
Victims of child sexual abuse often recant their complaints or do not report incidents, making prosecution of offenders difficult. The child sexual abuse accommodation syndrome (CSAAS) has been used to explain this phenomenon by identifying common behavioral responses. Unlike PTSD but like rape trauma syndrome, CSAAS is not an official diagnostic term and should not be used as evidence of a defendant's guilt or to imply probative value in prosecutions. Courts have grappled with the ideal use of CSAAS in the evaluation of child witness testimony. Expert testimony should be helpful to the jurors without prejudicing them. The New Jersey Supreme Court ruled recently that statistical evidence about CSAAS implying the probability that a child is truthful runs the risk of confusing jury members and biasing them against the defendant. We review the parameters of expert testimony and its admissibility in this area, concluding that statistics about CSAAS should not be used to draw inferences about the victim's credibility or the defendant's guilt. PMID:24051595
Statistical thermodynamics of a two-dimensional relativistic gas.
Montakhab, Afshin; Ghodrat, Malihe; Barati, Mahmood
2009-03-01
In this paper we study a fully relativistic model of a two-dimensional hard-disk gas. This model avoids the general problems associated with relativistic particle collisions and is therefore an ideal system to study relativistic effects in statistical thermodynamics. We study this model using molecular-dynamics simulation, concentrating on the velocity distribution functions. We obtain results for the x and y components of velocity in the rest frame (Γ) as well as the moving frame (Γ′). Our results confirm that the Jüttner distribution is the correct generalization of the Maxwell-Boltzmann distribution. We obtain the same "temperature" parameter β for both frames, consistent with a recent study of a limited one-dimensional model. We also address the controversial topic of temperature transformation. We show that while local thermal equilibrium holds in the moving frame, relying on statistical methods such as distribution functions or the equipartition theorem is ultimately inconclusive in deciding on a correct temperature transformation law (if any). PMID:19391919
Brannigan, V M; Bier, V M; Berg, C
1992-09-01
Toxic torts are product liability cases dealing with alleged injuries due to chemical or biological hazards such as radiation, thalidomide, or Agent Orange. Toxic tort cases typically rely more heavily than other product liability cases on indirect or statistical proof of injury. There have been numerous theoretical analyses of statistical proof of injury in toxic tort cases. However, there have been only a handful of actual legal decisions regarding the use of such statistical evidence, and most of those decisions have been inconclusive. Recently, a major case from the Fifth Circuit, involving allegations that Benedectin (a morning sickness drug) caused birth defects, was decided entirely on the basis of statistical inference. This paper examines both the conceptual basis of that decision, and also the relationships among statistical inference, scientific evidence, and the rules of product liability in general.
Duchesne, Thierry; Fortin, Daniel; Rivest, Louis-Paul
2015-01-01
Animal movement has a fundamental impact on population and community structure and dynamics. Biased correlated random walks (BCRW) and step selection functions (SSF) are commonly used to study movements. Because no studies have contrasted the parameters and the statistical properties of their estimators for models constructed under these two Lagrangian approaches, it remains unclear whether or not they allow for similar inference. First, we used the Weak Law of Large Numbers to demonstrate that the log-likelihood function for estimating the parameters of BCRW models can be approximated by the log-likelihood of SSFs. Second, we illustrated the link between the two approaches by fitting BCRW with maximum likelihood and with SSF to simulated movement data in virtual environments and to the trajectory of bison (Bison bison L.) trails in natural landscapes. Using simulated and empirical data, we found that the parameters of a BCRW estimated directly from maximum likelihood and by fitting an SSF were remarkably similar. Movement analysis is increasingly used as a tool for understanding the influence of landscape properties on animal distribution. In the rapidly developing field of movement ecology, management and conservation biologists must decide which method they should implement to accurately assess the determinants of animal movement. We showed that BCRW and SSF can provide similar insights into the environmental features influencing animal movements. Both techniques have advantages. BCRW has already been extended to allow for multi-state modeling. Unlike BCRW, however, SSF can be estimated using most statistical packages, it can simultaneously evaluate habitat selection and movement biases, and can easily integrate a large number of movement taxes at multiple scales. SSF thus offers a simple, yet effective, statistical technique to identify movement taxis. PMID:25898019
Lagrangian statistics in weakly forced two-dimensional turbulence.
Rivera, Michael K; Ecke, Robert E
2016-01-01
Measurements of Lagrangian single-point and multiple-point statistics in a quasi-two-dimensional stratified layer system are reported. The system consists of a layer of salt water over an immiscible layer of Fluorinert and is forced electromagnetically so that mean-squared vorticity is injected at a well-defined spatial scale ri. Simultaneous cascades develop in which enstrophy flows predominately to small scales whereas energy cascades, on average, to larger scales. Lagrangian correlations and one- and two-point displacements are measured for random initial conditions and for initial positions within topological centers and saddles. Some of the behavior of these quantities can be understood in terms of the trapping characteristics of long-lived centers, the slow motion near strong saddles, and the rapid fluctuations outside of either centers or saddles. We also present statistics of Lagrangian velocity fluctuations using energy spectra in frequency space and structure functions in real space. We compare with complementary Eulerian velocity statistics. We find that simultaneous inverse energy and enstrophy ranges present in spectra are not directly echoed in real-space moments of velocity difference. Nevertheless, the spectral ranges line up well with features of moment ratios, indicating that although the moments are not exhibiting unambiguous scaling, the behavior of the probability distribution functions is changing over short ranges of length scales. Implications for understanding weakly forced 2D turbulence with simultaneous inverse and direct cascades are discussed.
Inhomogeneous two-dimensional photonic media: A statistical study
NASA Astrophysics Data System (ADS)
Bellingeri, M.; Tenca, E.; Scotognella, F.
2012-10-01
Photonic media, in which disorder is introduced, are interesting materials for light management. In this paper, we have performed a statistical study of the average light transmission, over the range of wavelengths 450-1400 nm, for two-dimensional photonic structures with different homogeneity (quantified by the Shannon index). The photonic structure is a square lattice of circular pillars and the homogeneity is varied by clustering pillars in the crystal unit cells. We have calculated the light transmission for 50 different crystal realizations (permutating cluster position in the crystal) for each Shannon index value. Such Monte Carlo Markov Chain method produced the "a posteriori" distribution of the light transmission. We have observed a linear trend of the average transmission as a function of the crystal homogeneity. Furthermore, we have found a linear dependence of the average light transmission on the mean distance between pillars in the photonic structures.
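As a rough illustration of how a Shannon index can quantify homogeneity, the sketch below computes H = -Σ p_i ln p_i from the number of pillars in each unit cell. The abstract does not state the exact definition used in the paper, so the per-cell counting and the example layouts here are assumptions made only for illustration.

```python
import math

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over cells with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical crystal with four unit cells and four pillars in total.
homogeneous = [1, 1, 1, 1]  # one pillar per cell: maximal index, ln(4)
clustered = [4, 0, 0, 0]    # all pillars in one cell: index of zero

print(shannon_index(homogeneous))
print(shannon_index(clustered))
```

Under this definition the index grows as the pillars are spread more evenly over the cells, which is the sense in which it can serve as a homogeneity measure.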
Conn, Paul B.; Johnson, Devin S.; Ver Hoef, Jay M.; Hooten, Mevin B.; London, Joshua M.; Boveng, Peter L.
2015-01-01
Ecologists often fit models to survey data to estimate and explain variation in animal abundance. Such models typically require that animal density remains constant across the landscape where sampling is being conducted, a potentially problematic assumption for animals inhabiting dynamic landscapes or otherwise exhibiting considerable spatiotemporal variation in density. We review several concepts from the burgeoning literature on spatiotemporal statistical models, including the nature of the temporal structure (i.e., descriptive or dynamical) and strategies for dimension reduction to promote computational tractability. We also review several features as they specifically relate to abundance estimation, including boundary conditions, population closure, choice of link function, and extrapolation of predicted relationships to unsampled areas. We then compare a suite of novel and existing spatiotemporal hierarchical models for animal count data that permit animal density to vary over space and time, including formulations motivated by resource selection and allowing for closed populations. We gauge the relative performance (bias, precision, computational demands) of alternative spatiotemporal models when confronted with simulated and real data sets from dynamic animal populations. For the latter, we analyze spotted seal (Phoca largha) counts from an aerial survey of the Bering Sea where the quantity and quality of suitable habitat (sea ice) changed dramatically while surveys were being conducted. Simulation analyses suggested that multiple types of spatiotemporal models provide reasonable inference (low positive bias, high precision) about animal abundance, but have potential for overestimating precision. Analysis of spotted seal data indicated that several model formulations, including those based on a log-Gaussian Cox process, had a tendency to overestimate abundance. By contrast, a model that included a population closure assumption and a scale prior on total
Brannigan, V.M.; Bier, V.M.; Berg, C.
1992-09-01
Toxic torts are product liability cases dealing with alleged injuries due to chemical or biological hazards such as radiation, thalidomide, or Agent Orange. Toxic tort cases typically rely more heavily than other product liability cases on indirect or statistical proof of injury. There have been numerous theoretical analyses of statistical proof of injury in toxic tort cases. However, there have been only a handful of actual legal decisions regarding the use of such statistical evidence, and most of those decisions have been inconclusive. Recently, a major case from the Fifth Circuit, involving allegations that Benedectin (a morning sickness drug) caused birth defects, was decided entirely on the basis of statistical inference. This paper examines both the conceptual basis of that decision, and also the relationships among statistical inference, scientific evidence, and the rules of product liability in general. 23 refs.
Afshar, Saeed; George, Libin; Tapson, Jonathan; van Schaik, André; Hamilton, Tara J
2014-01-01
This paper describes the Synapto-dendritic Kernel Adapting Neuron (SKAN), a simple spiking neuron model that performs statistical inference and unsupervised learning of spatiotemporal spike patterns. SKAN is the first proposed neuron model to investigate the effects of dynamic synapto-dendritic kernels and demonstrate their computational power even at the single neuron scale. The rule-set defining the neuron is simple: there are no complex mathematical operations such as normalization, exponentiation or even multiplication. The functionalities of SKAN emerge from the real-time interaction of simple additive and binary processes. Like a biological neuron, SKAN is robust to signal and parameter noise, and can utilize both in its operations. At the network scale neurons are locked in a race with each other with the fastest neuron to spike effectively "hiding" its learnt pattern from its neighbors. The robustness to noise, high speed, and simple building blocks not only make SKAN an interesting neuron model in computational neuroscience, but also make it ideal for implementation in digital and analog neuromorphic systems which is demonstrated through an implementation in a Field Programmable Gate Array (FPGA). Matlab, Python, and Verilog implementations of SKAN are available at: http://www.uws.edu.au/bioelectronics_neuroscience/bens/reproducible_research.
Racing to learn: statistical inference and learning in a single spiking neuron with adaptive kernels
Afshar, Saeed; George, Libin; Tapson, Jonathan; van Schaik, André; Hamilton, Tara J.
2014-01-01
This paper describes the Synapto-dendritic Kernel Adapting Neuron (SKAN), a simple spiking neuron model that performs statistical inference and unsupervised learning of spatiotemporal spike patterns. SKAN is the first proposed neuron model to investigate the effects of dynamic synapto-dendritic kernels and demonstrate their computational power even at the single neuron scale. The rule-set defining the neuron is simple: there are no complex mathematical operations such as normalization, exponentiation or even multiplication. The functionalities of SKAN emerge from the real-time interaction of simple additive and binary processes. Like a biological neuron, SKAN is robust to signal and parameter noise, and can utilize both in its operations. At the network scale neurons are locked in a race with each other with the fastest neuron to spike effectively “hiding” its learnt pattern from its neighbors. The robustness to noise, high speed, and simple building blocks not only make SKAN an interesting neuron model in computational neuroscience, but also make it ideal for implementation in digital and analog neuromorphic systems which is demonstrated through an implementation in a Field Programmable Gate Array (FPGA). Matlab, Python, and Verilog implementations of SKAN are available at: http://www.uws.edu.au/bioelectronics_neuroscience/bens/reproducible_research. PMID:25505378
Hasan, A; Maloney, C E
2014-12-01
We compute the effective dispersion and vibrational density of states (DOS) of two-dimensional subregions of three-dimensional face-centered-cubic crystals using both a direct projection-inversion technique and a Monte Carlo simulation based on a common underlying Hamiltonian. We study both a (111) and (100) plane. We show that for any given direction of wave vector, both (111) and (100) show an anomalous ω² ∼ q regime at low q, where ω² is the energy associated with the given mode and q is its wave number. The ω² ∼ q scaling should be expected to give rise to an anomalous DOS, D(ω), at low ω: D(ω) ∼ ω³ rather than the conventional Debye result: D(ω) ∼ ω². The DOS for (100) looks to be consistent with D(ω) ∼ ω³, while (111) shows something closer to the conventional Debye result at the smallest frequencies. In addition to the direct projection-inversion calculation, we perform Monte Carlo simulations to study the effects of finite sampling statistics. We show that finite sampling artifacts act as an effective disorder and bias D(ω), giving a behavior closer to D(ω) ∼ ω² than D(ω) ∼ ω³. These results should have an important impact on the interpretation of recent studies of colloidal solids where the two-point displacement correlations can be obtained directly in real-space via microscopy.
NASA Astrophysics Data System (ADS)
Hasan, A.; Maloney, C. E.
2014-12-01
We compute the effective dispersion and vibrational density of states (DOS) of two-dimensional subregions of three-dimensional face-centered-cubic crystals using both a direct projection-inversion technique and a Monte Carlo simulation based on a common underlying Hamiltonian. We study both a (111) and (100) plane. We show that for any given direction of wave vector, both (111) and (100) show an anomalous ω² ∼ q regime at low q, where ω² is the energy associated with the given mode and q is its wave number. The ω² ∼ q scaling should be expected to give rise to an anomalous DOS, D(ω), at low ω: D(ω) ∼ ω³ rather than the conventional Debye result: D(ω) ∼ ω². The DOS for (100) looks to be consistent with D(ω) ∼ ω³, while (111) shows something closer to the conventional Debye result at the smallest frequencies. In addition to the direct projection-inversion calculation, we perform Monte Carlo simulations to study the effects of finite sampling statistics. We show that finite sampling artifacts act as an effective disorder and bias D(ω), giving a behavior closer to D(ω) ∼ ω² than D(ω) ∼ ω³. These results should have an important impact on the interpretation of recent studies of colloidal solids where the two-point displacement correlations can be obtained directly in real-space via microscopy.
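One way to see how the anomalous dispersion could lead to the quoted DOS exponent is a simple mode-counting argument. The sketch below assumes that modes are counted in the d-dimensional space of wave vectors (d = 2 for the plane, d = 3 for the bulk Debye case); this counting is an assumption made for illustration, not a derivation taken from the paper.

```latex
% Number of modes with wave number below q in d dimensions:
N(q) \propto q^{d}
% Anomalous dispersion \omega^{2} \propto q, i.e. q \propto \omega^{2}:
N(\omega) \propto \omega^{2d}
\quad\Longrightarrow\quad
D(\omega) = \frac{dN}{d\omega} \propto \omega^{2d-1}
\;\overset{d=2}{=}\; \omega^{3}
% Linear dispersion \omega \propto q (conventional Debye counting):
N(\omega) \propto \omega^{d}
\quad\Longrightarrow\quad
D(\omega) \propto \omega^{d-1}
\;\overset{d=3}{=}\; \omega^{2}
```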
Convertino, Matteo; Mangoubi, Rami S.; Linkov, Igor; Lowry, Nathan C.; Desai, Mukund
2012-01-01
Shannon entropy of pixel intensity. To test our approach, we specifically use the green band of Landsat images for a water conservation area in the Florida Everglades. We validate our predictions against data of species occurrences for a twenty-eight-year period for both wet and dry seasons. Our method correctly predicts 73% of species richness. For species turnover, the newly proposed KL divergence prediction performance is near 100% accurate. This represents a significant improvement over the more conventional Shannon entropy difference, which provides 85% accuracy. Furthermore, we find that changes in soil and water patterns, as measured by fluctuations of the Shannon entropy for the red and blue bands respectively, are positively correlated with changes in vegetation. The fluctuations are smaller in the wet season when compared to the dry season. Conclusions/Significance: Texture-based statistical multiresolution image analysis is a promising method for quantifying interseasonal differences and, consequently, the degree to which vegetation, soil, and water patterns vary. The proposed automated method for quantifying species richness and turnover can also provide analysis at higher spatial and temporal resolution than is currently obtainable from expensive monitoring campaigns, thus enabling more prompt, more cost-effective inference and decision-making support regarding anomalous variations in biodiversity. Additionally, a matrix-based visualization of the statistical multiresolution analysis is presented to facilitate both insight and quick recognition of anomalous data. PMID:23115629
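The contrast between a Shannon entropy difference and a KL divergence can be sketched with two toy intensity histograms. The histograms below are hypothetical and are not data from the study. They are chosen so that the two distributions have the same entropy while still being clearly different, which is one situation where an entropy difference alone would miss a change.

```python
import math

def normalize(hist):
    """Convert raw bin counts to a probability distribution."""
    total = sum(hist)
    return [h / total for h in hist]

def shannon_entropy(p):
    """Shannon entropy in bits, skipping empty bins."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) in bits; eps guards against empty bins in q."""
    return sum(x * math.log2(x / max(y, eps)) for x, y in zip(p, q) if x > 0)

# Hypothetical pixel-intensity histograms for two seasons (same counts, reordered).
wet = normalize([10, 20, 40, 30])
dry = normalize([30, 40, 20, 10])

entropy_difference = abs(shannon_entropy(wet) - shannon_entropy(dry))
divergence = kl_divergence(wet, dry)

print(entropy_difference)  # essentially zero: the bin values are a permutation
print(divergence)          # clearly positive: the distributions differ
```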
Wallace, D L; Perlman, M D
1980-06-01
This report describes the research activities of the Department of Statistics, University of Chicago, during the period June 15, 1975 to July 30, 1979. Nine research projects are briefly described on the following subjects: statistical computing and approximation techniques in statistics; numerical computation of first passage distributions; probabilities of large deviations; combining independent tests of significance; small-sample efficiencies of tests and estimates; improved procedures for simultaneous estimation and testing of many correlations; statistical computing and improved regression methods; comparison of several populations; and unbiasedness in multivariate statistics. A description of the statistical consultation activities of the Department that are of interest to DOE, in particular, the scientific interactions between the Department and the scientists at Argonne National Laboratories, is given. A list of publications issued during the term of the contract is included.
Suarez-Diez, Maria; Saccenti, Edoardo
2015-12-01
We investigated the effect of sample size and dimensionality on the performance of four algorithms (ARACNE, CLR, CORR, and PCLRC) when they are used for the inference of metabolite association networks. We report that as many as 100-400 samples may be necessary to obtain stable network estimations, depending on the algorithm and the number of measured metabolites. The CLR and PCLRC methods produce similar results, whereas network inference based on correlations provides sparse networks; we found ARACNE to be unsuitable for this application, being unable to recover the underlying metabolite association network. We recommend the PCLRC algorithm for the inference on metabolite association networks.
NASA Astrophysics Data System (ADS)
Saatchi, R.
2004-03-01
The aim of the study was to automate the identification of a saccade-related visual evoked potential (EP) called the lambda wave. The lambda waves were extracted from single trials of electroencephalogram (EEG) waveforms using independent component analysis (ICA). A trial was a set of EEG waveforms recorded from 64 scalp electrode locations while a saccade was performed. Forty saccade-related EEG trials (recorded from four normal subjects) were used in the study. The number of waveforms per trial was reduced from 64 to 22 by pre-processing. The application of ICA to the resulting waveforms produced 880 components (i.e. 4 subjects × 10 trials per subject × 22 components per trial). The components were divided into 373 lambda and 507 nonlambda waves by visual inspection and then they were represented by one spatial and two temporal features. The classification performance of a Bayesian approach called predictive statistical diagnosis (PSD) was compared with that of a fuzzy logic approach called a fuzzy inference system (FIS). The outputs from the two classification approaches were then combined and the resulting discrimination accuracy was evaluated. For each approach, half the data from the lambda and nonlambda wave categories were used to determine the operating parameters of the classification schemes while the rest (i.e. the validation set) were used to evaluate their classification accuracies. The sensitivity and specificity values when the classification approaches were applied to the lambda wave validation data set were as follows: for the PSD 92.51% and 91.73% respectively, for the FIS 95.72% and 89.76% respectively, and for the combined FIS and PSD approach 97.33% and 97.24% respectively (classification threshold was 0.5). The devised signal processing techniques together with the classification approaches provided for an effective extraction and classification of the single-trial lambda waves. However, as only four subjects were included, it will be
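For readers less familiar with the two accuracy measures quoted above, the sketch below shows how sensitivity and specificity follow from confusion-matrix counts. The counts are hypothetical: they are chosen to be consistent with the percentages reported for the PSD classifier and a roughly half-sized validation set, and they are not taken from the paper.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives (lambda waves) that are detected."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of actual negatives (nonlambda waves) that are rejected."""
    return true_neg / (true_neg + false_pos)

# Hypothetical validation counts (assumed for illustration only).
tp, fn = 173, 14   # 187 lambda components
tn, fp = 233, 21   # 254 nonlambda components

sens_pct = round(100 * sensitivity(tp, fn), 2)
spec_pct = round(100 * specificity(tn, fp), 2)
print(sens_pct, spec_pct)  # 92.51 91.73
```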
NASA Astrophysics Data System (ADS)
Tyagi, Payal; Marruzzo, Alessia; Pagnani, Andrea; Antenucci, Fabrizio; Leuzzi, Luca
2016-07-01
We implement a pseudolikelihood approach with l1 and l2 regularizations as well as the recently introduced pseudolikelihood with decimation procedure to the inverse problem in continuous spin models on arbitrary networks, with arbitrarily disordered couplings. Performances of the approaches are tested against data produced by Monte Carlo numerical simulations and compared also to previously studied fully connected mean-field-based inference techniques. The results clearly show that the best network reconstruction is obtained through the decimation scheme, which also allows us to make the inference down to lower temperature regimes. Possible applications to phasor models for light propagation in random media are proposed and discussed.
Bayesian Statistical Inference in Ion-Channel Models with Exact Missed Event Correction.
Epstein, Michael; Calderhead, Ben; Girolami, Mark A; Sivilotti, Lucia G
2016-07-26
The stochastic behavior of single ion channels is most often described as an aggregated continuous-time Markov process with discrete states. For ligand-gated channels each state can represent a different conformation of the channel protein or a different number of bound ligands. Single-channel recordings show only whether the channel is open or shut: states of equal conductance are aggregated, so transitions between them have to be inferred indirectly. The requirement to filter noise from the raw signal further complicates the modeling process, as it limits the time resolution of the data. The consequence of the reduced bandwidth is that openings or shuttings that are shorter than the resolution cannot be observed; these are known as missed events. Postulated models fitted using filtered data must therefore explicitly account for missed events to avoid bias in the estimation of rate parameters and therefore assess parameter identifiability accurately. In this article, we present the first, to our knowledge, Bayesian modeling of ion-channels with exact missed events correction. Bayesian analysis represents uncertain knowledge of the true value of model parameters by considering these parameters as random variables. This allows us to gain a full appreciation of parameter identifiability and uncertainty when estimating values for model parameters. However, Bayesian inference is particularly challenging in this context as the correction for missed events increases the computational complexity of the model likelihood. Nonetheless, we successfully implemented a two-step Markov chain Monte Carlo method that we called "BICME", which performs Bayesian inference in models of realistic complexity. The method is demonstrated on synthetic and real single-channel data from muscle nicotinic acetylcholine channels. We show that parameter uncertainty can be characterized more accurately than with maximum-likelihood methods. Our code for performing inference in these ion channel
From a Logical Point of View: An Illuminating Perspective in Teaching Statistical Inference
ERIC Educational Resources Information Center
Sowey, Eric R
2005-01-01
Offering perspectives in the teaching of statistics assists students, immersed in the study of detail, to see the leading principles of the subject more clearly. Especially helpful can be a perspective on the logic of statistical inductive reasoning. Such a perspective can bring to prominence a broad principle on which both interval estimation and…
Using Action Research to Develop a Course in Statistical Inference for Workplace-Based Adults
ERIC Educational Resources Information Center
Forbes, Sharleen
2014-01-01
Many adults who need an understanding of statistical concepts have limited mathematical skills. They need a teaching approach that includes as little mathematical context as possible. Iterative participatory qualitative research (action research) was used to develop a statistical literacy course for adult learners informed by teaching in…
Three dimensional graphics in the statistical analysis of scientific data
Grotch, S.L.
1986-05-01
In scientific data analysis, the two-dimensional plot has become an indispensable tool. As the scientist more commonly encounters multivariate data, three dimensional graphics will form the natural extension of these more traditional representations. There can be little doubt that as the accessibility to ever more powerful graphics tools increases, their use will expand dramatically. Used in routine data analysis for nearly a decade, three dimensional graphics have proved to be a powerful means for obtaining insights into data simply not available with traditional 2D methods. Examples of this work, taken primarily from chemistry and meteorology, are presented to illustrate a variety of 3D graphics found to be practically useful. Some approaches for improving these presentations are also highlighted.
Nonequilibrium statistical mechanics in one-dimensional bose gases
NASA Astrophysics Data System (ADS)
Baldovin, F.; Cappellaro, A.; Orlandini, E.; Salasnich, L.
2016-06-01
We study cold dilute gases made of bosonic atoms, showing that in the mean-field one-dimensional regime they support stable out-of-equilibrium states. Starting from the 3D Boltzmann-Vlasov equation with contact interaction, we derive an effective 1D Landau-Vlasov equation under the condition of a strong transverse harmonic confinement. We investigate the existence of out-of-equilibrium states, obtaining stability criteria similar to those of classical plasmas.
Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition
Tang, Qin; Shi, Mijuan; Cheng, Yingyin; Zhang, Wanting; Xia, Xiao-Qin
2015-01-01
Many coronaviruses are capable of interspecies transmission. Some of them have caused worldwide panic as emerging human pathogens in recent years, e.g., severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). In order to assess their threat to humans, we sought to infer the potential hosts of coronaviruses using a dual-model approach based on nineteen parameters computed from spike genes of coronaviruses. Both the support vector machine (SVM) model and the Mahalanobis distance (MD) discriminant model achieved high accuracies in leave-one-out cross-validation of training data consisting of 730 representative coronaviruses (99.86% and 98.08% respectively). Predictions on 47 additional coronaviruses precisely conformed to conclusions or speculations by other researchers. Our approach is implemented as a web server that can be accessed at http://bioinfo.ihb.ac.cn/seq2hosts. PMID:26607834
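To make the Mahalanobis distance (MD) discriminant idea in this abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: it uses two made-up features instead of the nineteen spike-gene parameters, and the group labels and data points are hypothetical. A sample is assigned to the group whose mean is nearest in Mahalanobis distance.

```python
def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def classify(x, groups):
    """Return the label of the group with the smallest squared distance to x."""
    stats = {label: mean_and_cov(pts) for label, pts in groups.items()}
    return min(stats, key=lambda label: mahalanobis_sq(x, *stats[label]))


# Hypothetical training data: two host groups, two features each.
groups = {
    "host_A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "host_B": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)],
}
```

With real data the covariance would be estimated over many more samples per group, and a general matrix inverse would replace the 2x2 closed form.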
NASA Astrophysics Data System (ADS)
Fu, Ji-Meng; Winchester, John W.
1994-03-01
Nitrogen in fresh waters of three rivers in northern Florida - the Apalachicola-Chattahoochee-Flint (ACF) River system, Ochlockonee (Och), and Sopchoppy (Sop) - is inferred to be derived mostly from atmospheric deposition. Because the N:P mole ratios in the rivers are nearly three times higher than the Redfield ratio for aquatic photosynthesis, N is saturated in the ecosystems, not a limiting nutrient, although it may be chemically transformed. Absolute principal component analysis (APCA), a receptor model, was applied to many years of monitoring data for Apalachicola River water and rainfall over its basin in order to better understand aquatic chemistry of nitrogen in the watershed. The APCA model describes the river water as mainly a mixture of components with compositions resembling fresh rain, aged rain, and groundwater. In the fresh rain component, the ratio of atmospheric nitrate to sulfate is close to that in rainwater, as if some samples had been collected following very recent rainfall. The aged rain component of the river water is distinguished by a low NO₃⁻/SO₄²⁻ ratio, signifying an atmospheric source but with most of its nitrate having been lost or transformed. The groundwater component, inferred from its concentration to contribute on average about one fourth of the river water, contains abundant Ca²⁺ but no detectable nitrogen. Results similar to ACF were obtained for Sop and Och, though Och exhibits some association of NO₃⁻ with the Ca²⁺-rich component. Similar APCA of wet precipitation resolves mainly components that represent acid rain, with NO₃⁻, SO₄²⁻ and NH₄⁺, and sea salt, with Na⁺, Cl⁻ and Mg²⁺. Inland, the acid rain component is relatively more prominent and Cl⁻ is depleted, while at atmospheric monitoring sites nearer the coastal region sea salt tends to be more prominent.
Statistical Inference for Valued-Edge Networks: The Generalized Exponential Random Graph Model
Desmarais, Bruce A.; Cranmer, Skyler J.
2012-01-01
Across the sciences, the statistical analysis of networks is central to the production of knowledge on relational phenomena. Because of their ability to model the structural generation of networks based on both endogenous and exogenous factors, exponential random graph models are a ubiquitous means of analysis. However, they are limited by an inability to model networks with valued edges. We address this problem by introducing a class of generalized exponential random graph models capable of modeling networks whose edges have continuous values (bounded or unbounded), thus greatly expanding the scope of networks applied researchers can subject to statistical analysis. PMID:22276151
Using Stimulus Equivalence Technology to Teach Statistical Inference in a Group Setting
ERIC Educational Resources Information Center
Critchfield, Thomas S.; Fienup, Daniel M.
2010-01-01
Computerized lessons employing stimulus equivalence technology, used previously under laboratory conditions to teach inferential statistics concepts to college students, were employed in a group setting for the first time. Students showed the same directly taught and emergent learning gains as in laboratory studies. A brief paper-and-pencil…
Statistical inference of selection and divergence of rice blast resistance gene Pi-ta
Technology Transfer Automated Retrieval System (TEKTRAN)
The resistance gene Pi-ta has been effectively used to control rice blast disease worldwide. A few recent studies have described the possible evolution of Pi-ta in cultivated and weedy rice. However, evolutionary statistics used for the studies are too limited to precisely understand selection and d...
Parafermion braid statistics in quasi-one-dimensional networks
NASA Astrophysics Data System (ADS)
Clarke, David; Alicea, Jason; Shtengel, Kirill
2012-02-01
One dimensional systems with Majorana zero modes at phase boundaries may be thought of as physical realizations of a discrete quantum wire model first put forth by Kitaev [1]. Proposed methods for braiding such Majorana fermions in one-dimensional wire networks [2] have greatly expanded the set of plausible avenues toward topological quantum computation. Recently, a generalization of the Kitaev model to parafermion modes has been developed [3]. Here, we describe the transport of such parafermion modes along the chain by the adiabatic transformation of the Hamiltonian, analogous to the transport of Majorana fermion modes. We determine the (braid) transformations of the ground state sector allowed by the adiabatic exchange of the parafermion modes in wire networks. We show that, as with Majorana fermions, none of the parafermion braid sets are universal for quantum computation. Certain parafermion chain models, unlike Majorana fermion systems, become universal with the addition of measurement operations. We discuss possible physical realizations of the parafermion models. [1] A. Kitaev, arXiv:cond-mat/0010440v2 [2] J. Alicea et al., Nature Physics 7, 412-417 (2011) [3] P. Fendley, unpublished
Statistical inference for classification of RRIM clone series using near IR reflectance properties
NASA Astrophysics Data System (ADS)
Ismail, Faridatul Aima; Madzhi, Nina Korlina; Hashim, Hadzli; Abdullah, Noor Ezan; Khairuzzaman, Noor Aishah; Azmi, Azrie Faris Mohd; Sampian, Ahmad Faiz Mohd; Harun, Muhammad Hafiz
2015-08-01
RRIM clones are a rubber breeding series produced by the RRIM (Rubber Research Institute of Malaysia) through its "rubber breeding program" to improve latex yield and produce clones attractive to farmers. The objective of this work is to analyse measurements made by an optical sensing device on latex of selected clone series. The device transmits NIR light, and the measured reflectance is converted to a voltage. The reflectance index obtained from this voltage was analyzed using statistical techniques in order to discriminate among the clones. From the statistical results using error plots and a one-way ANOVA test, there is overwhelming evidence of discrimination among the RRIM 2002, RRIM 2007 and RRIM 3001 clone series, with p value = 0.000. RRIM 2008 cannot be discriminated from RRIM 2014; however, both of these groups are distinct from the other clones.
A Meta-View of Multivariate Statistical Inference Methods in European Psychology Journals.
Harlow, Lisa L; Korendijk, Elly; Hamaker, Ellen L; Hox, Joop; Duerr, Sunny R
2013-09-01
We investigated the extent and nature of multivariate statistical inferential procedures used in eight European psychology journals covering a range of content (i.e., clinical, social, health, personality, organizational, developmental, educational, and cognitive). Multivariate methods included those found in popular texts that focused on prediction, group difference, and advanced modeling: multiple regression, logistic regression, analysis of covariance, multivariate analysis of variance, factor or principal component analysis, structural equation modeling, multilevel modeling, and other methods. Results revealed that an average of 57% of the articles from these eight journals involved multivariate analyses with a third using multiple regression, 17% using structural modeling, and the remaining methods collectively comprising about 50% of the analyses. The most frequently occurring inferential procedures involved prediction weights, dichotomous p values, figures with data, and significance tests with very few articles involving confidence intervals, statistical mediation, longitudinal analyses, power analysis, or meta-analysis. Contributions, limitations and future directions are discussed.
Fu, Ji-Meng; Winchester, J.W.
1994-03-01
Nitrogen in fresh waters of three rivers in northern Florida - the Apalachicola-Chattahoochee-Flint (ACF) River system, Ochlockonee (Och), and Sopchoppy (Sop) - is inferred to be derived mostly from atmospheric deposition. Because the N:P mole ratios in the rivers are nearly three times higher than the Redfield ratio for aquatic photosynthesis, N is saturated in the ecosystems, not a limiting nutrient, although it may be chemically transformed. Absolute principal component analysis (APCA), a receptor model, was applied to many years of monitoring data for Apalachicola River water and rainfall over its basin in order to better understand aquatic chemistry of nitrogen in the watershed. The APCA model describes the river water as mainly a mixture of components with compositions resembling fresh rain, aged rain, and groundwater. In the fresh rain component, the ratio of atmospheric nitrate to sulfate is close to that in rainwater, as if some samples had been collected following very recent rainfall. The aged rain component of the river water is distinguished by a low NO₃⁻
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
NASA Technical Reports Server (NTRS)
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
NASA Astrophysics Data System (ADS)
Jha, Sanjeev Kumar; Comunian, Alessandro; Mariethoz, Gregoire; Kelly, Bryce F. J.
2014-10-01
We develop a stochastic approach to construct channelized 3-D geological models constrained to borehole measurements as well as geological interpretation. The methodology is based on simple 2-D geologist-provided sketches of fluvial depositional elements, which are extruded in the 3rd dimension. Multiple-point geostatistics (MPS) is used to impart horizontal variability to the structures by introducing geometrical transformation parameters. The sketches provided by the geologist are used as elementary training images, whose statistical information is expanded through randomized transformations. We demonstrate the applicability of the approach by applying it to modeling a fluvial valley filling sequence in the Maules Creek catchment, Australia. The facies models are constrained to borehole logs, spatial information borrowed from an analogue and local orientations derived from the present-day stream networks. The connectivity in the 3-D facies models is evaluated using statistical measures and transport simulations. Comparison with a statistically equivalent variogram-based model shows that our approach is more suited for building 3-D facies models that contain structures specific to the channelized environment and which have a significant influence on the transport processes.
Statistical properties of ideal three-dimensional magnetohydrodynamics
NASA Technical Reports Server (NTRS)
Stribling, T.; Matthaeus, W. H.
1990-01-01
Classical Gibbs ensemble methods are used to study the spectral structure of three-dimensional ideal MHD in periodic geometry. In this paper the equilibrium ensemble incorporates constraints of total energy, magnetic helicity, and cross helicity. Several new results are proven for ensemble averages, including the constraint that magnetic energy equal or exceed kinetic energy, and that cross helicity represents a constant fraction of magnetic energy across the spectral domain, for arbitrary size systems. Two zero-temperature limits are considered in detail, emphasizing the role of complete and partial condensation of spectral quantities to the longest wavelength states. The ensemble predictions are compared to direct numerical solution using a low-order truncation Galerkin spectral code. Implications for spectral transfer of nonequilibrium, dissipative turbulent MHD systems are discussed.
Fragmentation and exfoliation of 2-dimensional materials: a statistical approach.
Kouroupis-Agalou, Konstantinos; Liscio, Andrea; Treossi, Emanuele; Ortolani, Luca; Morandi, Vittorio; Pugno, Nicola Maria; Palermo, Vincenzo
2014-06-01
The main advantage for applications of graphene and related 2D materials is that they can be produced on large scales by liquid phase exfoliation. The exfoliation process shall be considered as a particular fragmentation process, where the 2D character of the exfoliated objects will influence significantly fragmentation dynamics as compared to standard materials. Here, we used automatized image processing of Atomic Force Microscopy (AFM) data to measure, one by one, the exact shape and size of thousands of nanosheets obtained by exfoliation of an important 2D-material, boron nitride, and used different statistical functions to model the asymmetric distribution of nanosheet sizes typically obtained. Being the resolution of AFM much larger than the average sheet size, analysis could be performed directly at the nanoscale and at the single sheet level. We find that the size distribution of the sheets at a given time follows a log-normal distribution, indicating that the exfoliation process has a "typical" scale length that changes with time and that exfoliation proceeds through the formation of a distribution of random cracks that follow Poisson statistics. The validity of this model implies that the size distribution does not depend on the different preparation methods used, but is a common feature in the exfoliation of this material and thus probably for other 2D materials.
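The log-normal size distribution reported in this abstract can be illustrated with a short maximum-likelihood fit. The sketch below is only an illustration, not the authors' analysis pipeline: the sheet sizes are synthetic draws from Python's `random` module, and the parameter values (5.0 and 0.6 on the log scale) are arbitrary assumptions.

```python
import math
import random


def fit_lognormal(sizes):
    """Maximum-likelihood estimates (mu, sigma) of a log-normal distribution.

    mu and sigma are the mean and standard deviation of log(size).
    """
    logs = [math.log(s) for s in sizes]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def lognormal_pdf(x, mu, sigma):
    """Log-normal probability density at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


# Synthetic "nanosheet sizes" with assumed parameters, for illustration only.
rng = random.Random(0)
sizes = [rng.lognormvariate(5.0, 0.6) for _ in range(5000)]
mu_hat, sigma_hat = fit_lognormal(sizes)
```

Because the logarithm of a log-normal variable is Gaussian, the fit reduces to a sample mean and standard deviation of the log sizes, which is why no iterative optimizer is needed here.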
Fragmentation and exfoliation of 2-dimensional materials: a statistical approach
NASA Astrophysics Data System (ADS)
Kouroupis-Agalou, Konstantinos; Liscio, Andrea; Treossi, Emanuele; Ortolani, Luca; Morandi, Vittorio; Pugno, Nicola Maria; Palermo, Vincenzo
2014-05-01
The main advantage for applications of graphene and related 2D materials is that they can be produced on large scales by liquid phase exfoliation. The exfoliation process shall be considered as a particular fragmentation process, where the 2D character of the exfoliated objects will influence significantly fragmentation dynamics as compared to standard materials. Here, we used automatized image processing of Atomic Force Microscopy (AFM) data to measure, one by one, the exact shape and size of thousands of nanosheets obtained by exfoliation of an important 2D-material, boron nitride, and used different statistical functions to model the asymmetric distribution of nanosheet sizes typically obtained. Being the resolution of AFM much larger than the average sheet size, analysis could be performed directly at the nanoscale and at the single sheet level. We find that the size distribution of the sheets at a given time follows a log-normal distribution, indicating that the exfoliation process has a "typical" scale length that changes with time and that exfoliation proceeds through the formation of a distribution of random cracks that follow Poisson statistics. The validity of this model implies that the size distribution does not depend on the different preparation methods used, but is a common feature in the exfoliation of this material and thus probably for other 2D materials.
ERIC Educational Resources Information Center
Cui, Ying; Roberts, Mary Roduta
2013-01-01
The goal of this study was to investigate the usefulness of person-fit analysis in validating student score inferences in a cognitive diagnostic assessment. In this study, a two-stage procedure was used to evaluate person fit for a diagnostic test in the domain of statistical hypothesis testing. In the first stage, the person-fit statistic, the…
Soap film flows: Statistics of two-dimensional turbulence
Vorobieff, P.; Rivera, M.; Ecke, R.E.
1999-08-01
Soap film flows provide a very convenient laboratory model for studies of two-dimensional (2-D) hydrodynamics including turbulence. For a gravity-driven soap film channel with a grid of equally spaced cylinders inserted in the flow, we have measured the simultaneous velocity and thickness fields in the irregular flow downstream from the cylinders. The velocity field is determined by a modified digital particle image velocimetry method and the thickness from the light scattered by the particles in the film. From these measurements, we compute the decay of mean energy, enstrophy, and thickness fluctuations with downstream distance, and the structure functions of velocity, vorticity, thickness fluctuation, and vorticity flux. From these quantities we determine the microscale Reynolds number of the flow R_λ ≈ 100 and the integral and dissipation scales of 2D turbulence. We also obtain quantitative measures of the degree to which our flow can be considered incompressible and isotropic as a function of downstream distance. We find coarsening of characteristic spatial scales, qualitative correspondence of the decay of energy and enstrophy with the Batchelor model, scaling of energy in k space consistent with the k⁻³ spectrum of the Kraichnan–Batchelor enstrophy-scaling picture, and power-law scalings of the structure functions of velocity, vorticity, vorticity flux, and thickness. These results are compared with models of 2-D turbulence and with numerical simulations. © 1999 American Institute of Physics.
Anderson, Eric C
2012-01-01
Advances in genotyping that allow tens of thousands of individuals to be genotyped at a moderate number of single nucleotide polymorphisms (SNPs) permit parentage inference to be pursued on a very large scale. The intergenerational tagging this capacity allows is revolutionizing the management of cultured organisms (cows, salmon, etc.) and is poised to do the same for scientific studies of natural populations. Currently, however, there are no likelihood-based methods of parentage inference which are implemented in a manner that allows them to quickly handle a very large number of potential parents or parent pairs. Here we introduce an efficient likelihood-based method applicable to the specialized case of cultured organisms in which both parents can be reliably sampled. We develop a Markov chain representation for the cumulative number of Mendelian incompatibilities between an offspring and its putative parents and we exploit it to develop a fast algorithm for simulation-based estimates of statistical confidence in SNP-based assignments of offspring to pairs of parents. The method is implemented in the freely available software SNPPIT. We describe the method in detail, then assess its performance in a large simulation study using known allele frequencies at 96 SNPs from ten hatchery salmon populations. The simulations verify that the method is fast and accurate and that 96 well-chosen SNPs can provide sufficient power to identify the correct pair of parents from amongst millions of candidate pairs. PMID:23152426
Anderson, Eric C
2012-11-08
Advances in genotyping that allow tens of thousands of individuals to be genotyped at a moderate number of single nucleotide polymorphisms (SNPs) permit parentage inference to be pursued on a very large scale. The intergenerational tagging this capacity allows is revolutionizing the management of cultured organisms (cows, salmon, etc.) and is poised to do the same for scientific studies of natural populations. Currently, however, there are no likelihood-based methods of parentage inference which are implemented in a manner that allows them to quickly handle a very large number of potential parents or parent pairs. Here we introduce an efficient likelihood-based method applicable to the specialized case of cultured organisms in which both parents can be reliably sampled. We develop a Markov chain representation for the cumulative number of Mendelian incompatibilities between an offspring and its putative parents and we exploit it to develop a fast algorithm for simulation-based estimates of statistical confidence in SNP-based assignments of offspring to pairs of parents. The method is implemented in the freely available software SNPPIT. We describe the method in detail, then assess its performance in a large simulation study using known allele frequencies at 96 SNPs from ten hatchery salmon populations. The simulations verify that the method is fast and accurate and that 96 well-chosen SNPs can provide sufficient power to identify the correct pair of parents from amongst millions of candidate pairs.
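The quantity at the centre of this abstract, the number of Mendelian incompatibilities between an offspring and a putative parent pair, can be sketched in a few lines. This is an illustrative toy, not the SNPPIT algorithm: it assumes biallelic SNPs coded as the count of one allele (0, 1 or 2), ignores genotyping error, and does not include the Markov chain or the simulation-based confidence estimates the authors describe. The function names are hypothetical.

```python
# Alleles (0 or 1) that a parent with a given genotype can transmit.
GAMETES = {0: {0}, 1: {0, 1}, 2: {1}}


def is_incompatible(child, mother, father):
    """True if the child's genotype cannot arise from this parent pair."""
    possible = {a + b for a in GAMETES[mother] for b in GAMETES[father]}
    return child not in possible


def count_incompatibilities(child, mother, father):
    """Number of loci at which the trio violates Mendelian inheritance."""
    return sum(
        is_incompatible(c, m, f) for c, m, f in zip(child, mother, father)
    )


# Hypothetical four-locus example.
child = [0, 1, 2, 1]
mother = [0, 0, 2, 2]
father = [0, 2, 0, 2]
n_bad = count_incompatibilities(child, mother, father)
```

In this example the third and fourth loci are incompatible (a 2 x 0 cross can only give a heterozygote, and a 2 x 2 cross can only give genotype 2), so candidate pairs with many such loci can be discarded quickly before any likelihood calculation.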
NASA Astrophysics Data System (ADS)
Kalemis, A.; Bailey, D. L.; Flower, M. A.; Lord, S. K.; Ott, R. J.
2004-07-01
In this paper two tests based on statistical models are presented and used to assess, quantify and provide positional information of the existence of bias and/or variations between planar images acquired at different times but under similar conditions. In the first test a linear regression model is fitted to the data in a pixelwise fashion, using three mathematical operators. In the second test a comparison using z-scoring is used based on the assumption that Poisson statistics are valid. For both tests the underlying assumptions are as simple and few as possible. The results are presented as parametric maps of either the three operators or the z-score. The z-score maps can then be thresholded to show the parts of the images which demonstrate change. Three different thresholding methods (naïve, adaptive and multiple) are presented: together they cover almost all the needs for separating the signal from the background in the z-score maps. Where the expected size of the signal is known or can be estimated, a spatial correction technique (referred to as the reef correction) can be applied. These tests were applied to flood images used for the quality control of gamma camera uniformity. Simulated data were used to check the validity of the methods. Real data were acquired from four different cameras from two different institutions using a variety of acquisition parameters. The regression model found the bias in all five simulated cases and it also found patterns of unstable regions in real data where visual inspection of the flood images did not show any problems. In comparison the z-map revealed the differences in the simulated images from as low as 1.8 standard deviations from the mean, corresponding to a differential uniformity of 2.2% over the central field of view. In all cases studied, the reef correction increased significantly the sensitivity of the method and in most cases the specificity as well. The two proposed tests can be used either separately or in
Statistical Inference in Hidden Markov Models Using k-Segment Constraints
Titsias, Michalis K.; Holmes, Christopher C.; Yau, Christopher
2016-01-01
Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward–backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths. We collectively call these recursions k-segment algorithms and illustrate their utility using simulated and real examples. We also highlight the prospective and retrospective use of k-segment constraints for fitting HMMs or exploring existing model fits. Supplementary materials for this article are available online. PMID:27226674
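For reference, the unconstrained MAP hidden state sequence mentioned in this abstract is what the standard Viterbi recursion computes. The sketch below shows that baseline in log space. It does not implement the k-segment recursions introduced in the article, and the two-state model parameters are invented for illustration.

```python
import math


def viterbi(obs, init, trans, emit):
    """Most probable hidden state path for a sequence of observed symbol indices.

    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n_states = len(init)
    # Best log-probability of any path ending in each state at time 0.
    score = [math.log(init[s]) + math.log(emit[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at time t + 1
    for o in obs[1:]:
        prev = score
        score, ptr = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + math.log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + math.log(emit[s][o]))
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-state model with "sticky" transitions.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit_sharp = [[0.9, 0.1], [0.1, 0.9]]
emit_noisy = [[0.7, 0.3], [0.3, 0.7]]
```

With the noisy emission matrix, a single outlying observation does not trigger a state change because two transitions cost more than one unlikely emission. A constraint on the number of segments, as in the article, controls this kind of behaviour explicitly.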
NASA Astrophysics Data System (ADS)
Laloy, Eric; Linde, Niklas; Jacques, Diederik; Vrugt, Jasper A.
2015-06-01
We present a Bayesian inversion method for the joint inference of high-dimensional multi-Gaussian hydraulic conductivity fields and associated geostatistical parameters from indirect hydrological data. We combine Gaussian process generation via circulant embedding to decouple the variogram from grid cell specific values, with dimensionality reduction by interpolation to enable Markov chain Monte Carlo (MCMC) simulation. Using the Matérn variogram model, this formulation allows inferring the conductivity values simultaneously with the field smoothness (also called Matérn shape parameter) and other geostatistical parameters such as the mean, sill, integral scales and anisotropy direction(s) and ratio(s). The proposed dimensionality reduction method systematically honors the underlying variogram and is demonstrated to achieve better performance than the Karhunen-Loève expansion. We illustrate our inversion approach using synthetic (error corrupted) data from a tracer experiment in a fairly heterogeneous 10,000-dimensional 2-D conductivity field. A 40-times reduction of the size of the parameter space did not prevent the posterior simulations from appropriately fitting the measurement data, nor the posterior parameter distributions from including the true geostatistical parameter values. Overall, the posterior field realizations covered a wide range of geostatistical models, questioning the common practice of assuming a fixed variogram prior to inference of the hydraulic conductivity values. Our method is shown to be more efficient than sequential Gibbs sampling (SGS) for the considered case study, particularly when implemented on a distributed computing cluster. It is also found to outperform the method of anchored distributions (MAD) for the same computational budget.
Statistical inference of co-movements of stocks during a financial crisis
NASA Astrophysics Data System (ADS)
Ibuki, Takero; Higano, Shunsuke; Suzuki, Sei; Inoue, Jun-ichi; Chakraborti, Anirban
2013-12-01
In order to figure out and to forecast the emergence phenomena of social systems, we propose several probabilistic models for the analysis of financial markets, especially around a crisis. We first attempt to visualize the collective behaviour of markets during a financial crisis through cross-correlations between typical Japanese daily stocks by making use of multidimensional scaling. We find that all the two-dimensional points (stocks) shrink into a single small region when an economic crisis takes place. By using the properties of cross-correlations in financial markets especially during a crisis, we next propose a theoretical framework to predict several time-series simultaneously. Our model system is basically described by a variant of the multi-layered Ising model with random fields as non-stationary time series. Hyper-parameters appearing in the probabilistic model are estimated by means of minimizing the 'cumulative error' in the past market history. The justification and validity of our approaches are numerically examined for several empirical data sets.
NASA Astrophysics Data System (ADS)
Calderon, Christopher P.; Weiss, Lucien E.; Moerner, W. E.
2014-05-01
Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x, y (or x, y, z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium).
Calderon, Christopher P; Weiss, Lucien E; Moerner, W E
2014-05-01
Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x,y (or x,y,z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium). PMID:25353827
Statistically Based Inference of Physical Rock Properties of Main Rock Types in Germany
NASA Astrophysics Data System (ADS)
Koch, A.; Jorand, R.; Clauser, C.
2009-12-01
A major obstacle for an increased use of geothermal energy often lies in the high success risk for the development of geothermal reservoirs due to the unknown rock properties. In general, the ranges of thermal and hydraulic properties (thermal conductivity, volumetric heat capacity, porosity, permeability) in existing compilations of rock properties are too large to be useful to constrain properties for specific sites. Usually, conservative assumptions are made about these properties, resulting in greater drilling depth and increased exploration cost. In this study, data from direct measurements on more than 600 core samples from different borehole locations and depths enable us to derive statistical moments of the desired properties for selected main rock types in the German subsurface. Modern core scanning technology allowed rapid, high-resolution measurement of thermal conductivity, sonic velocity, and gamma density on a large number of samples. In addition, we measured porosity, bulk density, and matrix density based on Archimedes’ principle and pycnometer analysis. Tests on a smaller collection of samples also include specific heat capacity, hydraulic permeability, and radiogenic heat production rate. In addition, we complemented the petrophysical measurements by quantitative mineralogical analysis. The results reveal that even for the same main rock type the properties differ significantly depending on geologic age, origin, compaction, and mineralogical composition. For example, water saturated thermal conductivity of tight Palaeozoic sandstones from the Lower Rhine Embayment and the Ruhr Area is 4.0±0.7 W m⁻¹ K⁻¹ and 4.6±0.6 W m⁻¹ K⁻¹, respectively, which is nearly identical to values for the Lower Triassic Bunter sandstone in Southwest-Germany (high in quartz showing an average value of 4.3±0.4 W m⁻¹ K⁻¹). In contrast, saturated thermal conductivity of Upper Triassic sandstone in the same area is considerably lower at 2.5±0.1 W m⁻¹ K⁻¹ (Schilf
Menon, Ravishankar; Gerstoft, Peter; Hodgkiss, William S
2012-11-01
Cross-correlations of diffuse noise fields can be used to extract environmental information. The influence of directional sources (usually ships) often results in a bias of the travel time estimates obtained from the cross-correlations. Using an array of sensors, insights from random matrix theory on the behavior of the eigenvalues of the sample covariance matrix (SCM) in an isotropic noise field are used to isolate the diffuse noise component from the directional sources. A sequential hypothesis testing of the eigenvalues of the SCM reveals eigenvalues dominated by loud sources that are statistical outliers for the assumed diffuse noise model. Travel times obtained from cross-correlations using only the diffuse noise component (i.e., by discarding or attenuating the outliers) converge to the predicted travel times based on the known array sensor spacing and measured sound speed at the site and are stable temporally (i.e., unbiased estimates). Data from the Shallow Water 2006 experiment demonstrates the effectiveness of this approach and that the signal-to-noise ratio builds up as the square root of time, as predicted by theory.
Statistical inference from multiple iTRAQ experiments without using common reference standards.
Herbrich, Shelley M; Cole, Robert N; West, Keith P; Schulze, Kerry; Yager, James D; Groopman, John D; Christian, Parul; Wu, Lee; O'Meally, Robert N; May, Damon H; McIntosh, Martin W; Ruczinski, Ingo
2013-02-01
Isobaric tags for relative and absolute quantitation (iTRAQ) is a prominent mass spectrometry technology for protein identification and quantification that is capable of analyzing multiple samples in a single experiment. Frequently, iTRAQ experiments are carried out using an aliquot from a pool of all samples, or "masterpool", in one of the channels as a reference sample standard to estimate protein relative abundances in the biological samples and to combine abundance estimates from multiple experiments. In this manuscript, we show that using a masterpool is counterproductive. We obtain more precise estimates of protein relative abundance by using the available biological data instead of the masterpool and do not need to occupy a channel that could otherwise be used for another biological sample. In addition, we introduce a simple statistical method to associate proteomic data from multiple iTRAQ experiments with a numeric response and show that this approach is more powerful than the conventionally employed masterpool-based approach. We illustrate our methods using data from four replicate iTRAQ experiments on aliquots of the same pool of plasma samples and from a 406-sample project designed to identify plasma proteins that covary with nutrient concentrations in chronically undernourished children from South Asia.
Statistical inference methods for recurrent event processes with shape and size parameters
WANG, MEI-CHENG; HUANG, CHIUNG-YU
2015-01-01
Summary This paper proposes a unified framework to characterize the rate function of a recurrent event process through shape and size parameters. In contrast to the intensity function, which is the event occurrence rate conditional on the event history, the rate function is the occurrence rate unconditional on the event history, and thus it can be interpreted as a population-averaged count of events in unit time. In this paper, shape and size parameters are introduced and used to characterize the association between the rate function λ(·) and a random variable X. Measures of association between X and λ(·) are defined via shape- and size-based coefficients. Rate-independence of X and λ(·) is studied through tests of shape-independence and size-independence, where the shape-and size-based test statistics can be used separately or in combination. These tests can be applied when X is a covariable possibly correlated with the recurrent event process through λ(·) or, in the one-sample setting, when X is the censoring time at which the observation of N(·) is terminated. The proposed tests are shape- and size-based, so when a null hypothesis is rejected, the test results can serve to distinguish the source of violation. PMID:26412863
Can we infer the effect of river works on streamflow statistics?
NASA Astrophysics Data System (ADS)
Ganora, Daniele
2016-04-01
Most of our river network system is affected by anthropic pressure of different types. While climate and land use change are widely recognized as important factors, the effects of "in-line" water infrastructures on the global behavior of the river system is often overlooked. This is due to the difficulty in including local "physical" knowledge (e.g., the hydraulic behavior of a river reach with levees during a flood) into large-scale models that provide a statistical description of the streamflow, and which are the basis for the implementation of resources/risk management plans (e.g., regional models for prediction of the flood frequency curve). This work presents some preliminary applications regarding two widely used hydrological signatures, the flow duration curve and the flood frequency curve. We adopt a pragmatic (i.e., reliable and implementable at large scales) and parsimonious (i.e., that requires a few data) framework of analysis, considering that we operate in a complex system (many river work are already existing, and many others could be built in the future). In the first case, a method is proposed to correct observations of streamflow affected by the presence of upstream run-of-the-river power plants in order to provide the "natural" flow duration curve, using only simple information about the plant (i.e., the maximum intake flow). The second case regards the effects of flood-protection works on the downstream sections, to support the application of along-stream cost-benefit analysis in the flood risk management context. Current applications and possible future developments are discussed.
Statistical reconstruction of three-dimensional porous media from two-dimensional images
NASA Astrophysics Data System (ADS)
Roberts, Anthony P.
1997-09-01
A method of modeling the three-dimensional microstructure of random isotropic two-phase materials is proposed. The information required to implement the technique can be obtained from two-dimensional images of the microstructure. The reconstructed models share two-point correlation and chord-distribution functions with the original composite. The method is designed to produce models for computationally and theoretically predicting the effective macroscopic properties of random materials (such as electrical and thermal conductivity, permeability and elastic moduli). To test the method we reconstruct the morphology and predict the conductivity of the well known overlapping sphere model. The results are in very good agreement with data for the original model.
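One of the two statistics named in this abstract, the two-point correlation function, can be estimated directly from a binary 2D image. The sketch below is a minimal illustration only: it assumes periodic boundaries, measures separations along the x-direction alone (an isotropic estimate would average over directions), and uses a hypothetical toy image. It is not the reconstruction method of the paper.

```python
def two_point_correlation(image, r):
    """Estimate S2(r): the probability that two pixels separated by r
    along x both lie in phase 1. Periodic boundaries are assumed."""
    rows = len(image)
    cols = len(image[0])
    hits = 0
    for i in range(rows):
        for j in range(cols):
            # Product is 1 only when both pixels belong to phase 1.
            hits += image[i][j] * image[i][(j + r) % cols]
    return hits / (rows * cols)

# Hypothetical 4x4 binary image (1 = phase of interest, 0 = other phase).
image = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]
volume_fraction = two_point_correlation(image, 0)  # S2(0) equals the phase fraction
s2_at_1 = two_point_correlation(image, 1)
s2_at_2 = two_point_correlation(image, 2)
```

At zero separation the function reduces to the volume fraction of the phase, which gives a quick sanity check on any implementation.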
Schimek, Michael G; Budinská, Eva; Kugler, Karl G; Švendová, Vendula; Ding, Jie; Lin, Shili
2015-06-01
High-throughput sequencing techniques are increasingly affordable and produce massive amounts of data. Together with other high-throughput technologies, such as microarrays, there is an enormous amount of resources in databases. The collection of these valuable data has been routine for more than a decade. Despite different technologies, many experiments share the same goal. For instance, the aims of RNA-seq studies often coincide with those of differential gene expression experiments based on microarrays. As such, it would be logical to utilize all available data. However, there is a lack of biostatistical tools for the integration of results obtained from different technologies. Although diverse technological platforms produce different raw data, one commonality for experiments with the same goal is that all the outcomes can be transformed into a platform-independent data format - rankings - for the same set of items. Here we present the R package TopKLists, which allows for statistical inference on the lengths of informative (top-k) partial lists, for stochastic aggregation of full or partial lists, and for graphical exploration of the input and consolidated output. A graphical user interface has also been implemented for providing access to the underlying algorithms. To illustrate the applicability and usefulness of the package, we integrated microRNA data of non-small cell lung cancer across different measurement techniques and drew conclusions. The package can be obtained from CRAN under an LGPL-3 license.
NASA Astrophysics Data System (ADS)
von Nessi, G. T.; Hole, M. J.; The MAST Team
2014-11-01
We present recent results and technical breakthroughs for the Bayesian inference of tokamak equilibria using force-balance as a prior constraint. Issues surrounding model parameter representation and posterior analysis are discussed and addressed. These points motivate the recent advancements embodied in the Bayesian Equilibrium Analysis and Simulation Tool (BEAST) software being presently utilized to study equilibria on the Mega-Ampere Spherical Tokamak (MAST) experiment in the UK (von Nessi et al 2012 J. Phys. A 46 185501). State-of-the-art results of using BEAST to study MAST equilibria are reviewed, with recent code advancements being systematically presented throughout the manuscript.
A three-dimensional statistical mechanical model of folding double-stranded chain molecules
NASA Astrophysics Data System (ADS)
Zhang, Wenbing; Chen, Shi-Jie
2001-05-01
Based on a graphical representation of intrachain contacts, we have developed a new three-dimensional model for the statistical mechanics of double-stranded chain molecules. The theory has been tested and validated for the cubic lattice chain conformations. The statistical mechanical model can be applied to the equilibrium folding thermodynamics of a large class of chain molecules, including protein β-hairpin conformations and RNA secondary structures. The application of a previously developed two-dimensional model to RNA secondary structure folding thermodynamics generally overestimates the breadth of the melting curves [S-J. Chen and K. A. Dill, Proc. Natl. Acad. Sci. U.S.A. 97, 646 (2000)], suggesting an underestimation of the sharpness of the conformational transitions. In this work, we show that the new three-dimensional model gives much sharper melting curves than the two-dimensional model. We believe that the new three-dimensional model may give much better predictions for the thermodynamic properties of RNA conformational changes than the previous two-dimensional model.
Braiding statistics and classification of two-dimensional charge-2m superconductors
NASA Astrophysics Data System (ADS)
Wang, Chenjie
2016-08-01
We study braiding statistics between quasiparticles and vortices in two-dimensional charge-2m (in units of e) superconductors that are coupled to a Z_{2m} dynamical gauge field, where m is any positive integer. We show that there exist 16m types of braiding statistics when m is odd, but only 4m types when m is even. Based on the braiding statistics, we obtain a classification of topological phases of charge-2m superconductors or, formally speaking, a classification of symmetry-protected topological phases, as well as invertible topological phases, of two-dimensional gapped fermions with Z_{2m}^f symmetry. Interestingly, we find that there is no nontrivial fermionic symmetry-protected topological phase with Z_4^f symmetry.
Stadler, R.; Hellmann, J.; Schirle, M.; Beckmann, J.
1993-12-31
Previous work showed that 4-urazoyl benzoic acid groups (U4A), statistically attached to polybutadiene, form ordered supramolecular arrays in the polymer matrix. The present work describes the synthesis of a new molecular building block capable of self-assembling in the nonpolar matrix. 5-urazoylisophthalic acid groups (U35A) attached to 1,4-polybutadiene chains show an endothermic transition characteristic of supramolecular self-assembly. The melting temperature increases for low levels of modification from 130 °C up to 190 °C. The IR data indicate that the 5-urazoylisophthalic acid groups are 4-functional with respect to supramolecular self-addressing. Based on the detailed knowledge of the structure of the self-assembled domains in 4-urazoyl benzoic acid groups, a model is developed which describes qualitatively the observed material properties.
NASA Astrophysics Data System (ADS)
Balzani, D.; Scheunemann, L.; Brands, D.; Schröder, J.
2014-11-01
In this paper a method is presented for the construction of two- and three-dimensional statistically similar representative volume elements (SSRVEs) that may be used in computational two-scale calculations. These SSRVEs are obtained by minimizing a least-square functional defined in terms of deviations of statistical measures describing the microstructure morphology and mechanical macroscopic quantities computed for a random target microstructure and for the SSRVE. It is shown that such SSRVEs serve as lower bounds in a statistical sense with respect to the difference of microstructure morphology. Moreover, an upper bound is defined by the maximum of the least-square functional. A staggered optimization procedure is proposed enabling a more efficient construction of SSRVEs. In an inner optimization problem we ensure that the statistical similarity of the microstructure morphology in the SSRVE compared with a target microstructure is as high as possible. Then, in an outer optimization problem we analyze mechanical stress-strain curves. As an example for the proposed method two- and three-dimensional SSRVEs are constructed for real microstructure data of a dual-phase steel. By comparing their mechanical response with the one of the real microstructure the performance of the method is documented. It turns out that the quality of the SSRVEs improves and converges to some limit value as the microstructure complexity of the SSRVE increases. This converging behavior gives reason to expect an optimal SSRVE at the limit for a chosen type of microstructure parameterization and set of statistical measures.
Noise and counting statistics of insulating phases in one-dimensional optical lattices
Lamacraft, Austen
2007-07-15
We discuss the correlation properties of current-carrying states of one-dimensional insulators, which could be realized by applying an impulse to atoms loaded onto an optical lattice. While the equilibrium noise has a gapped spectrum, the quantum uncertainty encoded in the amplitudes for the Zener process gives a zero-frequency contribution out of equilibrium. We derive a general expression for the generating function of the full counting statistics and find that the particle transport obeys binomial statistics with doubled charge, resulting in super-Poissonian noise that originates from the coherent creation of particle-hole pairs.
Schwermann, Achim H; dos Santos Rolo, Tomy; Caterino, Michael S; Bechly, Günter; Schmied, Heiko; Baumbach, Tilo; van de Kamp, Thomas
2016-01-01
External and internal morphological characters of extant and fossil organisms are crucial to establishing their systematic position, ecological role and evolutionary trends. The lack of internal characters and soft-tissue preservation in many arthropod fossils, however, impedes comprehensive phylogenetic analyses and species descriptions according to taxonomic standards for Recent organisms. We found well-preserved three-dimensional anatomy in mineralized arthropods from Paleogene fissure fillings and demonstrate the value of these fossils by utilizing digitally reconstructed anatomical structure of a hister beetle. The new anatomical data facilitate a refinement of the species diagnosis and allowed us to reject a previous hypothesis of close phylogenetic relationship to an extant congeneric species. Our findings suggest that mineralized fossils, even those of macroscopically poor preservation, constitute a rich but yet largely unexploited source of anatomical data for fossil arthropods. DOI: http://dx.doi.org/10.7554/eLife.12129.001 PMID:26854367
Two-Dimensional Hermite Filters Simplify the Description of High-Order Statistics of Natural Images
Hu, Qin; Victor, Jonathan D.
2016-01-01
Natural image statistics play a crucial role in shaping biological visual systems, understanding their function and design principles, and designing effective computer-vision algorithms. High-order statistics are critical for conveying local features, but they are challenging to study, largely because their number and variety are large. Here, via the use of two-dimensional Hermite (TDH) functions, we identify a covert symmetry in high-order statistics of natural images that simplifies this task. This emerges from the structure of TDH functions, which are an orthogonal set of functions that are organized into a hierarchy of ranks. Specifically, we find that the shape (skewness and kurtosis) of the distribution of filter coefficients depends only on the projection of the function onto a 1-dimensional subspace specific to each rank. The characterization of natural image statistics provided by TDH filter coefficients reflects both their phase and amplitude structure, and we suggest an intuitive interpretation for the special subspace within each rank. PMID:27713838
NASA Astrophysics Data System (ADS)
Qin, Fang; Wen, Wen; Chen, Ji-Sheng
2014-07-01
The thermal and electrical transport properties of an ideal anyon gas within fractional exclusion statistics are studied. By solving the Boltzmann equation with the relaxation-time approximation, the analytical expressions for the thermal and electrical conductivities of a three-dimensional ideal anyon gas are given. The low-temperature expressions for the two conductivities are obtained by using the Sommerfeld expansion. It is found that the Wiedemann-Franz law should be modified by the higher-order temperature terms, which depend on the statistical parameter g for a charged anyon gas. Neglecting the higher-order terms of temperature, the Wiedemann-Franz law is respected, which gives the Lorenz number. The Lorenz number is a function of the statistical parameter g.
Austin, Peter C
2011-05-20
Propensity-score matching allows one to reduce the effects of treatment-selection bias or confounding when estimating the effects of treatments when using observational data. Some authors have suggested that methods of inference appropriate for independent samples can be used for assessing the statistical significance of treatment effects when using propensity-score matching. Indeed, many authors in the applied medical literature use methods for independent samples when making inferences about treatment effects using propensity-score matched samples. Dichotomous outcomes are common in healthcare research. In this study, we used Monte Carlo simulations to examine the effect on inferences about risk differences (or absolute risk reductions) when statistical methods for independent samples are used compared with when statistical methods for paired samples are used in propensity-score matched samples. We found that compared with using methods for independent samples, the use of methods for paired samples resulted in: (i) empirical type I error rates that were closer to the advertised rate; (ii) empirical coverage rates of 95 per cent confidence intervals that were closer to the advertised rate; (iii) narrower 95 per cent confidence intervals; and (iv) estimated standard errors that more closely reflected the sampling variability of the estimated risk difference. Differences between the empirical and advertised performance of methods for independent samples were greater when the treatment-selection process was stronger compared with when treatment-selection process was weaker. We recommend using statistical methods for paired samples when using propensity-score matched samples for making inferences on the effect of treatment on the reduction in the probability of an event occurring.
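The contrast drawn in this abstract between paired and independent-sample inference can be made concrete for a risk difference estimated from matched pairs with a dichotomous outcome. The sketch below is illustrative and is not the simulation code of the study. It assumes the standard large-sample variance for a difference of paired proportions and the usual binomial variance for independent samples; the function name and the pair counts are hypothetical.

```python
import math

def risk_difference_se(a, b, c, d):
    """Risk difference and two standard errors from matched-pair counts.

    a: event in both members of the pair
    b: event in the treated member only
    c: event in the control member only
    d: event in neither member
    """
    n = a + b + c + d                      # number of matched pairs
    p_treated = (a + b) / n
    p_control = (a + c) / n
    rd = p_treated - p_control             # equals (b - c) / n
    # Paired (McNemar-type) variance: only discordant pairs contribute.
    var_paired = ((b + c) - (b - c) ** 2 / n) / n ** 2
    # Variance that treats the two arms as independent samples.
    var_indep = (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / n
    return rd, math.sqrt(var_paired), math.sqrt(var_indep)

# Hypothetical counts for 100 matched pairs.
rd, se_paired, se_indep = risk_difference_se(40, 10, 20, 30)
ci_paired = (rd - 1.96 * se_paired, rd + 1.96 * se_paired)
ci_indep = (rd - 1.96 * se_indep, rd + 1.96 * se_indep)
```

With these counts the outcomes within pairs are positively correlated, so the paired standard error is smaller and the paired 95 per cent interval is narrower than the independent-sample one.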
NASA Astrophysics Data System (ADS)
Yoshimitsu, Nana; Furumura, Takashi; Maeda, Takuto
2016-09-01
The coda part of a waveform transmitted through a laboratory sample should be examined for the high-resolution monitoring of the sample characteristics in detail. However, the origin and propagation process of the later phases in a finite-sized small sample are very complicated with the overlap of multiple unknown reflections and conversions. In this study, we investigated the three-dimensional (3D) geometric effect of a finite-sized cylindrical sample to understand the development of these later phases. This study used 3D finite difference method simulation employing a free-surface boundary condition over a curved model surface and a realistic circular shape of the source model. The simulated waveforms and the visualized 3D wavefield in a stainless steel sample clearly demonstrated the process of multiple reflections and the conversions of the P and S waves at the side surface as well as at the top and bottom of the sample. Rayleigh wave propagation along the curved side boundary was also confirmed, and these waves dominate in the later portion of the simulated waveform with much larger amplitudes than the P and S wave reflections. The feature of the simulated waveforms showed good agreement with laboratory observed waveforms. For the simulation, an introduction of an absorbing boundary condition at the top and bottom of the sample made it possible to efficiently separate the contribution of the vertical and horizontal boundary effects in the simulated wavefield. This procedure helped to confirm the additional finding of vertically propagating multiple surface waves and their conversion at the corner of the sample. This new laboratory-scale 3D simulation enabled the appearance of a variety of geometric effects that constitute the later phases of the transmitted waves.
Interoccurrence time statistics in the two-dimensional Burridge-Knopoff earthquake model
Hasumi, Tomohiro
2007-08-15
We have numerically investigated statistical properties of the so-called interoccurrence time or the waiting time, i.e., the time interval between successive earthquakes, based on the two-dimensional (2D) spring-block (Burridge-Knopoff) model, selecting the velocity-weakening property as the constitutive friction law. The statistical properties of frequency distribution and the cumulative distribution of the interoccurrence time are discussed by tuning the dynamical parameters, namely, a stiffness and frictional property of a fault. We optimize these model parameters to reproduce the interoccurrence time statistics in nature; the frequency and cumulative distribution can be described by the power law and Zipf-Mandelbrot type power law, respectively. In an optimal case, the b value of the Gutenberg-Richter law and the ratio of wave propagation velocity are in agreement with those derived from real earthquakes. As the threshold of magnitude is increased, the interoccurrence time distribution tends to follow an exponential distribution. Hence it is suggested that a temporal sequence of earthquakes, aside from small-magnitude events, is a Poisson process, which is observed in nature. We found that the interoccurrence time statistics derived from the 2D BK (original) model can efficiently reproduce that of real earthquakes, so that the model can be recognized as a realistic one in view of interoccurrence time statistics.
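The interoccurrence times discussed in this abstract are the gaps between successive events at or above a magnitude threshold. The sketch below extracts them from an event list and computes their coefficient of variation, which equals one for an exponential distribution and can therefore serve as a rough indicator of Poisson-like behaviour. The catalog values and function names are hypothetical, and this is not the spring-block simulation itself.

```python
import math

def interoccurrence_times(events, threshold):
    """Gaps between successive events with magnitude >= threshold.

    events: iterable of (time, magnitude) tuples.
    """
    times = sorted(t for t, m in events if m >= threshold)
    return [t2 - t1 for t1, t2 in zip(times, times[1:])]

def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return math.sqrt(var) / mean

# Hypothetical catalog of (time, magnitude) pairs.
catalog = [(0.0, 3.1), (1.5, 5.2), (2.0, 4.0), (6.5, 5.8), (7.0, 3.3), (9.5, 6.1)]
gaps_all = interoccurrence_times(catalog, 3.0)
gaps_large = interoccurrence_times(catalog, 5.0)
cv_all = coefficient_of_variation(gaps_all)
```

Raising the threshold keeps fewer events and lengthens the gaps; a real analysis would need a much longer catalog before comparing the distribution against an exponential.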
Three-Dimensional Statistical Gas Distribution Mapping in an Uncontrolled Indoor Environment
Reggente, Matteo; Lilienthal, Achim J.
2009-05-23
In this paper we present a statistical method to build three-dimensional gas distribution maps (3D-DM). The proposed mapping technique uses kernel extrapolation with a tri-variate Gaussian kernel that models the likelihood that a reading represents the concentration distribution at a distant location in the three dimensions. The method is evaluated using a mobile robot equipped with three 'e-noses' mounted at different heights. Initial experiments in an uncontrolled indoor environment are presented and evaluated with respect to the ability of the 3D map, computed from the lower and upper nose, to predict the map from the middle nose.
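The kernel extrapolation idea in this abstract can be sketched as a kernel-weighted average of sensor readings, where each reading's weight decays with its distance from the query location. This is a simplified illustration under stated assumptions: the kernel has a diagonal covariance, the bandwidths are arbitrary placeholders, and the estimator omits any confidence or variance map the actual method may compute. All names and values are hypothetical.

```python
import math

def gaussian_weight(query, sample, sigmas):
    """Unnormalized tri-variate Gaussian kernel with diagonal covariance."""
    exponent = sum(((q - s) / sig) ** 2 for q, s, sig in zip(query, sample, sigmas))
    return math.exp(-0.5 * exponent)

def estimate_concentration(query, readings, sigmas=(0.5, 0.5, 0.5)):
    """Kernel-weighted mean concentration at a 3D query point.

    readings: list of ((x, y, z), concentration) pairs.
    Returns None when no reading carries any weight at the query point.
    """
    weights = [gaussian_weight(query, pos, sigmas) for pos, _ in readings]
    total = sum(weights)
    if total == 0.0:
        return None
    return sum(w * c for w, (_, c) in zip(weights, readings)) / total

# Hypothetical readings from two sensor positions.
readings = [((0.0, 0.0, 0.0), 10.0), ((1.0, 0.0, 0.0), 20.0)]
midpoint = estimate_concentration((0.5, 0.0, 0.0), readings)
near_first = estimate_concentration((0.1, 0.0, 0.0), readings)
```

Using separate bandwidths per axis would let the vertical direction be smoothed differently from the horizontal ones, which may matter when sensors are mounted at only a few heights.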
Nonextensive statistics, entropic gravity and gravitational force in a non-integer dimensional space
NASA Astrophysics Data System (ADS)
Abreu, Everton M. C.; Neto, Jorge Ananias; Godinho, Cresus F. L.
2014-10-01
Based on the connection between Tsallis nonextensive statistics and fractional dimensional space, in this work we have introduced, with the aid of Verlinde's formalism, the Newton constant in a fractal space as a function of the nonextensive constant. With this result we have constructed a curve that shows the direct relation between Tsallis nonextensive parameter and the dimension of this fractal space. We have demonstrated precisely that there are ambiguities between the results due to Verlinde's approach and the ones due to fractional calculus formalism. We have shown precisely that these ambiguities appear only for spaces with dimensions different from three. A possible solution for this ambiguity was proposed here.
Statistics of the inverse-cascade regime in two-dimensional magnetohydrodynamic turbulence.
Banerjee, Debarghya; Pandit, Rahul
2014-07-01
We present a detailed direct numerical simulation of statistically steady, homogeneous, isotropic, two-dimensional magnetohydrodynamic turbulence. Our study concentrates on the inverse cascade of the magnetic vector potential. We examine the dependence of the statistical properties of such turbulence on dissipation and friction coefficients. We extend earlier work significantly by calculating fluid and magnetic spectra, probability distribution functions (PDFs) of the velocity, magnetic, vorticity, current, stream-function, and magnetic-vector-potential fields, and their increments. We quantify the deviations of these PDFs from Gaussian ones by computing their flatnesses and hyperflatnesses. We also present PDFs of the Okubo-Weiss parameter, which distinguishes between vortical and extensional flow regions, and its magnetic analog. We show that the hyperflatnesses of PDFs of the increments of the stream function and the magnetic vector potential exhibit significant scale dependence and we examine the implication of this for the multiscaling of structure functions. We compare our results with those of earlier studies.
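The Okubo-Weiss parameter mentioned in this abstract can be computed pointwise from the velocity gradients of a 2D flow. The sketch below assumes the convention W = s_n^2 + s_s^2 - omega^2, under which negative values mark vortical regions and positive values mark strain-dominated (extensional) regions; some authors use the opposite sign or a different normalization, and the abstract does not state which convention the paper adopts. The function name is hypothetical.

```python
def okubo_weiss(du_dx, du_dy, dv_dx, dv_dy):
    """Okubo-Weiss parameter W = s_n^2 + s_s^2 - omega^2 (assumed convention)."""
    normal_strain = du_dx - dv_dy   # s_n
    shear_strain = dv_dx + du_dy    # s_s
    vorticity = dv_dx - du_dy       # omega
    return normal_strain ** 2 + shear_strain ** 2 - vorticity ** 2

# Solid-body rotation, u = -y, v = x: vorticity dominates, so W < 0.
w_rotation = okubo_weiss(0.0, -1.0, 1.0, 0.0)

# Pure strain, u = x, v = -y: strain dominates, so W > 0.
w_strain = okubo_weiss(1.0, 0.0, 0.0, -1.0)
```

In a simulation the four gradients would come from finite differences or spectral derivatives of the velocity field, and the same expression would be evaluated at every grid point.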
Statistical Projections for Multi-resolution, Multi-dimensional Visual Data Exploration and Analysis
Hoa T. Nguyen; Stone, Daithi; E. Wes Bethel
2016-01-01
An ongoing challenge in visual exploration and analysis of large, multi-dimensional datasets is how to present useful, concise information to a user for some specific visualization tasks. Typical approaches to this problem have proposed either reduced-resolution versions of data, or projections of data, or both. These approaches still have limitations, such as high computational cost or susceptibility to errors. In this work, we explore the use of a statistical metric as the basis for both projections and reduced-resolution versions of data, with a particular focus on preserving one key trait in data, namely variation. We use two different case studies to explore this idea, one that uses a synthetic dataset, and another that uses a large ensemble collection produced by an atmospheric modeling code to study long-term changes in global precipitation. The primary finding of our work is that, in terms of preserving the variation signal inherent in data, using a statistical measure more faithfully preserves this key characteristic across both multi-dimensional projections and multi-resolution representations than a methodology based upon averaging.
Blanc, Guillermo A.; Kewley, Lisa; Vogt, Frédéric P. A.; Dopita, Michael A.
2015-01-10
We present a new method for inferring the metallicity (Z) and ionization parameter (q) of H II regions and star-forming galaxies using strong nebular emission lines (SELs). We use Bayesian inference to derive the joint and marginalized posterior probability density functions for Z and q given a set of observed line fluxes and an input photoionization model. Our approach allows the use of arbitrary sets of SELs and the inclusion of flux upper limits. The method provides a self-consistent way of determining the physical conditions of ionized nebulae that is not tied to the arbitrary choice of a particular SEL diagnostic and uses all the available information. Unlike theoretically calibrated SEL diagnostics, the method is flexible and not tied to a particular photoionization model. We describe our algorithm, validate it against other methods, and present a tool that implements it called IZI. Using a sample of nearby extragalactic H II regions, we assess the performance of commonly used SEL abundance diagnostics. We also use a sample of 22 local H II regions having both direct and recombination line (RL) oxygen abundance measurements in the literature to study discrepancies in the abundance scale between different methods. We find that oxygen abundances derived through Bayesian inference using currently available photoionization models in the literature can be in good (∼30%) agreement with RL abundances, although some models perform significantly better than others. We also confirm that abundances measured using the direct method are typically ∼0.2 dex lower than both RL and photoionization-model-based abundances.
Kravtsov, V.E.; Yudson, V.I.
2011-07-15
Highlights: > Statistics of normalized eigenfunctions in one-dimensional Anderson localization at E = 0 is studied. > Moments of inverse participation ratio are calculated. > Equation for generating function is derived at E = 0. > An exact solution for generating function at E = 0 is obtained. > Relation of the generating function to the phase distribution function is established. - Abstract: The one-dimensional (1d) Anderson model (AM), i.e. a tight-binding chain with random uncorrelated on-site energies, has statistical anomalies at any rational point f = 2a/λ_E, where a is the lattice constant and λ_E is the de Broglie wavelength. We develop a regular approach to anomalous statistics of normalized eigenfunctions ψ(r) at such commensurability points. The approach is based on an exact integral transfer-matrix equation for a generating function Φ_r(u, φ) (u and φ have a meaning of the squared amplitude and phase of eigenfunctions, r is the position of the observation point). This generating function can be used to compute local statistics of eigenfunctions of 1d AM at any disorder and to address the problem of higher-order anomalies at f = p/q with q > 2. The descender of the generating function P_r(φ) ≡ Φ_r(u=0, φ) is shown to be the distribution function of phase which determines the Lyapunov exponent and the local density of states. In the leading order in the small disorder we derived a second-order partial differential equation for the r-independent ('zero-mode') component Φ(u, φ) at the E = 0 (f = 1/2) anomaly. This equation is nonseparable in variables u and φ. Yet, we show that due to a hidden symmetry, it is integrable and we construct an exact solution for Φ(u, φ) explicitly in quadratures. Using this solution we computed moments I_m = N⟨|ψ|^{2m}⟩ (m ≥ 1) for a chain of the length N → ∞ and found an
Statistical mechanics of two-dimensional foams: Physical foundations of the model.
Durand, Marc
2015-12-01
In a recent series of papers, a statistical model that accounts for correlations between topological and geometrical properties of a two-dimensional shuffled foam has been proposed and compared with experimental and numerical data. Here, the various assumptions on which the model is based are exposed and justified: the equiprobability hypothesis of the foam configurations is argued. The range of correlations between bubbles is discussed, and the mean-field approximation that is used in the model is detailed. The two self-consistency equations associated with this mean-field description can be interpreted as the conservation laws of number of sides and bubble curvature, respectively. Finally, the use of a "Grand-Canonical" description, in which the foam constitutes a reservoir of sides and curvature, is justified. PMID:26701712
NASA Astrophysics Data System (ADS)
Villeta, M.; Sanz-Lobera, A.; González, C.; Sebastián, M. A.
2009-11-01
The implementation of Statistical Process Control (SPC) requires the use of measurement systems. The inherent variability of these systems influences the reliability of the measurement results obtained and, as a consequence, the SPC results. This paper investigates the influence of measurement uncertainty on the analysis of process capability. It seeks to reduce the effect of measurement uncertainty in order to approach the capability that the production process really has. Both processes centered at a nominal value and off-center processes are considered, and a criterion is proposed for validating the adequacy of the dimensional measurement systems used in an SPC implementation.
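A short sketch of why measurement uncertainty matters for process capability. It uses the textbook indices Cp and Cpk and assumes that measurement error is independent of the process, so that the observed variance is the sum of the process and measurement variances. This is an illustration under those assumptions, not the criterion proposed in the paper; the function names, specification limits, and numbers are hypothetical.

```python
import math
import statistics


def capability_indices(values, lsl, usl):
    """Return (Cp, Cpk) for measured values and specification limits."""
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk


def corrected_sigma(sigma_observed, sigma_measurement):
    """Estimate the process sigma by removing the measurement variance.

    Assumes sigma_observed^2 = sigma_process^2 + sigma_measurement^2.
    """
    diff = sigma_observed ** 2 - sigma_measurement ** 2
    if diff <= 0:
        raise ValueError("measurement variance exceeds observed variance")
    return math.sqrt(diff)


# Hypothetical off-center process measured against limits 7 and 13.
cp, cpk = capability_indices([10.0, 11.0, 12.0], lsl=7.0, usl=13.0)
print(cp, cpk)  # Cpk < Cp because the process mean is off-center

# An observed sigma of 5 with a measurement sigma of 3 implies a process sigma of 4.
print(corrected_sigma(5.0, 3.0))
```

Because the measurement variance inflates the observed spread, capability computed from raw measurements understates the capability of the process itself under these assumptions.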
Collisional statistics and dynamics of two-dimensional hard-disk systems: From fluid to solid.
Taloni, Alessandro; Meroz, Yasmine; Huerta, Adrián
2015-08-01
We perform extensive MD simulations of two-dimensional systems of hard disks, focusing on the collisional statistical properties. We analyze the distribution functions of velocity, free flight time, and free path length for packing fractions ranging from the fluid to the solid phase. The behaviors of the mean free flight time and path length between subsequent collisions are found to drastically change in the coexistence phase. We show that single-particle dynamical properties behave analogously in collisional and continuous-time representations, exhibiting apparent crossovers between the fluid and the solid phases. We find that, both in collisional and continuous-time representation, the mean-squared displacement, velocity autocorrelation functions, intermediate scattering functions, and self-part of the van Hove function (propagator) closely reproduce the same behavior exhibited by the corresponding quantities in granular media, colloids, and supercooled liquids close to the glass or jamming transition. PMID:26382368
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Statistical conservation law in two- and three-dimensional turbulent flows.
Frishman, Anna; Boffetta, Guido; De Lillo, Filippo; Liberzon, Alex
2015-03-01
Particles in turbulence live complicated lives. It is nonetheless sometimes possible to find order in this complexity. It was proposed in Falkovich et al. [Phys. Rev. Lett. 110, 214502 (2013)] that pairs of Lagrangian tracers at small scales, in an incompressible isotropic turbulent flow, have a statistical conservation law. More specifically, in a d-dimensional flow the distance R(t) between two neutrally buoyant particles, raised to the power -d and averaged over velocity realizations, remains at all times equal to the initial, fixed, separation raised to the same power. In this work we present evidence from direct numerical simulations of two- and three-dimensional turbulence for this conservation. In both cases the conservation is lost when particles exit the linear flow regime. In two dimensions we show that, as an extension of the conservation law, an Evans-Cohen-Morriss or Gallavotti-Cohen type fluctuation relation exists. We also analyze data from a 3D laboratory experiment [Liberzon et al., Physica D 241, 208 (2012)], finding that although it probes small scales they are not in the smooth regime. Thus instead of 〈R^{-3}〉, we look for a similar, power-law-in-separation conservation law. We show that the existence of an initially slowly varying function of this form can be predicted but that it does not turn into a conservation law. We suggest that the conservation of 〈R^{-d}〉, demonstrated here, can be used as a check of isotropy, incompressibility, and flow dimensionality in numerical and laboratory experiments that focus on small scales.
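Written as a formula, the conservation law described in words above reads as follows. The notation R_0 for the fixed initial separation is introduced here for illustration only.

```latex
\left\langle R(t)^{-d} \right\rangle = R_0^{-d},
\qquad R_0 \equiv R(0),
\qquad d = 2 \text{ or } 3 ,
```

where the average is over velocity realizations and the equality is expected to hold only while the pair stays within the linear (smooth) flow regime.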
Statistical conservation law in two- and three-dimensional turbulent flows
NASA Astrophysics Data System (ADS)
Frishman, Anna; Boffetta, Guido; De Lillo, Filippo; Liberzon, Alex
2015-03-01
Particles in turbulence live complicated lives. It is nonetheless sometimes possible to find order in this complexity. It was proposed in Falkovich et al. [Phys. Rev. Lett. 110, 214502 (2013), 10.1103/PhysRevLett.110.214502] that pairs of Lagrangian tracers at small scales, in an incompressible isotropic turbulent flow, have a statistical conservation law. More specifically, in a d-dimensional flow the distance R(t) between two neutrally buoyant particles, raised to the power -d and averaged over velocity realizations, remains at all times equal to the initial, fixed, separation raised to the same power. In this work we present evidence from direct numerical simulations of two- and three-dimensional turbulence for this conservation. In both cases the conservation is lost when particles exit the linear flow regime. In two dimensions we show that, as an extension of the conservation law, an Evans-Cohen-Morriss or Gallavotti-Cohen type fluctuation relation exists. We also analyze data from a 3D laboratory experiment [Liberzon et al., Physica D 241, 208 (2012), 10.1016/j.physd.2011.07.008], finding that although it probes small scales they are not in the smooth regime. Thus instead of
ERIC Educational Resources Information Center
Schochet, Peter Z.
2015-01-01
This report presents the statistical theory underlying the "RCT-YES" software that estimates and reports impacts for RCTs for a wide range of designs used in social policy research. The report discusses a unified, non-parametric design-based approach for impact estimation using the building blocks of the Neyman-Rubin-Holland causal…
Current Sheet Statistics in Three-Dimensional Simulations of Coronal Heating
NASA Astrophysics Data System (ADS)
Lin, L.; Ng, C. S.; Bhattacharjee, A.
2013-04-01
In a recent numerical study [Ng et al., Astrophys. J. 747, 109, 2012], with a three-dimensional model of coronal heating using reduced magnetohydrodynamics (RMHD), we have obtained scaling results of heating rate versus Lundquist number based on a series of runs in which random photospheric motions are imposed for hundreds to thousands of Alfvén times in order to obtain converged statistical values. The heating rate found in these simulations saturates to a level that is independent of the Lundquist number. This scaling result was also supported by an analysis with the assumption of the Sweet-Parker scaling of the current sheets, as well as how the width, length and number of current sheets scale with Lundquist number. In order to test these assumptions, we have implemented an automated routine to analyze thousands of current sheets in these simulations and return statistical scalings for these quantities. It is found that the Sweet-Parker scaling is justified. However, some discrepancies are also found and require further study.
Yoshimatsu, Katsunori; Kawahara, Yasuhiro; Schneider, Kai; Okamoto, Naoya; Farge, Marie
2011-09-15
Scale-dependent and geometrical statistics of three-dimensional incompressible homogeneous magnetohydrodynamic turbulence without mean magnetic field are examined by means of the orthogonal wavelet decomposition. The flow is computed by direct numerical simulation with a Fourier spectral method at resolution 512^3 and a unit magnetic Prandtl number. Scale-dependent second and higher order statistics of the velocity and magnetic fields allow one to quantify their intermittency in terms of spatial fluctuations of the energy spectra, the flatness, and the probability distribution functions at different scales. Different scale-dependent relative helicities, e.g., kinetic, cross, and magnetic relative helicities, yield geometrical information on alignment between the different scale-dependent fields. At each scale, the alignment between the velocity and magnetic field is found to be more pronounced than the other alignments considered here, i.e., the scale-dependent alignment between the velocity and vorticity, the scale-dependent alignment between the magnetic field and its vector potential, and the scale-dependent alignment between the magnetic field and the current density. Finally, statistical scale-dependent analyses of both Eulerian and Lagrangian accelerations and the corresponding time-derivatives of the magnetic field are performed. It is found that the Lagrangian acceleration does not exhibit substantially stronger intermittency compared to the Eulerian acceleration, in contrast to hydrodynamic turbulence where the Lagrangian acceleration shows much stronger intermittency than the Eulerian acceleration. The Eulerian time-derivative of the magnetic field is more intermittent than the Lagrangian time-derivative of the magnetic field.
Statistics of extreme waves in the framework of one-dimensional Nonlinear Schrodinger Equation
NASA Astrophysics Data System (ADS)
Agafontsev, Dmitry; Zakharov, Vladimir
2013-04-01
We examine the statistics of extreme waves for the one-dimensional classical focusing Nonlinear Schrodinger (NLS) equation, iΨ_t + Ψ_xx + |Ψ|^2 Ψ = 0, (1) as well as the influence of the first nonlinear term beyond Eq. (1) - the six-wave interactions - on the statistics of waves in the framework of a generalized NLS equation accounting for six-wave interactions, damping (linear dissipation, two- and three-photon absorption) and pumping terms (2). We solve these equations numerically in a box with periodic boundary conditions starting from the initial data Ψ|_{t=0} = F(x) + ?(x), where F(x) is an exact modulationally unstable solution of Eq. (1) seeded by stochastic noise ?(x) with fixed statistical properties. We examine two types of initial conditions F(x): (a) condensate state F(x) = 1 for Eq. (1)-(2) and (b) cnoidal wave for Eq. (1). The development of modulation instability in Eq. (1)-(2) leads to formation of one-dimensional wave turbulence. In the integrable case the turbulence is called integrable and relaxes to one of infinitely many possible stationary states. Addition of the six-wave interaction term leads to the appearance of collapses that are eventually regularized by the damping terms. The energy lost during regularization of collapses in (2) is restored by the pumping term. In the latter case the system does not demonstrate relaxation-like behavior. We measure the evolution of spectra I_k = ⟨|Ψ_k|^2⟩, spatial correlation functions and the PDFs for wave amplitudes |Ψ|, paying special attention to the formation of "fat tails" on the PDFs. For the classical integrable NLS equation (1) with condensate initial condition we observe Rayleigh tails for extremely large waves and a "breathing region" for middle waves with oscillations of the frequency of wave appearance with time, while the nonintegrable NLS equation with damping and pumping terms (2) in the absence of six-wave interactions α = 0 demonstrates perfectly Rayleigh PDFs without any oscillations with
NASA Astrophysics Data System (ADS)
Hasan, Asad; Maloney, Craig
2013-03-01
We compute the effective dispersion and density of states (DOS) of two-dimensional sub-regions of three dimensional face centered cubic (FCC) crystals with both a direct projection-inversion technique and a Monte Carlo simulation based on a common Hamiltonian. We study sub-regions of both (111) and (100) planes. For any direction of wavevector, we show an anomalous ω^2 ~ q scaling regime at low q where ω^2 is the energy associated with a mode of wavenumber q. This scaling should give rise to an anomalous DOS, D(ω), at low ω: D(ω) ~ ω^3 rather than the conventional Debye result: D(ω) ~ ω^2. The DOS for the (100) sub-region looks to be consistent with D(ω) ~ ω^3, while the (111) shows something closer to the Debye result at the smallest frequencies. Our Monte Carlo simulation shows that finite sampling artifacts act as an effective disorder and bias the D(ω) in the same way as the finite size artifacts, giving a behavior closer to D(ω) ~ ω^2 than D(ω) ~ ω^3. These results should have an important impact on interpretation of recent studies of colloidal solids where two-point displacement correlations can be obtained in real-space via microscopy.
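One way to see how the quoted dispersion leads to the quoted DOS is the short counting argument below. It is a sketch that assumes the modes of the sub-region are counted in the two-dimensional wavevector plane; the abstract does not state how the counting is done, and N(q) is notation introduced here for the number of modes with wavenumber below q.

```latex
\omega^2 \sim q
\;\Rightarrow\; q \sim \omega^2 ,
\qquad
N(q) \sim q^2 \sim \omega^4 ,
\qquad
D(\omega) = \frac{dN}{d\omega} \sim \omega^3 .
```

For comparison, a linear dispersion ω ~ q counted in three dimensions gives N ~ ω^3 and hence the familiar Debye form D(ω) ~ ω^2.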
NASA Astrophysics Data System (ADS)
Das Sarma, S.; Nag, Amit; Sau, Jay D.
2016-07-01
We consider a simple conceptual question with respect to Majorana zero modes in semiconductor nanowires: can the measured nonideal values of the zero-bias-conductance-peak in the tunneling experiments be used as a characteristic to predict the underlying topological nature of the proximity induced nanowire superconductivity? In particular, we define and calculate the topological visibility, which is a variation of the topological invariant associated with the scattering matrix of the system as well as the zero-bias-conductance-peak heights in the tunneling measurements, in the presence of dissipative broadening, using precisely the same realistic nanowire parameters to connect the topological invariants with the zero-bias tunneling conductance values. This dissipative broadening is present in both (the existing) tunneling measurements and also (any future) braiding experiments as an inevitable consequence of a finite braiding time. The connection between the topological visibility and the conductance allows us to obtain the visibility of realistic braiding experiments in nanowires, and to conclude that the current experimentally accessible systems with nonideal zero-bias conductance peaks may indeed manifest (with rather low visibility) non-Abelian statistics for the Majorana zero modes. In general, we find that a large (small) superconducting gap (Majorana peak splitting) is essential for the manifestation of the non-Abelian braiding statistics, and in particular, a zero-bias conductance value of around half the ideal quantized Majorana value should be sufficient for the manifestation of non-Abelian statistics in experimental nanowires. Our work also establishes that as a matter of principle the topological transition associated with the emergence of Majorana zero modes in finite nanowires is always a crossover (akin to a quantum phase transition at finite temperature) requiring the presence of dissipative broadening (which must be larger than the Majorana energy
NASA Astrophysics Data System (ADS)
Shen, Samuel S. P.; Wied, Olaf; Weithmann, Alexander; Regele, Tobias; Bailey, Barbara A.; Lawrimore, Jay H.
2016-07-01
This paper describes six different temporal climate regimes of the contiguous United States (CONUS) according to interdecadal variations of surface air temperature (SAT) and precipitation using the United States Historical Climatology Network (USHCN) monthly data (Tmax, Tmin, Tmean, and precipitation) from 1895 to 2010. Our analysis is based on the probability distribution, mean, standard deviation, skewness, kurtosis, Kolmogorov-Smirnov (KS) test, and Welch's t test. The relevant statistical parameters are computed from gridded monthly SAT and precipitation data. SAT variations lead to classification of four regimes: 1895-1930 (cool), 1931-1960 (warm), 1961-1985 (cool), and 1986-2010 (warm), while precipitation variations lead to a classification of two regimes: 1895-1975 (dry) and 1976-2010 (wet). The KS test shows that any two regimes of the above six are statistically significantly different from each other due to clear shifts of the probability density functions. Extremes of SAT and precipitation identify the ten hottest, coldest, driest, and wettest years. Welch's t test is used to discern significant differences among these extremes. The spatial patterns of the six climate regimes and some years of extreme climate are analyzed. Although the two most recent decades are the warmest since 1895 and many of the hottest years measured by CONUS Tmin and Tmean are in these two decades, the hottest year according to the CONUS Tmax anomalies is 1934 (1.37 °C), which is very close to the second Tmax hottest year 2006 (1.35 °C).
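The two tests named above can be sketched with the standard library alone. The code below computes the Welch t statistic with its Welch-Satterthwaite degrees of freedom and the two-sample KS statistic (the largest gap between empirical distribution functions). It stops short of p-values, and the anomaly values are hypothetical placeholders, not USHCN data; the function names are likewise illustrative.

```python
import bisect
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for Welch's unequal-variance t test."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a) / na  # squared standard error of mean a
    vb = statistics.variance(sample_b) / nb  # squared standard error of mean b
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, dof


def ks_statistic(sample_a, sample_b):
    """Return the two-sample Kolmogorov-Smirnov statistic D."""
    sorted_a = sorted(sample_a)
    sorted_b = sorted(sample_b)

    def ecdf(sorted_values, x):
        # Fraction of values less than or equal to x.
        return bisect.bisect_right(sorted_values, x) / len(sorted_values)

    return max(abs(ecdf(sorted_a, x) - ecdf(sorted_b, x)) for x in sorted_a + sorted_b)


# Hypothetical temperature anomalies (deg C) for a "cool" and a "warm" regime.
cool = [-0.4, -0.1, -0.3, 0.0, -0.2, -0.5]
warm = [0.3, 0.1, 0.4, 0.2, 0.6, 0.5]
print(welch_t(cool, warm))
print(ks_statistic(cool, warm))
```

Welch's form is used rather than the pooled-variance t test because the two regimes need not share a variance or a sample size.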
A one-dimensional statistical mechanics model for nucleosome positioning on genomic DNA.
Tesoro, S; Ali, I; Morozov, A N; Sulaiman, N; Marenduzzo, D
2016-02-01
The first level of folding of DNA in eukaryotes is provided by the so-called '10 nm chromatin fibre', where DNA wraps around histone proteins (∼10 nm in size) to form nucleosomes, which go on to create a zig-zagging bead-on-a-string structure. In this work we present a one-dimensional statistical mechanics model to study nucleosome positioning within one such 10 nm fibre. We focus on the case of genomic sheep DNA, and we start from effective potentials valid at infinite dilution and determined from high-resolution in vitro salt dialysis experiments. We study positioning within a polynucleosome chain, and compare the results for genomic DNA to that obtained in the simplest case of homogeneous DNA, where the problem can be mapped to a Tonks gas. First, we consider the simple, analytically solvable, case where nucleosomes are assumed to be point-like. Then, we perform numerical simulations to gauge the effect of their finite size on the nucleosomal distribution probabilities. Finally we compare nucleosome distributions and simulated nuclease digestion patterns for the two cases (homogeneous and sheep DNA), thereby providing testable predictions of the effect of sequence on experimentally observable quantities in experiments on polynucleosome chromatin fibres reconstituted in vitro. PMID:26871546
A statistical mechanical theory for a two-dimensional model of water
NASA Astrophysics Data System (ADS)
Urbic, Tomaz; Dill, Ken A.
2010-06-01
We develop a statistical mechanical model for the thermal and volumetric properties of waterlike fluids. Each water molecule is a two-dimensional disk with three hydrogen-bonding arms. Each water interacts with neighboring waters through a van der Waals interaction and an orientation-dependent hydrogen-bonding interaction. This model, which is largely analytical, is a variant of the Truskett and Dill (TD) treatment of the "Mercedes-Benz" (MB) model. The present model gives better predictions than TD for hydrogen-bond populations in liquid water by distinguishing strong cooperative hydrogen bonds from weaker ones. We explore properties versus temperature T and pressure p. We find that the volumetric and thermal properties follow the same trends with T as real water and are in good general agreement with Monte Carlo simulations of MB water, including the density anomaly, the minimum in the isothermal compressibility, and the decreased number of hydrogen bonds for increasing temperature. The model reproduces that pressure squeezes out water's heat capacity and leads to a negative thermal expansion coefficient at low temperatures. In terms of water structuring, the variance in hydrogen-bonding angles increases with both T and p, while the variance in water density increases with T but decreases with p. Hydrogen bonding is an energy storage mechanism that leads to water's large heat capacity (for its size) and to the fragility in its cagelike structures, which are easily melted by temperature and pressure to a more van der Waals-like liquid state.
A one-dimensional statistical mechanics model for nucleosome positioning on genomic DNA
NASA Astrophysics Data System (ADS)
Tesoro, S.; Ali, I.; Morozov, A. N.; Sulaiman, N.; Marenduzzo, D.
2016-02-01
The first level of folding of DNA in eukaryotes is provided by the so-called ‘10 nm chromatin fibre’, where DNA wraps around histone proteins (∼10 nm in size) to form nucleosomes, which go on to create a zig-zagging bead-on-a-string structure. In this work we present a one-dimensional statistical mechanics model to study nucleosome positioning within one such 10 nm fibre. We focus on the case of genomic sheep DNA, and we start from effective potentials valid at infinite dilution and determined from high-resolution in vitro salt dialysis experiments. We study positioning within a polynucleosome chain, and compare the results for genomic DNA to that obtained in the simplest case of homogeneous DNA, where the problem can be mapped to a Tonks gas [1]. First, we consider the simple, analytically solvable, case where nucleosomes are assumed to be point-like. Then, we perform numerical simulations to gauge the effect of their finite size on the nucleosomal distribution probabilities. Finally we compare nucleosome distributions and simulated nuclease digestion patterns for the two cases (homogeneous and sheep DNA), thereby providing testable predictions of the effect of sequence on experimentally observable quantities in experiments on polynucleosome chromatin fibres reconstituted in vitro.
A three-dimensional statistical approach to improved image quality for multislice helical CT
Thibault, Jean-Baptiste; Sauer, Ken D.; Bouman, Charles A.; Hsieh, Jiang
2007-11-15
Multislice helical computed tomography scanning offers the advantages of faster acquisition and wide organ coverage for routine clinical diagnostic purposes. However, image reconstruction is faced with the challenges of three-dimensional cone-beam geometry, data completeness issues, and low dosage. Of all available reconstruction methods, statistical iterative reconstruction (IR) techniques appear particularly promising since they provide the flexibility of accurate physical noise modeling and geometric system description. In this paper, we present the application of Bayesian iterative algorithms to real 3D multislice helical data to demonstrate significant image quality improvement over conventional techniques. We also introduce a novel prior distribution designed to provide flexibility in its parameters to fine-tune image quality. Specifically, enhanced image resolution and lower noise have been achieved, concurrently with the reduction of helical cone-beam artifacts, as demonstrated by phantom studies. Clinical results also illustrate the capabilities of the algorithm on real patient data. Although computational load remains a significant challenge for practical development, superior image quality combined with advancements in computing technology make IR techniques a legitimate candidate for future clinical applications.
A three-dimensional statistical approach to improved image quality for multislice helical CT.
Thibault, Jean-Baptiste; Sauer, Ken D; Bouman, Charles A; Hsieh, Jiang
2007-11-01
Multislice helical computed tomography scanning offers the advantages of faster acquisition and wide organ coverage for routine clinical diagnostic purposes. However, image reconstruction is faced with the challenges of three-dimensional cone-beam geometry, data completeness issues, and low dosage. Of all available reconstruction methods, statistical iterative reconstruction (IR) techniques appear particularly promising since they provide the flexibility of accurate physical noise modeling and geometric system description. In this paper, we present the application of Bayesian iterative algorithms to real 3D multislice helical data to demonstrate significant image quality improvement over conventional techniques. We also introduce a novel prior distribution designed to provide flexibility in its parameters to fine-tune image quality. Specifically, enhanced image resolution and lower noise have been achieved, concurrently with the reduction of helical cone-beam artifacts, as demonstrated by phantom studies. Clinical results also illustrate the capabilities of the algorithm on real patient data. Although computational load remains a significant challenge for practical development, superior image quality combined with advancements in computing technology make IR techniques a legitimate candidate for future clinical applications.
Air entrainment and bubble statistics in three-dimensional breaking waves
NASA Astrophysics Data System (ADS)
Deike, Luc; Melville, W. K.; Popinet, Stephane
2015-11-01
Wave breaking in the ocean is of fundamental importance in order to quantify wave dissipation and air-sea interaction, including gas and momentum exchange, and to improve parametrizations for weather and climate models. Here, we investigate air entrainment and bubble statistics in three-dimensional breaking waves through direct numerical simulations of the two-phase air-water flow using the Open Source solver Gerris. As in previous 2D simulations, the dissipation due to breaking is found to be in good agreement with previous experimental observations and inertial-scaling arguments. For radii larger than the Hinze scale, the bubble size distribution is found to follow a power law of the radius, r^{-3}, and to scale linearly with the time-dependent turbulent dissipation rate during the active breaking stages. The time-averaged bubble size distribution is found to follow the same power law of the radius and to scale linearly with the wave dissipation rate per unit length of breaking crest. We propose a phenomenological turbulent bubble break-up model that describes the numerical results and existing experimental results.
Large Deviations of Radial Statistics in the Two-Dimensional One-Component Plasma
NASA Astrophysics Data System (ADS)
Cunden, Fabio Deelan; Mezzadri, Francesco; Vivo, Pierpaolo
2016-09-01
The two-dimensional one-component plasma is a ubiquitous model for several vortex systems. For special values of the coupling constant β q^2 (where q is the particle charge and β the inverse temperature), the model also corresponds to the eigenvalue distribution of normal matrix models. Several features of the system are discussed in the limit of a large number N of particles for generic values of the coupling constant. We show that the statistics of a class of radial observables produces a rich phase diagram, and their asymptotic behaviour in terms of large deviation functions is calculated explicitly, including next-to-leading terms up to order 1/N. We demonstrate a split-off phenomenon associated with atypical fluctuations of the edge density profile. We also show explicitly that a failure of the fluid phase assumption of the plasma can break a genuine 1/N-expansion of the free energy. Our findings are corroborated by numerical comparisons with exact finite-N formulae valid for β q^2=2.
A statistical mechanical theory for a two-dimensional model of water.
Urbic, Tomaz; Dill, Ken A
2010-06-14
We develop a statistical mechanical model for the thermal and volumetric properties of waterlike fluids. Each water molecule is a two-dimensional disk with three hydrogen-bonding arms. Each water interacts with neighboring waters through a van der Waals interaction and an orientation-dependent hydrogen-bonding interaction. This model, which is largely analytical, is a variant of the Truskett and Dill (TD) treatment of the "Mercedes-Benz" (MB) model. The present model gives better predictions than TD for hydrogen-bond populations in liquid water by distinguishing strong cooperative hydrogen bonds from weaker ones. We explore properties versus temperature T and pressure p. We find that the volumetric and thermal properties follow the same trends with T as real water and are in good general agreement with Monte Carlo simulations of MB water, including the density anomaly, the minimum in the isothermal compressibility, and the decreased number of hydrogen bonds for increasing temperature. The model reproduces that pressure squeezes out water's heat capacity and leads to a negative thermal expansion coefficient at low temperatures. In terms of water structuring, the variance in hydrogen-bonding angles increases with both T and p, while the variance in water density increases with T but decreases with p. Hydrogen bonding is an energy storage mechanism that leads to water's large heat capacity (for its size) and to the fragility in its cagelike structures, which are easily melted by temperature and pressure to a more van der Waals-like liquid state. PMID:20550408
Adams, Dean C
2014-09-01
Phylogenetic signal is the tendency for closely related species to display similar trait values due to their common ancestry. Several methods have been developed for quantifying phylogenetic signal in univariate traits and for sets of traits treated simultaneously, and the statistical properties of these approaches have been extensively studied. However, methods for assessing phylogenetic signal in high-dimensional multivariate traits like shape are less well developed, and their statistical performance is not well characterized. In this article, I describe a generalization of the K statistic of Blomberg et al. that is useful for quantifying and evaluating phylogenetic signal in high-dimensional multivariate data. The method (K(mult)) is found from the equivalency between statistical methods based on covariance matrices and those based on distance matrices. Using computer simulations based on Brownian motion, I demonstrate that the expected value of K(mult) remains at 1.0 as trait variation among species is increased or decreased, and as the number of trait dimensions is increased. By contrast, estimates of phylogenetic signal found with a squared-change parsimony procedure for multivariate data change with increasing trait variation among species and with increasing numbers of trait dimensions, confounding biological interpretations. I also evaluate the statistical performance of hypothesis testing procedures based on K(mult) and find that the method displays appropriate Type I error and high statistical power for detecting phylogenetic signal in high-dimensional data. Statistical properties of K(mult) were consistent for simulations using bifurcating and random phylogenies, for simulations using different numbers of species, for simulations that varied the number of trait dimensions, and for different underlying models of trait covariance structure. Overall, these findings demonstrate that K(mult) provides a useful means of evaluating phylogenetic signal in high-dimensional
NASA Astrophysics Data System (ADS)
Germa, Aurelie; Connor, Laura; Connor, Chuck; Malservisi, Rocco
2015-04-01
One challenge of volcanic hazard assessment in distributed volcanic fields (large number of small-volume basaltic volcanoes along with one or more silicic central volcanoes) is to constrain the location of future activity. Although the extent of the source of melts at depth can be known using geophysical methods or the location of past eruptive vents, the locations of preferential pathways and zones of higher magma flux are still unobserved. How does the spatial distribution of eruptive vents at the surface reveal the location of magma sources or focusing? When this distribution is investigated, the locations of central polygenetic edifices as well as clusters of monogenetic volcanoes denote zones of high magma flux and recurrence rate, whereas areas of dispersed monogenetic vents represent zones of lower flux. Additionally, central polygenetic edifices, acting as magma filters, prevent dense mafic magmas from reaching the surface close to their central silicic system. Consequently, the spatial distribution of mafic monogenetic vents may provide clues to the subsurface structure of a volcanic field, such as the location of magma sources, preferential magma pathways, and flux distribution across the field. Gathering such data is of high importance in improving the assessment of volcanic hazards. We are developing a modeling framework that compares outputs of statistical models of vent distribution with outputs from numerical models of subsurface magma transport. Geologic data observed at the Earth's surface are used to develop statistical models of spatial intensity (vents per unit area), volume intensity (erupted volume per unit area) and volume-flux intensity (erupted volume per unit time and area). Outputs are in the form of probability density functions assumed to represent volcanic flow output at the surface. These are then compared to outputs from conceptual models of the subsurface processes of magma storage and transport. These models are using Darcy's law
Statistical Properties of Local AGNs Inferred from the RXTE 3-20 keV All-Sky Survey
NASA Astrophysics Data System (ADS)
Revnivtsev, M.; Sazonov, S. Yu.
We have recently ([1]) performed an all-sky survey in the 3-20 keV band from the data accumulated during satellite slews in 1996-2002: the RXTE slew survey (XSS). For 90% of the sky at |b|>10°, a flux limit for source detection of 2.5×10^-11 erg/s/sq.cm (3-20 keV) or lower was achieved, while a combined area of 7000 sq.deg was sampled to record flux levels (for such very large-area surveys) below 10^-11 erg/s/sq.cm. The catalog contains 294 X-ray sources, 236 of which were identified with a single known astronomical object. Of particular interest are 100 identified active galactic nuclei (AGNs) and 35 unidentified sources. The hard spectra of the latter suggest that many of them will probably also prove to be AGNs when follow-up observations are performed. Most of the detected AGNs belong to the local population (z<0.1). In addition, the hard X-ray band of the XSS (3-20 keV), as compared to most previous X-ray surveys performed at photon energies below 10 keV, has made possible the detection of a substantial number of X-ray absorbed AGNs (mostly Seyfert 2 galaxies). These properties make the XSS sample of AGNs a valuable one for the study of the local population of AGNs. We carried out a thorough statistical analysis of the above sample in order to investigate several key properties of the local population of AGNs, in particular their distribution in intrinsic absorption column density (NH) and X-ray luminosity function ([2]). Knowledge of these characteristics provides important constraints for AGN unification models and synthesis of the cosmic X-ray background, and is further needed to understand the details of the accretion-driven growth of supermassive black holes in the nuclei of galaxies.
NASA Astrophysics Data System (ADS)
Huang, Sheng; Du, Aimin; Cao, Xin
2015-04-01
It is well known that the magnetospheric substorm occurs every few hours, in response to variations in the interplanetary conditions and the increase of energy transfer from the solar wind to the magnetosphere. Since substorm activity correlates well with the geomagnetic index, Newell and Gjerloev [2011] identified the substorm onset and its contributing station using the SuperMag auroral electrojet indices. In this study, we investigate the distribution of these substorm onset locations and its response to varying interplanetary conditions. Surprisingly, the substorm onset locations show a double-peak structure, with one peak around the pre-midnight sector and the other on the dawn side. The substorm onset tends to occur in the pre-midnight sector during non-storm time, while it often takes place in the late morning sector (~4 MLT) during storm time. Furthermore, substorms appearing in the magnetic storm main phase predominate in late morning. As the geomagnetic index Dst decreases, the substorm onset occurs in late morning more frequently. The substorm onset locations were also classified based on the solar wind parameters. It is shown that the peak number ratio of the substorm onset location in late morning over pre-midnight increases as IMF Bz decreases from positive to negative and as the solar wind velocity Vsw increases. A more intense interplanetary electric field E promotes substorm onset in late morning. It is widely accepted that both the directly driven (DD) and loading/unloading (LL/UL) processes play an essential role in the energy dispensation from the solar wind into the magnetosphere-ionosphere system. In general, the former corresponds to the DP2 current system, which consists of the eastward electrojet centered near dusk and the westward electrojet centered near dawn, while the latter corresponds to the DP1 current system, which is dominated by the westward electrojet in the midnight sector. Our statistical results of substorm
Extracting sparse signals from high-dimensional data: A statistical mechanics approach
NASA Astrophysics Data System (ADS)
Ramezanali, Mohammad
Sparse reconstruction algorithms aim to retrieve high-dimensional sparse signals from a limited number of measurements under suitable conditions. As the number of variables goes to infinity, these algorithms exhibit sharp phase transition boundaries where the sparse retrieval breaks down. Several sparse reconstruction algorithms are formulated as optimization problems. A few of the prominent ones among these have been analyzed in the literature by statistical mechanical methods. The function to be optimized plays the role of energy. The treatment involves finite temperature replica mean-field theory followed by the zero temperature limit. Although this approach has been successful in reproducing the algorithmic phase transition boundaries, the replica trick and the non-trivial zero temperature limit obscure the underlying reasons for the failure of the algorithms. In this thesis, we employ the "cavity method" to give an alternative derivation of the phase transition boundaries, working directly in the zero-temperature limit. This approach provides insight into the origin of the different terms in the mean field self-consistency equations. The cavity method naturally generates a local susceptibility which leads to an identity that clearly indicates the existence of two phases. The identity also gives us a novel route to the known parametric expressions for the phase boundary of the Basis Pursuit algorithm and to the new ones for the Elastic Net. These transitions being continuous (second order), we explore the scaling laws and critical exponents that are uniquely determined by the nature of the distribution of the density of the nonzero components of the sparse signal. Not only is the phase boundary of the Elastic Net different from that of the Basis Pursuit, but we also show that the critical behaviors of the two algorithms belong to different universality classes.
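To make the "function to be optimized plays the role of energy" idea concrete, the following is a minimal sketch, not the thesis's method: it minimizes an Elastic Net energy E(x) = 0.5*||y - A x||^2 + lam1*||x||_1 + 0.5*lam2*||x||^2 by iterative soft thresholding (ISTA). The solver choice, the function names, the problem sizes and the penalty values are all assumptions made for illustration. With lam2 = 0 the energy reduces to the lasso, which is related to Basis Pursuit only in the limit of vanishing lam1.

```python
import random


def soft_threshold(v, tau):
    """Proximal operator of tau*|v|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def energy(A, y, x, lam1, lam2):
    """Elastic Net energy: 0.5*||y - A x||^2 + lam1*||x||_1 + 0.5*lam2*||x||^2."""
    resid = [yi - sum(a * xj for a, xj in zip(row, x)) for row, yi in zip(A, y)]
    return (0.5 * sum(r * r for r in resid)
            + lam1 * sum(abs(xj) for xj in x)
            + 0.5 * lam2 * sum(xj * xj for xj in x))


def ista(A, y, lam1, lam2=0.0, n_iter=500):
    """Minimize the energy by iterative soft thresholding; returns (x, energies)."""
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm is an upper bound on the largest eigenvalue
    # of A^T A, so 1/lipschitz is a safe (if conservative) step size.
    lipschitz = sum(a * a for row in A for a in row) + lam2
    step = 1.0 / lipschitz
    x = [0.0] * n_cols
    energies = [energy(A, y, x, lam1, lam2)]
    for _ in range(n_iter):
        resid = [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]
        # Gradient of the smooth part: A^T (A x - y) + lam2 * x
        grad = [sum(A[i][j] * resid[i] for i in range(n_rows)) + lam2 * x[j]
                for j in range(n_cols)]
        x = [soft_threshold(xj - step * g, step * lam1) for xj, g in zip(x, grad)]
        energies.append(energy(A, y, x, lam1, lam2))
    return x, energies


def make_problem(n_rows=20, n_cols=40, n_nonzero=3, seed=0):
    """Hypothetical test problem: Gaussian measurements of a sparse +/-1 signal."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) / n_rows ** 0.5 for _ in range(n_cols)]
         for _ in range(n_rows)]
    x_true = [0.0] * n_cols
    for j in rng.sample(range(n_cols), n_nonzero):
        x_true[j] = rng.choice([-1.0, 1.0])
    y = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]
    return A, y, x_true


if __name__ == "__main__":
    A, y, x_true = make_problem()
    x_hat, energies = ista(A, y, lam1=0.01)
    print("initial energy:", energies[0], "final energy:", energies[-1])
```

The step size uses a deliberately loose bound so that the energy is guaranteed not to increase from one iteration to the next; a tighter estimate of the largest eigenvalue would converge faster.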
On the statistical properties of Klein polyhedra in three-dimensional lattices
Illarionov, A A
2013-06-30
We obtain asymptotic formulae for the average values of the number of faces of a fixed type and of vertices of Klein polyhedra of three-dimensional integer lattices with a given determinant. Bibliography: 20 titles.
NASA Astrophysics Data System (ADS)
Verma, Sanjeet K.; Oliveira, Elson P.
2013-08-01
In the present work, we applied two sets of new multi-dimensional geochemical diagrams (Verma et al., 2013) obtained from linear discriminant analysis (LDA) of natural logarithm-transformed ratios of major elements and immobile major and trace elements in acid magmas to decipher plate tectonic settings and corresponding probability estimates for Paleoproterozoic rocks from the Amazonian craton, São Francisco craton, São Luís craton, and Borborema province of Brazil. The robustness of LDA minimizes the effects of petrogenetic processes and maximizes the separation among the different tectonic groups. The probability-based boundaries further provide a more objective statistical method in comparison to the commonly used subjective method of determining the boundaries by eye judgment. The use of major element data readjusted to 100% on an anhydrous basis by the SINCLAS computer program also helps to minimize the effects of post-emplacement compositional changes and analytical errors on these tectonic discrimination diagrams. Fifteen case studies of acid suites highlighted the application of these diagrams and probability calculations. The first case study on Jamon and Musa granites, Carajás area (Central Amazonian Province, Amazonian craton) shows a collision setting (previously thought anorogenic). A collision setting was clearly inferred for Bom Jardim granite, Xingú area (Central Amazonian Province, Amazonian craton). The third case study on Older São Jorge, Younger São Jorge and Maloquinha granites, Tapajós area (Ventuari-Tapajós Province, Amazonian craton) indicated a within-plate setting (previously transitional between volcanic arc and within-plate). We also recognized a within-plate setting for the next three case studies on Aripuanã and Teles Pires granites (SW Amazonian craton), and Pitinga area granites (Mapuera Suite, NW Amazonian craton), which were all previously suggested to have been emplaced in post-collision to within-plate settings. The seventh case
Computationally efficient Bayesian inference for inverse problems.
Marzouk, Youssef M.; Najm, Habib N.; Rahn, Larry A.
2007-10-01
Bayesian statistics provides a foundation for inference from noisy and incomplete data, a natural mechanism for regularization in the form of prior information, and a quantitative assessment of uncertainty in the inferred results. Inverse problems - representing indirect estimation of model parameters, inputs, or structural components - can be fruitfully cast in this framework. Complex and computationally intensive forward models arising in physical applications, however, can render a Bayesian approach prohibitive. This difficulty is compounded by high-dimensional model spaces, as when the unknown is a spatiotemporal field. We present new algorithmic developments for Bayesian inference in this context, showing strong connections with the forward propagation of uncertainty. In particular, we introduce a stochastic spectral formulation that dramatically accelerates the Bayesian solution of inverse problems via rapid evaluation of a surrogate posterior. We also explore dimensionality reduction for the inference of spatiotemporal fields, using truncated spectral representations of Gaussian process priors. These new approaches are demonstrated on scalar transport problems arising in contaminant source inversion and in the inference of inhomogeneous material or transport properties. We also present a Bayesian framework for parameter estimation in stochastic models, where intrinsic stochasticity may be intermingled with observational noise. Evaluation of a likelihood function may not be analytically tractable in these cases, and thus several alternative Markov chain Monte Carlo (MCMC) schemes, operating on the product space of the observations and the parameters, are introduced.
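As a point of reference for the Bayesian treatment of inverse problems described above, here is a minimal sketch of posterior sampling with a plain random-walk Metropolis sampler. It is not the stochastic spectral or surrogate-posterior method of the abstract. The exponential-decay forward model, the Gaussian prior and noise model, and all names and parameter values are hypothetical choices made only for illustration.

```python
import math
import random


def forward(theta, times):
    """Hypothetical forward model: exponential decay observed at given times."""
    return [math.exp(-theta * t) for t in times]


def make_log_posterior(times, data, noise_sd, prior_mean, prior_sd):
    """Unnormalized log posterior: Gaussian likelihood plus Gaussian prior."""
    def log_post(theta):
        pred = forward(theta, times)
        log_like = -0.5 * sum(((d - p) / noise_sd) ** 2 for d, p in zip(data, pred))
        log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        return log_like + log_prior
    return log_post


def metropolis(log_post, theta0, n_steps, proposal_sd, rng):
    """Random-walk Metropolis; returns the chain and the acceptance rate."""
    theta, lp = theta0, log_post(theta0)
    chain, accepted = [], 0
    for _ in range(n_steps):
        cand = theta + rng.gauss(0.0, proposal_sd)
        lp_cand = log_post(cand)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            theta, lp = cand, lp_cand
            accepted += 1
        chain.append(theta)
    return chain, accepted / n_steps


def run_demo(seed=0, theta_true=1.5, noise_sd=0.05):
    """Simulate noisy data, sample the posterior, return (mean, sd, acceptance)."""
    rng = random.Random(seed)
    times = [0.1 * i for i in range(21)]
    data = [g + rng.gauss(0.0, noise_sd) for g in forward(theta_true, times)]
    log_post = make_log_posterior(times, data, noise_sd, prior_mean=1.0, prior_sd=1.0)
    chain, acc = metropolis(log_post, theta0=1.0, n_steps=5000,
                            proposal_sd=0.1, rng=rng)
    kept = chain[1000:]  # discard burn-in
    mean = sum(kept) / len(kept)
    sd = (sum((c - mean) ** 2 for c in kept) / len(kept)) ** 0.5
    return mean, sd, acc


if __name__ == "__main__":
    mean, sd, acc = run_demo()
    print("posterior mean:", mean, "posterior sd:", sd, "acceptance:", acc)
```

Each Metropolis step costs one forward-model evaluation, which is why expensive forward models make this direct approach prohibitive and motivate surrogates of the kind the abstract introduces.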
Mathur, Sunil; Sadana, Ajit
2015-12-01
We present a rank-based test statistic for the identification of differentially expressed genes using a distance measure. The proposed test statistic is highly robust against extreme values and does not assume a particular distribution for the parent population. Simulation studies show that the proposed test is more powerful than some of the commonly used methods, such as the paired t-test, the Wilcoxon signed rank test, and significance analysis of microarrays (SAM), under certain non-normal distributions. The asymptotic distribution of the test statistic and the p-value function are discussed. The application of the proposed method is shown using a real-life data set.
Lange, Kenneth; Papp, Jeanette C.; Sinsheimer, Janet S.; Sobel, Eric M.
2014-01-01
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future. PMID:24955378
NASA Technical Reports Server (NTRS)
Bonavito, N. L.; Gordon, C. L.; Inguva, R.; Serafino, G. N.; Barnes, R. A.
1994-01-01
NASA's Mission to Planet Earth (MTPE) will address important interdisciplinary and environmental issues such as global warming, ozone depletion, deforestation, acid rain, and the like with its long term satellite observations of the Earth and with its comprehensive Data and Information System. Extensive sets of satellite observations supporting MTPE will be provided by the Earth Observing System (EOS), while more specific process related observations will be provided by smaller Earth Probes. MTPE will use data from ground and airborne scientific investigations to supplement and validate the global observations obtained from satellite imagery, while the EOS satellites will support interdisciplinary research and model development. This is important for understanding the processes that control the global environment and for improving the prediction of events. In this paper we illustrate the potential for powerful artificial intelligence (AI) techniques when used in the analysis of the formidable problems that exist in the NASA Earth Science programs and of those to be encountered in the future MTPE and EOS programs. These techniques, based on the logical and probabilistic reasoning aspects of plausible inference, strongly emphasize the synergetic relation between data and information. As such, they are ideally suited for the analysis of the massive data streams to be provided by both MTPE and EOS. To demonstrate this, we address both the satellite imagery and model enhancement issues for the problem of ozone profile retrieval through a method based on plausible scientific inferencing. Since in the retrieval problem, the atmospheric ozone profile that is consistent with a given set of measured radiances may not be unique, an optimum statistical method is used to estimate a 'best' profile solution from the radiances and from additional a priori information.
Spatial mapping and statistical reproducibility of an array of 256 one-dimensional quantum wires
Al-Taie, H.; Kelly, M. J.; Smith, L. W.; Lesage, A. A. J.; Griffiths, J. P.; Beere, H. E.; Jones, G. A. C.; Ritchie, D. A.; Smith, C. G.; See, P.
2015-08-21
We utilize a multiplexing architecture to measure the conductance properties of an array of 256 split gates. We investigate the reproducibility of the pinch off and one-dimensional definition voltage as a function of spatial location on two different cooldowns, and after illuminating the device. The reproducibility of both these properties on the two cooldowns is high, the result of the density of the two-dimensional electron gas returning to a similar state after thermal cycling. The spatial variation of the pinch-off voltage reduces after illumination; however, the variation of the one-dimensional definition voltage increases due to an anomalous feature in the center of the array. A technique which quantifies the homogeneity of split-gate properties across the array is developed which captures the experimentally observed trends. In addition, the one-dimensional definition voltage is used to probe the density of the wafer at each split gate in the array on a micron scale using a capacitive model.
Spatial mapping and statistical reproducibility of an array of 256 one-dimensional quantum wires
NASA Astrophysics Data System (ADS)
Al-Taie, H.; Smith, L. W.; Lesage, A. A. J.; See, P.; Griffiths, J. P.; Beere, H. E.; Jones, G. A. C.; Ritchie, D. A.; Kelly, M. J.; Smith, C. G.
2015-08-01
We utilize a multiplexing architecture to measure the conductance properties of an array of 256 split gates. We investigate the reproducibility of the pinch off and one-dimensional definition voltage as a function of spatial location on two different cooldowns, and after illuminating the device. The reproducibility of both these properties on the two cooldowns is high, the result of the density of the two-dimensional electron gas returning to a similar state after thermal cycling. The spatial variation of the pinch-off voltage reduces after illumination; however, the variation of the one-dimensional definition voltage increases due to an anomalous feature in the center of the array. A technique which quantifies the homogeneity of split-gate properties across the array is developed which captures the experimentally observed trends. In addition, the one-dimensional definition voltage is used to probe the density of the wafer at each split gate in the array on a micron scale using a capacitive model.
Statistics of Critical Points of Gaussian Fields on Large-Dimensional Spaces
Bray, Alan J.; Dean, David S.
2007-04-13
We calculate the average number of critical points of a Gaussian field on a high-dimensional space as a function of their energy and their index. Our results give a complete picture of the organization of critical points and are of relevance to glassy and disordered systems and landscape scenarios coming from the anthropic approach to string theory.
Applying Clustering to Statistical Analysis of Student Reasoning about Two-Dimensional Kinematics
ERIC Educational Resources Information Center
Springuel, R. Padraic; Wittman, Michael C.; Thompson, John R.
2007-01-01
We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and…
On the criticality of inferred models
NASA Astrophysics Data System (ADS)
Mastromatteo, Iacopo; Marsili, Matteo
2011-10-01
Advanced inference techniques allow one to reconstruct a pattern of interaction from high-dimensional data sets obtained by simultaneously probing thousands of units of extended systems such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization-invariant metric in the space of probability distributions of these models (the Fisher information) is directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality.
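The stated link between Fisher information and susceptibility can be written out explicitly in one standard setting. The derivation below is a sketch that assumes an exponential-family model with couplings g_i conjugate to observables \phi_i(s); the abstract does not specify the model class, so this form and its symbols are assumptions.

```latex
% Assumed model: exponential family with partition function Z(g)
p(s \mid g) = \frac{1}{Z(g)} \exp\Big( \sum_i g_i \, \phi_i(s) \Big)

% Susceptibility: response of an average observable to a change in a coupling
\chi_{ij} = \frac{\partial \langle \phi_i \rangle}{\partial g_j}
          = \frac{\partial^2 \log Z}{\partial g_i \, \partial g_j}
          = \langle \phi_i \phi_j \rangle - \langle \phi_i \rangle \langle \phi_j \rangle

% Fisher information: since \log p = \sum_i g_i \phi_i - \log Z, its Hessian in g is -\partial^2 \log Z
J_{ij} = - \Big\langle \frac{\partial^2 \log p(s \mid g)}{\partial g_i \, \partial g_j} \Big\rangle
       = \frac{\partial^2 \log Z}{\partial g_i \, \partial g_j}
       = \chi_{ij}
```

Under this assumption the Fisher metric and the susceptibility matrix coincide, so a diverging susceptibility near a critical point means that nearby parameter values become easier to distinguish from data.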
Bui, Thanh Minh; Coron, Alain; Mamou, Jonathan; Saegusa-Beecroft, Emi; Yamaguchi, Tadashi; Yanagihara, Eugene; Machi, Junji; Bridal, S Lori; Feleppa, Ernest J
2014-01-01
This work investigates the statistics of the envelope of three-dimensional (3D) high-frequency ultrasound (HFU) data acquired from dissected human lymph nodes (LNs). Nine distributions were employed, and their parameters were estimated using the method of moments. The Kolmogorov-Smirnov (KS) metric was used to quantitatively compare the fit of each candidate distribution to the experimental envelope distribution. The study indicates that the generalized gamma distribution best models the statistics of the envelope data of the three media encountered: LN parenchyma, fat and phosphate-buffered saline (PBS). Furthermore, the envelope statistics of the LN parenchyma satisfy the pre-Rayleigh condition. In terms of high fitting accuracy and computationally efficient parameter estimation, the gamma distribution is the best choice to model the envelope statistics of LN parenchyma, while the Weibull distribution is the best choice to model the envelope statistics of fat and PBS. These results will contribute to the development of more accurate and automatic 3D segmentation of LNs for ultrasonic detection of clinically significant LN metastases.
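The fitting procedure described above (method-of-moments parameter estimation followed by a KS comparison) can be sketched for one of the named candidates, the Weibull distribution. This is an illustrative sketch on synthetic samples, not the authors' code or data; the function names, the bisection bounds and the sample parameters are assumptions.

```python
import math
import random


def weibull_cdf(x, shape, scale):
    """Weibull cumulative distribution function."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def cv_squared(shape):
    """Squared coefficient of variation of a Weibull law; depends on shape only."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return g2 / (g1 * g1) - 1.0


def fit_weibull_moments(samples, lo=0.1, hi=50.0, n_bisect=80):
    """Match the sample mean and variance; returns (shape, scale)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    target = var / (mean * mean)
    # cv_squared decreases with shape, so bisection finds the matching shape.
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if cv_squared(mid) > target:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = mean / math.gamma(1.0 + 1.0 / shape)
    return shape, scale


def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in for envelope samples (scale 2.0, shape 1.8).
    samples = [rng.weibullvariate(2.0, 1.8) for _ in range(5000)]
    shape, scale = fit_weibull_moments(samples)
    d = ks_statistic(samples, lambda x: weibull_cdf(x, shape, scale))
    print("shape:", shape, "scale:", scale, "KS distance:", d)
```

Because the parameters are estimated from the same samples, the KS distance here serves only as a relative goodness-of-fit metric for ranking candidate distributions, not as the basis for a textbook p-value.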
NASA Astrophysics Data System (ADS)
Hunziker, Jürg; Laloy, Eric; Linde, Niklas
2016-04-01
Deterministic inversion procedures can often explain field data, but they only deliver one final subsurface model that depends on the initial model and regularization constraints. This gives little insight into the uncertainties associated with the inferred model properties. In contrast, probabilistic inversions can provide an ensemble of model realizations that accurately span the range of possible models that honor the available calibration data and prior information, allowing a quantitative description of model uncertainties. We reconsider the problem of inferring the dielectric permittivity (directly related to radar velocity) structure of the subsurface by inversion of first-arrival travel times from crosshole ground penetrating radar (GPR) measurements. We rely on the DREAM_(ZS) algorithm, a state-of-the-art Markov chain Monte Carlo (MCMC) algorithm. Such algorithms need several orders of magnitude more forward simulations than deterministic algorithms and often become infeasible in high parameter dimensions. To enable high-resolution imaging with MCMC, we use a recently proposed dimensionality reduction approach that allows reproducing 2D multi-Gaussian fields with far fewer parameters than a classical grid discretization. We consider herein a dimensionality reduction from 5000 to 257 unknowns. The first 250 parameters correspond to a spectral representation of random and uncorrelated spatial fluctuations while the remaining seven geostatistical parameters are (1) the standard deviation of the data error, (2) the mean and (3) the variance of the relative electric permittivity, (4) the integral scale along the major axis of anisotropy, (5) the anisotropy angle, (6) the ratio of the integral scale along the minor axis of anisotropy to the integral scale along the major axis of anisotropy and (7) the shape parameter of the Matérn function. The latter essentially defines the type of covariance function (e.g., exponential, Whittle, Gaussian). We present
Statistical Analysis of Current Sheets in Three-dimensional Magnetohydrodynamic Turbulence
NASA Astrophysics Data System (ADS)
Zhdankin, Vladimir; Uzdensky, Dmitri A.; Perez, Jean C.; Boldyrev, Stanislav
2013-07-01
We develop a framework for studying the statistical properties of current sheets in numerical simulations of magnetohydrodynamic (MHD) turbulence with a strong guide field, as modeled by reduced MHD. We describe an algorithm that identifies current sheets in a simulation snapshot and then determines their geometrical properties (including length, width, and thickness) and intensities (peak current density and total energy dissipation rate). We then apply this procedure to simulations of reduced MHD and perform a statistical analysis on the obtained population of current sheets. We evaluate the role of reconnection by separately studying the populations of current sheets which contain magnetic X-points and those which do not. We find that the statistical properties of the two populations are different in general. We compare the scaling of these properties to phenomenological predictions obtained for the inertial range of MHD turbulence. Finally, we test whether the reconnecting current sheets are consistent with the Sweet-Parker model.
Hu, Jun; Li, Zhi-Wei; Ding, Xiao-Li; Zhu, Jian-Jun
2008-01-01
The Mw=7.6 Chi-Chi earthquake in Taiwan occurred in 1999 over the Chelungpu fault and caused a great surface rupture and severe damage. Differential Synthetic Aperture Radar Interferometry (DInSAR) has been applied previously to study the co-seismic ground displacements. There have however been significant limitations in the studies. First, only one-dimensional displacements along the Line-of-Sight (LOS) direction have been measured. The large horizontal displacements along the Chelungpu fault are largely missing from the measurements as the fault is nearly perpendicular to the LOS direction. Second, due to severe signal decorrelation on the hanging wall of the fault, the displacements in that area cannot be measured by the differential InSAR method. We estimate the co-seismic displacements in both the azimuth and range directions with the method of SAR amplitude image matching. GPS observations at 10 GPS stations are used to correct for the orbital ramp in the amplitude matching and to create the two-dimensional (2D) co-seismic surface displacement field using the descending ERS-2 SAR image pair. The results show that the co-seismic displacements range from about -2.0 m to 0.7 m in the azimuth direction (with the positive direction pointing to the flight direction), with the footwall side of the fault moving mainly southwards and the hanging wall side northwards. The displacements in the LOS direction range from about -0.5 m to 1.0 m, with the largest displacement occurring in the northeastern part of the hanging wall (the positive direction points to the satellite from ground). Comparing the results from amplitude matching with those from DInSAR, we can see that while only a very small fraction of the LOS displacement has been recovered by the DInSAR method, the azimuth displacements cannot be well detected with the DInSAR measurements as they are almost perpendicular to the LOS. Therefore, the amplitude matching method is obviously more advantageous than the DIn
NASA Astrophysics Data System (ADS)
Pluhacek, Frantisek; Pospisil, Jaroslav
2003-11-01
In this paper, a new automatic glaucoma diagnostics method which makes it possible to determine the probability of glaucoma occurrence in a studied eye is described. This method is based on the computer image analysis of two-dimensional images of the blind spot of the human eye retina and on the subsequent statistical evaluation of the obtained data. First, the characteristic symptoms of glaucoma are briefly described. Next, a suitable numerical parameter of the retina blind spot is defined. The computer image analysis method for the automatic determination of this parameter is described and applied to a set of normal healthy eye images and to a set of glaucomatous eye images. The probability of glaucoma occurrence for each value of the introduced parameter is then defined and computed from the statistical evaluation of the obtained results.
NASA Technical Reports Server (NTRS)
Balkanski, Yves J.; Jacob, Daniel J.; Gardner, Geraldine M.; Graustein, William C.; Turekian, Karl K.
1993-01-01
A global three-dimensional model is used to investigate the transport and tropospheric residence time of Pb-210, an aerosol tracer produced in the atmosphere by radioactive decay of Rn-222 emitted from soils. The model uses meteorological input with 4 deg x 5 deg horizontal resolution and 4-hour temporal resolution from the Goddard Institute for Space Studies general circulation model (GCM). It computes aerosol scavenging by convective precipitation as part of the wet convective mass transport operator in order to capture the coupling between vertical transport and rainout. Scavenging in convective precipitation accounts for 74% of the global Pb-210 sink in the model; scavenging in large-scale precipitation accounts for 12%, and scavenging in dry deposition accounts for 14%. The model captures 63% of the variance of yearly mean Pb-210 concentrations measured at 85 sites around the world with negligible mean bias, lending support to the computation of aerosol scavenging. There are, however, a number of regional and seasonal discrepancies that reflect in part anomalies in GCM precipitation. Computed residence times with respect to deposition for Pb-210 aerosol in the tropospheric column are about 5 days at southern midlatitudes and 10-15 days in the tropics; values at northern midlatitudes vary from about 5 days in winter to 10 days in summer. The residence time of Pb-210 produced in the lowest 0.5 km of atmosphere is on average four times shorter than that of Pb-210 produced in the upper atmosphere. Both model and observations indicate a weaker decrease of Pb-210 concentrations between the continental mixed layer and the free troposphere than is observed for total aerosol concentrations; an explanation is that Rn-222 is transported to high altitudes in wet convective updrafts, while aerosols and soluble precursors of aerosols are scavenged by precipitation in the updrafts. Thus Pb-210 is not simply a tracer of aerosols produced in the continental boundary layer, but
NASA Astrophysics Data System (ADS)
Heimbach, P.; Bugnion, V.
2008-12-01
We present a new and original approach to understanding the sensitivity of the Greenland ice sheet to key model parameters and environmental conditions. At the heart of this approach is the use of an adjoint ice sheet model. MacAyeal (1992) introduced adjoints in the context of applying control theory to estimate basal sliding parameters (basal shear stress, basal friction) of an ice stream model which minimize a least-squares model vs. observation misfit. Since then, this method has become widespread to fit ice stream models to the increasing number and diversity of satellite observations, and to estimate uncertain model parameters. However, no attempt has been made to extend this method to comprehensive ice sheet models. Here, we present a first step toward moving beyond limiting the use of control theory to ice stream models. We have generated an adjoint of the three-dimensional thermo-mechanical ice sheet model SICOPOLIS of Greve (1997). The adjoint was generated using the automatic differentiation (AD) tool TAF. TAF generates exact source code representing the tangent linear and adjoint model of the parent model provided. Model sensitivities are given by the partial derivatives of a scalar-valued model diagnostic or "cost function" with respect to the controls, and can be efficiently calculated via the adjoint. An effort to generate an efficient adjoint with the newly developed open-source AD tool OpenAD is also under way. To gain insight into the adjoint solutions, we explore various cost functions, such as local and domain-integrated ice temperature, total ice volume or the velocity of ice at the margins of the ice sheet. Elements of our control space include initial cold ice temperatures, surface mass balance, as well as parameters such as appear in Glen's flow law, or in the surface degree-day or basal sliding parameterizations. Sensitivity maps provide a comprehensive view, and allow a quantification of where and to which variables the ice sheet model is
NASA Astrophysics Data System (ADS)
Jameson, A. R.; Larsen, M. L.
2016-06-01
Microphysical understanding of the variability in rain requires a statistical characterization of different drop sizes both in time and in all dimensions of space. Temporally, there have been several statistical characterizations of raindrop counts. However, temporal and spatial structures are neither equivalent nor readily translatable. While there are recent reports of the one-dimensional spatial correlation functions in rain, they can only be assumed to represent the two-dimensional (2D) correlation function under the assumption of spatial isotropy. To date, however, there are no actual observations of the 2D spatial correlation function in rain over areas. Two reasons for this deficiency are the fiscal and the physical impossibilities of assembling a dense network of instruments over even hundreds of meters, much less over kilometers. Consequently, all measurements over areas will necessarily be sparsely sampled. A dense network of data must then be estimated using interpolations from the available observations. In this work, a network of 19 optical disdrometers over a 100 m by 71 m area yields observations of drop spectra every minute. These are then interpolated to a 1 m resolution grid. Fourier techniques then yield estimates of the 2D spatial correlation functions. Preliminary examples using this technique found that steadier, light rain decorrelates spatially faster than does the convective rain, but in both cases the 2D spatial correlation functions are anisotropic, reflecting an asymmetry in the physical processes influencing the rain reaching the ground not accounted for in numerical microphysical models.
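One common Fourier route to a 2D correlation function is the Wiener-Khinchin relation: transform the mean-removed field, take the squared modulus, and transform back. The sketch below assumes that route and periodic (circular) boundaries, neither of which is stated in the abstract. It uses a naive DFT so that it needs only the standard library, and the toy field is hypothetical.

```python
import cmath

def dft2(a, inverse=False):
    """Naive 2D discrete Fourier transform of a rectangular list of lists."""
    n, m = len(a), len(a[0])
    sign = 1 if inverse else -1
    out = [[0j] * m for _ in range(n)]
    for k in range(n):
        for l in range(m):
            total = 0j
            for x in range(n):
                for y in range(m):
                    phase = sign * 2 * cmath.pi * (k * x / n + l * y / m)
                    total += a[x][y] * cmath.exp(1j * phase)
            out[k][l] = total / (n * m) if inverse else total
    return out

def correlation_2d(field):
    """Normalized circular autocorrelation via the Wiener-Khinchin relation."""
    n, m = len(field), len(field[0])
    mean = sum(map(sum, field)) / (n * m)
    fluct = [[v - mean for v in row] for row in field]
    spectrum = dft2(fluct)
    power = [[abs(z) ** 2 for z in row] for row in spectrum]
    acf = dft2(power, inverse=True)
    zero_lag = acf[0][0].real
    # corr[dx][dy] is the correlation at lag (dx, dy); corr[0][0] is 1.
    return [[z.real / zero_lag for z in row] for row in acf]

# Hypothetical field that varies along columns only, so it is anisotropic.
field = [[1.0, 2.0, 3.0, 2.0] for _ in range(4)]
corr = correlation_2d(field)
```

For this toy field the correlation stays at 1 for any lag along the rows but falls to -1 at a column lag of 2, which is the kind of directional difference an isotropy assumption would hide.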
Statistics of Gravitational Microlensing Magnification. II. Three-dimensional Lens Distribution
NASA Astrophysics Data System (ADS)
Lee, Man Hoi; Babul, Arif; Kofman, Lev; Kaiser, Nick
1997-11-01
In the first paper of this series, we studied the theory of gravitational microlensing for a planar distribution of point masses. In this second paper, we extend the analysis to a three-dimensional lens distribution. First we study the lensing properties of three-dimensional lens distributions by considering in detail the critical curves, the caustics, the illumination patterns, and the magnification cross sections σ(A) of multiplane configurations with two, three, and four point masses. For N* point masses that are widely separated in Lagrangian space (i.e., in projection), we find that there are ~2N*-1 critical curves in total, but that only ~N* of these produce prominent caustic-induced features in σ(A) at moderate to high magnifications (A >~ 2). In the case of a random distribution of point masses at low optical depth, we show that the multiplane lens equation near a point mass can be reduced to the single-plane equation of a point mass perturbed by weak shear. This allows us to calculate the caustic-induced feature in the macroimage magnification distribution P(A) as a weighted sum of the semianalytic feature derived in Paper I for a planar lens distribution. The resulting semianalytic caustic-induced feature is similar to the feature in the planar case, but it does not have any simple scaling properties, and it is shifted to higher magnification. The semianalytic distribution is compared with the results of previous numerical simulations for optical depth τ ~ 0.1, and they are in better agreement than a similar comparison in the planar case. We explain this by estimating the fraction of caustics of individual lenses that merge with those of their neighbors. For τ = 0.1, the fraction is ~20%, much less than the ~55% for the planar case. In the three-dimensional case, a simple criterion for the low optical depth analysis to be valid is τ << 0.4, though the comparison with numerical simulations indicates that the semianalytic distribution is a reasonable
Statistical properties of three-dimensional two-fluid plasma model
Qaisrani, M. Hasnain; Xia, ZhenWei; Zou, Dandan
2015-09-15
The nonlinear dynamics of the incompressible non-dissipative two-fluid plasma model is investigated through classical Gibbs ensemble methods. Liouville's theorem of phase space for each wave number is proved, and the absolute equilibrium spectra for the Galerkin-truncated two-fluid model are calculated. In two-fluid theory, the equilibrium is built on the conservation of three quadratic invariants: the total energy and the self-helicities of the ion and electron fluids, respectively. The implications of statistical equilibrium spectra with arbitrary ratios of conserved invariants are discussed.
Emergent exclusion statistics of quasiparticles in two-dimensional topological phases
NASA Astrophysics Data System (ADS)
Hu, Yuting; Stirling, Spencer D.; Wu, Yong-Shi
2014-03-01
We demonstrate how the generalized Pauli exclusion principle emerges for quasiparticle excitations in 2D topological phases. As an example, we examine the Levin-Wen model with the Fibonacci data (specified in the text), and construct the number operator for fluxons living on plaquettes. By numerically counting the many-body states with fluxon number fixed, the matrix of exclusion statistics parameters is identified and is shown to depend on the spatial topology (sphere or torus) of the system. Our work reveals the structure of the (many-body) Hilbert space and some general features of thermodynamics for quasiparticle excitations in topological matter.
Statistical characteristics of the Poincaré return times for a one-dimensional nonhyperbolic map
NASA Astrophysics Data System (ADS)
Anishchenko, V. S.; Khairulin, M.; Strelkova, G.; Kurths, J.
2011-08-01
Characteristics of the Poincaré return times are considered in a one-dimensional cubic map with a chaotic nonhyperbolic attractor. Two approaches are used: a local one (Kac's theorem) and a global one related to the AP-dimension estimation of return times. The return time characteristics are studied in the presence of external noise. The characteristics of Poincaré recurrences are compared with the form of the probability measure, and the complete correspondence of the obtained results with the mathematical theory is shown. The influence of the attractor crisis on the return time characteristics is also analyzed. The obtained results have a methodical and educational significance and can be used for solving a number of applied tasks.
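Kac's theorem states that, for an ergodic invariant measure, the mean return time to a set equals the inverse of the measure of that set. The sketch below checks this numerically. Because the abstract does not give the cubic map, it assumes the fully chaotic logistic map x -> 4x(1 - x) as a stand-in, whose invariant density 1/(pi*sqrt(x(1 - x))) is known in closed form; the interval and iteration counts are arbitrary choices.

```python
import math

def logistic(x):
    """Fully chaotic logistic map, used here as a stand-in for the cubic map."""
    return 4.0 * x * (1.0 - x)

def mean_return_time(a, b, x0=0.123456789, steps=200_000, burn_in=1_000):
    """Average number of iterations between successive visits to [a, b]."""
    x = x0
    for _ in range(burn_in):
        x = logistic(x)
    last_visit = None
    gaps = []
    for n in range(steps):
        x = logistic(x)
        if a <= x <= b:
            if last_visit is not None:
                gaps.append(n - last_visit)
            last_visit = n
    return sum(gaps) / len(gaps)

def invariant_measure(a, b):
    """Measure of [a, b] under the density 1 / (pi * sqrt(x * (1 - x)))."""
    return (2.0 / math.pi) * (math.asin(math.sqrt(b)) - math.asin(math.sqrt(a)))

a, b = 0.2, 0.3
tau = mean_return_time(a, b)
kac_prediction = 1.0 / invariant_measure(a, b)
```

Comparing `tau` with `kac_prediction` for several intervals is the local test; the agreement is statistical, so it tightens as `steps` grows.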
Statistics of bubble rearrangements in a slowly sheared two-dimensional foam.
Dennin, Michael
2004-10-01
Many physical systems exhibit plastic flow when subjected to slow steady shear. A unified picture of plastic flow is still lacking; however, there is an emerging theoretical understanding of such flows based on irreversible motions of the constituent "particles" of the material. Depending on the specific system, various irreversible events have been studied, such as T1 events in foam and shear transformation zones (STZ's) in amorphous solids. This paper presents an experimental study of the T1 events in a model, two-dimensional foam: bubble rafts. In particular, I report on the connection between the distribution of T1 events and the behavior of the average stress and average velocity profiles during both the initial elastic response of the bubble raft and the subsequent plastic flow at sufficiently high strains.
Three-dimensional building detection and modeling using a statistical approach.
Cord, M; Declercq, D
2001-01-01
In this paper, we address the problem of building reconstruction in high-resolution stereoscopic aerial imagery. We present a hierarchical strategy to detect and model buildings in urban sites, based on a global focusing process followed by local modeling. During the first step, we extract the building regions by exploiting to the full extent the depth information obtained with a new adaptive correlation stereo matching. In the modeling step, we propose a statistical approach, which is competitive with the sequential methods using segmentation and modeling. This parametric method is based on a multiplane model of the data, interpreted as a mixture model. From a Bayesian point of view, the so-called augmentation of the model with indicator variables allows using stochastic algorithms to achieve both model parameter estimation and plane segmentation. We then report a Monte Carlo study of the performance of the stochastic algorithm on synthetic data, before displaying results on real data.
Order statistics for first passage times in one-dimensional diffusion processes
Yuste, S.B.; Lindenberg, K.
1996-11-01
The problem of the statistical description of the first passage time t_{j,N} to one or two absorbing boundaries of the first j of a set of N independent diffusing particles in one dimension is revisited. An asymptotic expression for large N of the generating function of the moments of t_{j,N} is obtained, and explicit expressions for the first two moments are presented. The results are valid for a specific but broad class of initial distributions of particles and boundaries. The mean first passage time of the first particle
Derrida, Bernard; Meerson, Baruch; Sasorov, Pavel V
2016-04-01
Consider a one-dimensional branching Brownian motion and rescale the coordinate and time so that the rates of branching and diffusion are both equal to 1. If X_{1}(t) is the position of the rightmost particle of the branching Brownian motion at time t, the empirical velocity c of this rightmost particle is defined as c=X_{1}(t)/t. Using the Fisher-Kolmogorov-Petrovsky-Piscounov equation, we evaluate the probability distribution P(c,t) of this empirical velocity c in the long-time t limit for c>2. It is already known that, for a single seed particle, P(c,t)∼exp[-(c^{2}/4-1)t] up to a prefactor that can depend on c and t. Here we show how to determine this prefactor. The result can be easily generalized to the case of multiple seed particles and to branching random walks associated with other traveling-wave equations. PMID:27176286
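The exponential factor quoted above can be recovered from a first-moment estimate: with branching rate 1 the mean population is exp(t), and with diffusion constant 1 a single path has displacement variance 2t, so the mean number of particles beyond ct is exp(t) times a Gaussian tail. By Markov's inequality this mean bounds P(c,t) from above. The sketch below evaluates that estimate numerically. It is a consistency check on the exponent only, not the prefactor calculation of the paper, and the function names are hypothetical.

```python
import math

def log_expected_count_beyond(c, t):
    """Log of the mean number of particles to the right of c * t at time t.

    Assumes one seed particle, branching rate 1, and diffusion constant 1,
    so the population mean is exp(t) and displacements have variance 2 * t.
    """
    z = c * math.sqrt(t) / 2.0          # (c * t) / sqrt(2 * variance)
    tail = 0.5 * math.erfc(z)           # P(single Brownian path > c * t)
    return t + math.log(tail)

def decay_rate(c, t):
    """Exponential rate per unit time implied by the first-moment estimate."""
    return log_expected_count_beyond(c, t) / t

c = 3.0
rates = [decay_rate(c, t) for t in (50.0, 200.0)]
target = -(c ** 2 / 4.0 - 1.0)  # exponent quoted in the abstract
```

The rates approach `target` as t grows; the remaining gap shrinks like log(t)/t and reflects the sub-exponential prefactor of this simple bound.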
A new integrated statistical approach to the diagnostic use of two-dimensional maps.
Marengo, Emilio; Robotti, Elisa; Gianotti, Valentina; Righetti, Pier Giorgio; Cecconi, Daniela; Domenici, Enrico
2003-01-01
Two-dimensional (2-D) electrophoresis is a very useful technique for the analysis of proteins in biological tissues. The complexity of the 2-D maps obtained causes many difficulties in the comparison of different samples. A new method is proposed for comparing different 2-D maps, based on five steps: (i) the digitalisation of the image; (ii) the transformation of the digitalised map into a fuzzy entity, in order to consider the variability of the 2-D electrophoretic separation; (iii) the calculation of a similarity index for each pair of maps; (iv) the analysis by multidimensional scaling of the previously obtained similarity matrix; (v) the analysis by classification or cluster analysis techniques of the resulting map co-ordinates. The method adopted was first tested on some simulated samples in order to evaluate its sensitivity to small changes in spot position and size. The optimal setting of the method parameters was also investigated. Finally, the method was successfully applied to a series of real samples corresponding to the electrophoretic bidimensional analysis of sera from normal and nicotine-treated rats. Multidimensional scaling allowed the separation of the two classes of samples without any misclassification. PMID:12652595
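Steps (ii) and (iii) above can be sketched as follows. The abstract does not define the fuzzification or the similarity index, so this example assumes a Gaussian membership around each spot centre and a ratio of summed minima to summed maxima (fuzzy intersection over fuzzy union). The spot lists, grid size, and `sigma` are hypothetical.

```python
import math

def fuzzify(spots, shape, sigma=1.0):
    """Turn a list of (row, col) spot centres into a fuzzy map in [0, 1].

    Each cell takes the largest Gaussian membership over all spots, so a
    spot influences its neighbourhood instead of a single pixel.
    """
    rows, cols = shape
    fuzzy = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for sr, sc in spots:
                d2 = (r - sr) ** 2 + (c - sc) ** 2
                fuzzy[r][c] = max(fuzzy[r][c], math.exp(-d2 / (2 * sigma ** 2)))
    return fuzzy

def similarity(map_a, map_b):
    """Fuzzy intersection over fuzzy union (sum of minima / sum of maxima)."""
    num = sum(min(a, b) for ra, rb in zip(map_a, map_b) for a, b in zip(ra, rb))
    den = sum(max(a, b) for ra, rb in zip(map_a, map_b) for a, b in zip(ra, rb))
    return num / den

shape = (10, 10)
reference = fuzzify([(2, 2), (6, 7)], shape)
shifted = fuzzify([(2, 3), (6, 7)], shape)     # one spot moved by one pixel
different = fuzzify([(8, 1), (1, 8)], shape)   # unrelated spot pattern

s_same = similarity(reference, reference)
s_shifted = similarity(reference, shifted)
s_different = similarity(reference, different)
```

A one-pixel shift lowers the index only slightly, which is the tolerance to separation variability that fuzzification is meant to provide. The pairwise values (or 1 minus them, as dissimilarities) would then form the matrix passed to multidimensional scaling in step (iv).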
Allen, J; Velsko, S
2009-11-16
This report explores the question of whether meaningful conclusions can be drawn regarding the transmission relationship between two microbial samples on the basis of differences observed between the two samples' respective genomes. Unlike similar forensic applications using human DNA, the rapid rate of microbial genome evolution combined with the dynamics of infectious disease requires a shift in thinking on what it means for two samples to 'match' in support of a forensic hypothesis. Previous outbreaks of SARS-CoV, FMDV and HIV were examined to investigate the question of how microbial sequence data can be used to draw inferences that link two infected individuals by direct transmission. The results are counterintuitive with respect to human DNA forensic applications in that some genetic change, rather than exact matching, improves confidence in inferring direct transmission links; however, too much genetic change poses challenges, which can weaken confidence in inferred links. High rates of infection coupled with relatively weak selective pressure observed in the SARS-CoV and FMDV data lead to fairly low confidence for direct transmission links. Confidence values for forensic hypotheses increased when testing for the possibility that samples are separated by at most a few intermediate hosts. Moreover, the observed outbreak conditions support the potential to provide high confidence values for hypotheses that exclude direct transmission links. Transmission inferences are based on the total number of observed or inferred genetic changes separating two sequences rather than uniquely weighing the importance of any one genetic mismatch. Thus, inferences are surprisingly robust in the presence of sequencing errors provided the error rates are randomly distributed across all samples in the reference outbreak database and the novel sequence samples in question. When the number of observed nucleotide mutations is limited due to characteristics of the outbreak or the
Roth, A E; Jones, C D; Durian, D J
2013-04-01
We report on the statistics of bubble size, topology, and shape and on their role in the coarsening dynamics for foams consisting of bubbles compressed between two parallel plates. The design of the sample cell permits control of the liquid content, through a constant pressure condition set by the height of the foam above a liquid reservoir. We find that in the scaling regime, all bubble distributions are independent not only of time, but also of liquid content. For coarsening, the average rate decreases with liquid content due to the blocking of gas diffusion by Plateau borders inflated with liquid; we achieve a factor of 4 reduction from the dry limit. By observing the growth rate of individual bubbles, we find that von Neumann's law becomes progressively violated with increasing wetness and decreasing bubble size. We successfully model this behavior by explicitly incorporating the border-blocking effect into the von Neumann argument. Two dimensionless bubble shape parameters naturally arise, one of which is primarily responsible for the violation of von Neumann's law for foams that are not perfectly dry.
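For reference, von Neumann's law in the dry limit says that the area A of a bubble with n sides changes at a rate dA/dt = K0 (n - 6), independent of its size and shape. The sketch below encodes only this dry-limit law. The wet-foam correction developed in the paper is not given in the abstract and is not modelled here; `k0`, the time step, and the forward-Euler scheme are illustrative assumptions.

```python
def von_neumann_rate(n_sides, k0=1.0):
    """Area growth rate dA/dt = k0 * (n - 6) for a bubble in a dry 2D foam."""
    return k0 * (n_sides - 6)

def evolve_area(area, n_sides, dt, steps, k0=1.0):
    """Forward-Euler area evolution while the number of sides stays fixed."""
    for _ in range(steps):
        # A bubble that reaches zero area has vanished and stays at zero.
        area = max(0.0, area + von_neumann_rate(n_sides, k0) * dt)
    return area

rates = {n: von_neumann_rate(n) for n in (4, 5, 6, 7, 8)}
grown = evolve_area(10.0, n_sides=8, dt=0.1, steps=10)
vanished = evolve_area(1.0, n_sides=4, dt=0.1, steps=100)
```

Bubbles with fewer than six sides shrink, six-sided bubbles are stationary, and bubbles with more than six sides grow; a measured growth rate that departs from this straight line in n is what the abstract calls a violation of the law.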
Two-dimensional wetting with binary disorder: a numerical study of the loop statistics
NASA Astrophysics Data System (ADS)
Garel, T.; Monthus, C.
2005-07-01
We numerically study the wetting (adsorption) transition of a polymer chain on a disordered substrate in 1+1 dimension. Following the Poland-Scheraga model of DNA denaturation, we use a Fixman-Freire scheme for the entropy of loops. This allows us to consider chain lengths of order N ∼ 10^5 to 10^6, with 10^4 disorder realizations. Our study is based on the statistics of loops between two contacts with the substrate, from which we define Binder-like parameters: their crossings for various sizes N allow a precise determination of the critical temperature, and their finite-size properties yield a crossover exponent φ=1/(2-α) ≃ 0.5. We then analyse at criticality the distribution of loop length l in both regimes l ∼ O(N) and 1 ≪ l ≪ N, as well as the finite-size properties of the contact density and energy. Our conclusion is that the critical exponents for the thermodynamics are the same as those of the pure case, except for strong logarithmic corrections to scaling. The presence of these logarithmic corrections in the thermodynamics is related to a disorder-dependent logarithmic singularity that appears in the critical loop distribution in the rescaled variable λ=l/N as λ→1.
Abe, H; Wako, H
2006-07-01
Folding and unfolding simulations of three-dimensional lattice proteins were analyzed using a simplified statistical mechanical model in which their amino acid sequences and native conformations were incorporated explicitly. Using this statistical mechanical model, under the assumption that only interactions between amino acid residues within a local structure in a native state are considered, the partition function of the system can be calculated for a given native conformation without any adjustable parameter. The simulations were carried out for two different native conformations, for each of which two foldable amino acid sequences were considered. The native and non-native contacts between amino acid residues occurring in the simulations were examined in detail and compared with the results derived from the theoretical model. The equilibrium thermodynamic quantities (free energy, enthalpy, entropy, and the probability of each amino acid residue being in the native state) at various temperatures obtained from the simulations and the theoretical model were also examined in order to characterize the folding processes that depend on the native conformations and the amino acid sequences. Finally, the free energy landscapes were discussed based on these analyses.
Malinowski, Kathleen T.; Pantarotto, Jason R.; Senan, Suresh
2010-08-01
Purpose: To investigate the feasibility of modeling Stage III lung cancer tumor and node positions from anatomical surrogates. Methods and Materials: To localize their centroids, the primary tumor and lymph nodes from 16 Stage III lung cancer patients were contoured in 10 equal-phase planning four-dimensional (4D) computed tomography (CT) image sets. The centroids of anatomical respiratory surrogates (carina, xiphoid, nipples, mid-sternum) in each image set were also localized. The correlations between target and surrogate positions were determined, and ordinary least-squares (OLS) and partial least-squares (PLS) regression models based on a subset of respiratory phases (three to eight randomly selected) were created to predict the target positions in the remaining images. The three-phase image sets that provided the best predictive information were used to create models based on either the carina alone or all surrogates. Results: The surrogate most correlated with target motion varied widely. Depending on the number of phases used to build the models, mean OLS and PLS errors were 1.0 to 1.4 mm and 0.8 to 1.0 mm, respectively. Models trained on the 0%, 40%, and 80% respiration phases had mean (± standard deviation) PLS errors of 0.8 ± 0.5 mm and 1.1 ± 1.1 mm for models based on all surrogates and carina alone, respectively. For target coordinates with motion >5 mm, the mean three-phase PLS error based on all surrogates was 1.1 mm. Conclusions: Our results establish the feasibility of inferring primary tumor and nodal motion from anatomical surrogates in 4D CT scans of Stage III lung cancer. Using inferential modeling to decrease the processing time of 4D CT scans may facilitate incorporation of patient-specific treatment margins.
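The train-on-three-phases, predict-the-rest design can be sketched for the simplest case of one surrogate and one target coordinate fitted by OLS. The positions below are invented for illustration and are not patient data; the PLS models and the multi-surrogate case from the study are not reproduced here.

```python
def fit_ols(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical superior-inferior positions (mm) at the 10 respiratory phases.
surrogate = [0.0, 1.1, 2.3, 3.2, 3.9, 4.1, 3.6, 2.6, 1.4, 0.4]
target = [0.2, 2.3, 4.9, 6.5, 8.1, 8.3, 7.4, 5.3, 3.0, 0.9]

train = [0, 4, 8]  # indices of the 0%, 40%, and 80% phases
slope, intercept = fit_ols([surrogate[i] for i in train],
                           [target[i] for i in train])

# Evaluate on the seven phases that were not used for fitting.
held_out = [i for i in range(10) if i not in train]
errors = [abs(slope * surrogate[i] + intercept - target[i]) for i in held_out]
mean_error = sum(errors) / len(errors)
```

The mean absolute error over the held-out phases plays the role of the prediction errors reported in the Results, here for a single made-up coordinate.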
Inverse Ising inference with correlated samples
NASA Astrophysics Data System (ADS)
Obermayer, Benedikt; Levine, Erel
2014-12-01
Correlations between two variables of a high-dimensional system can be indicative of an underlying interaction, but can also result from indirect effects. Inverse Ising inference is a method to distinguish one from the other. Essentially, the parameters of the least constrained statistical model are learned from the observed correlations such that direct interactions can be separated from indirect correlations. Among many other applications, this approach has been helpful for protein structure prediction, because residues which interact in the 3D structure often show correlated substitutions in a multiple sequence alignment. In this context, samples used for inference are not independent but share an evolutionary history on a phylogenetic tree. Here, we discuss the effects of correlations between samples on global inference. Such correlations could arise due to phylogeny but also via other slow dynamical processes. We present a simple analytical model to address the resulting inference biases, and develop an exact method accounting for background correlations in alignment data by combining phylogenetic modeling with an adaptive cluster expansion algorithm. We find that popular reweighting schemes are only marginally effective at removing phylogenetic bias, suggest a rescaling strategy that yields better results, and provide evidence that our conclusions carry over to the frequently used mean-field approach to the inverse Ising problem.
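The abstract does not spell out the reweighting schemes it evaluates. A widely used one, assumed here, down-weights each sequence by the number of sequences in the alignment that are at least a threshold fraction identical to it (itself included) before empirical frequencies are computed. The sketch below implements that scheme; the 0.8 threshold and the toy alignment are illustrative choices.

```python
def sequence_weights(alignment, identity_threshold=0.8):
    """Weight each sequence by 1 / (number of sequences whose fractional
    identity to it is at least identity_threshold, itself included)."""
    weights = []
    for a in alignment:
        neighbours = 0
        for b in alignment:
            identical = sum(x == y for x, y in zip(a, b))
            if identical / len(a) >= identity_threshold:
                neighbours += 1
        weights.append(1.0 / neighbours)
    return weights

def weighted_frequency(alignment, weights, column, residue):
    """Reweighted single-site frequency of a residue in one column."""
    total = sum(weights)
    hits = sum(w for seq, w in zip(alignment, weights) if seq[column] == residue)
    return hits / total

# Hypothetical toy alignment: three near-identical sequences and one outlier.
alignment = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIKL", "MNPQRSTVWY"]
weights = sequence_weights(alignment)
effective_size = sum(weights)
f_A = weighted_frequency(alignment, weights, 0, "A")
```

The three related sequences together count as one effective sample, so the frequency of "A" in the first column drops from 0.75 unweighted to 0.5. This corrects for redundancy but ignores the tree structure relating the samples, which is consistent with the abstract's finding that such schemes remove phylogenetic bias only marginally.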
NASA Astrophysics Data System (ADS)
Bartolucci, Daniele; De Marchis, Francesca
2015-08-01
We are motivated by the study of the Microcanonical Variational Principle within Onsager's description of two-dimensional turbulence in the range of energies where the equivalence of statistical ensembles fails. We obtain sufficient conditions for the existence and multiplicity of solutions for the corresponding Mean Field Equation on convex and "thin" enough domains in the supercritical (with respect to the Moser-Trudinger inequality) regime. This is a brand new achievement since existence results in the supercritical region were previously known only on multiply connected domains. We then study the structure of these solutions by the analysis of their linearized problems and we also obtain a new uniqueness result for solutions of the Mean Field Equation on thin domains whose energy is uniformly bounded from above. Finally we evaluate the asymptotic expansion of those solutions with respect to the thinning parameter and, combining it with all the results obtained so far, we solve the Microcanonical Variational Principle in a small range of supercritical energies where the entropy is shown to be concave.
NASA Astrophysics Data System (ADS)
Kim, Yongku; Seo, Young-Kyo; Baek, Sung-Ok
2013-12-01
Although large quantities of air pollutants are released into the atmosphere, they are partially monitored and routinely assessed for their health implications. This paper proposes a statistical model describing the temporal behavior of hazardous air pollutants (HAPs), which can have negative effects on human health. Benzo[a]pyrene (BaP) is selected for statistical modeling. The proposed model incorporates the linkage between BaP and meteorology and is specifically formulated to identify meteorological effects and allow for seasonal trends. The model is used to estimate and forecast temporal fields of BaP conditional on observed (or forecasted) meteorological conditions, including temperature, precipitation, wind speed, and air quality. The effects of BaP on human health are examined by characterizing health indicators, namely the cancer risk and the hazard quotient. The model provides useful information for the optimal monitoring period and projection of future BaP concentrations for both industrial and residential areas in Korea.
van IJsseldijk, E. A.; Valstar, E. R.; Stoel, B. C.; Nelissen, R. G. H. H.; Baka, N.; van’t Klooster, R.
2016-01-01
van’t Klooster, B. L. Kaptein. Three dimensional measurement of minimum joint space width in the knee from stereo radiographs using statistical shape models. Bone Joint Res 2016;320–327. DOI: 10.1302/2046-3758.58.2000626. PMID:27491660
Brown, Timothy A; Barlow, David H
2005-11-01
The value of including dimensional elements in the Diagnostic and Statistical Manual of Mental Disorders (DSM) has been recognized for decades. Nevertheless, no proposals have been made for introducing dimensional classification in the diagnostic system in a valid and feasible manner. As an initial step in this endeavor, the authors suggest introducing dimensional severity ratings to the extant diagnostic categories and criteria sets. Although not without difficulties, this would begin to determine the feasibility of dimensional classification and would address some limitations of the purely categorical approach (e.g., failure to capture individual differences in disorder severity, and clinically significant features subsumed by other disorders or falling below conventional DSM thresholds). The utility of incorporating broader dimensions of temperament and personality in diagnostic systems beyond the fifth edition of the DSM is also discussed.
NASA Technical Reports Server (NTRS)
Boardman, J. W.; Pieters, C. M.; Green, R. O.; Clark, R. N.; Sunshine, J.; Combe, J.-P.; Isaacson, P.; Lundeen, S. R.; Malaret, E.; McCord, T.; Nettles, J.; Petro, N. E.; Varanasi, P.; Taylor, L.
2010-01-01
The Moon Mineralogy Mapper (M3), a NASA Discovery Mission of Opportunity, was launched October 22, 2008 from Shriharikota in India on board the Indian ISRO Chandrayaan- 1 spacecraft for a nominal two-year mission in a 100-km polar lunar orbit. M3 is a high-fidelity imaging spectrometer with 260 spectral bands in Target Mode and 85 spectral bands in a reduced-resolution Global Mode. Target Mode pixel sizes are nominally 70 meters and Global pixels (binned 2 by 2) are 140 meters, from the planned 100-km orbit. The mission was cut short, just before halfway, in August, 2009 when the spacecraft ceased operations. Despite the abbreviated mission and numerous technical and scientific challenges during the flight, M3 was able to cover more than 95% of the Moon in Global Mode. These data, presented and analyzed here as a global whole, are revolutionizing our understanding of the Moon. Already, numerous discoveries relating to volatiles and unexpected mineralogy have been published [1], [2], [3]. The rich spectral and spatial information content of the M3 data indicates that many more discoveries and an improved understanding of the mineralogy, geology, photometry, thermal regime and volatile status of our nearest neighbor are forthcoming from these data. Sadly, only minimal high-resolution Target Mode images were acquired, as these were to be the focus of the second half of the mission. This abstract gives the reader a global overview of all the M3 data that were collected and an introduction to their rich spectral character and complexity. We employ a Principal Components statistical method to assess the underlying dimensionality of the Moon as a whole, as seen by M3, and to identify numerous areas that are low-probability targets and thus of potential interest to selenologists.
Porch, Clay E.; Lauretta, Matthew V.
2016-01-01
Forecasts of the future abundance of western Atlantic bluefin tuna (Thunnus thynnus) have, for nearly two decades, been based on two competing views of future recruitment potential: (1) a “low” recruitment scenario based on hockey-stick (two-line) curve where the expected level of recruitment is set equal to the geometric mean of the recruitment estimates for the years after a supposed regime-shift in 1975, and (2) a “high” recruitment scenario based on a Beverton-Holt curve fit to the time series of spawner-recruit pairs beginning in 1970. Several investigators inferred the relative plausibility of these two scenarios based on measures of their ability to fit estimates of spawning biomass and recruitment derived from stock assessment outputs. Typically, these comparisons have assumed the assessment estimates of spawning biomass are known without error. It is shown here that ignoring error in the spawning biomass estimates can predispose model-choice approaches to favor the regime-shift hypothesis over the Beverton-Holt curve with higher recruitment potential. When the variance of the observation error approaches that which is typically estimated for assessment outputs, the same model-choice approaches tend to favor the single Beverton-Holt curve. For this and other reasons, it is argued that standard model-choice approaches are insufficient to make the case for a regime shift in the recruitment dynamics of western Atlantic bluefin tuna. A more fruitful course of action may be to move away from the current high/low recruitment dichotomy and focus instead on adopting biological reference points and management procedures that are robust to these and other sources of uncertainty. PMID:27272215
Graphical inference for Infovis.
Wickham, Hadley; Cook, Dianne; Hofmann, Heike; Buja, Andreas
2010-01-01
How do we know if what we see is really there? When visualizing data, how do we avoid falling into the trap of apophenia where we see patterns in random noise? Traditionally, infovis has been concerned with discovering new relationships, and statistics with preventing spurious relationships from being reported. We pull these opposing poles closer with two new techniques for rigorous statistical inference of visual discoveries. The "Rorschach" helps the analyst calibrate their understanding of uncertainty and "line-up" provides a protocol for assessing the significance of visual discoveries, protecting against the discovery of spurious structure.
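The line-up protocol described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that null datasets are generated by permuting the response values and that the line-up has m = 20 panels; the function names `make_lineup` and `single_viewer_p_value` are hypothetical.

```python
import random

def make_lineup(x, y, m=20, seed=0):
    """Build m panels: one holds the real (x, y) pairs, the other m - 1
    hold null data made by shuffling y (an assumed null mechanism)."""
    rng = random.Random(seed)
    true_index = rng.randrange(m)  # hidden position of the real data
    panels = []
    for i in range(m):
        if i == true_index:
            panels.append(list(zip(x, y)))
        else:
            y_null = list(y)
            rng.shuffle(y_null)  # breaks any x-y association
            panels.append(list(zip(x, y_null)))
    return panels, true_index

def single_viewer_p_value(m):
    """If the real panel is indistinguishable from the nulls, one viewer
    picks it with probability 1/m."""
    return 1.0 / m

panels, true_index = make_lineup([1, 2, 3, 4], [2, 4, 6, 8], m=20)
```

With 20 panels, a viewer who singles out the real data has done something that happens by chance only 5% of the time under the null.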
Osnes, J.D.; Winberg, A.; Andersson, J.E.; Larsson, N.A.
1991-09-27
Statistical and probabilistic methods for estimating the probability that a fracture is nonconductive (or equivalently, the conductive-fracture frequency) and the distribution of the transmissivities of conductive fractures from transmissivity measurements made in single-hole injection (well) tests were developed. These methods were applied to a database consisting of over 1,000 measurements made in nearly 25 km of borehole at five sites in Sweden. The depths of the measurements ranged from near the surface to over 600 m, and packer spacings of 20 and 25 m were used. A probabilistic model that describes the distribution of a series of transmissivity measurements was derived. When the parameters of this model were estimated using maximum likelihood estimators, the resulting estimated distributions generally fit the cumulative histograms of the transmissivity measurements very well. Further, estimates of the mean transmissivity of conductive fractures based on the maximum likelihood estimates of the model's parameters were reasonable, both in magnitude and in trend, with respect to depth. The estimates of the conductive fracture probability were in the range of 0.5-5.0 percent, with the higher values at shallow depths and increasingly smaller values as depth increased. An estimation procedure based on the probabilistic model and the maximum likelihood estimators of its parameters was recommended. Some guidelines regarding the design of injection test programs were drawn from the recommended estimation procedure and the parameter estimates based on the Swedish data. 24 refs., 12 figs., 14 tabs.
NASA Astrophysics Data System (ADS)
Sazonov, S. Yu.; Revnivtsev, M. G.
2004-08-01
We compiled a sample of 95 AGNs serendipitously detected in the 3-20 keV band at Galactic latitude $|b|>10^\circ$ during the RXTE slew survey (XSS, Revnivtsev et al. 2004), and utilize it to study the statistical properties of the local population of AGNs, including the X-ray luminosity function and absorption distribution. We find that among low X-ray luminosity ($L_{3-20} < 10^{43.5}$ erg s$^{-1}$) AGNs, the ratio of absorbed (characterized by intrinsic absorption in the range $10^{22}$ cm$^{-2}$
Chang Liyun; Ho, S.-Y.; Chui, C.-S.; Lee, J.-H.; Du Yichun; Chen Tainsong
2008-06-15
We propose a new method based on a statistical analysis technique to determine the minimum setup distance of a well chamber used in the calibration of $^{192}$Ir high dose rate (HDR). The chamber should be placed at least this distance away from any wall or from the floor in order to mitigate the effect of scatter. Three different chambers were included in this study, namely, Sun Nuclear Corporation, Nucletron, and Standard Imaging. The results from this study indicated that the minimum setup distance varies depending on the particular chamber and the room architecture in which the chamber was used. Our result differs from that of a previous study by Podgorsak et al. [Med. Phys. 19, 1311-1314 (1992)], in which 25 cm was suggested, and also differs from that of the International Atomic Energy Agency (IAEA)-TECDOC-1079 report, which suggested 30 cm. The new method proposed in this study may be considered as an alternative approach to determine the minimum setup distance of a well-type chamber used in the calibration of $^{192}$Ir HDR.
Thorlund, Kristian; Wetterslev, Jørn; Awad, Tahany; Thabane, Lehana; Gluud, Christian
2011-12-01
In random-effects model meta-analysis, the conventional DerSimonian-Laird (DL) estimator typically underestimates the between-trial variance. Alternative variance estimators have been proposed to address this bias. This study aims to empirically compare statistical inferences from random-effects model meta-analyses on the basis of the DL estimator and four alternative estimators, as well as distributional assumptions (normal distribution and t-distribution) about the pooled intervention effect. We evaluated the discrepancies of p-values, 95% confidence intervals (CIs) in statistically significant meta-analyses, and the degree (percentage) of statistical heterogeneity (e.g. $I^2$) across 920 Cochrane primary outcome meta-analyses. In total, 414 of the 920 meta-analyses were statistically significant with the DL meta-analysis, and 506 were not. Compared with the DL estimator, the four alternative estimators yielded p-values and CIs that could be interpreted as discordant in up to 11.6% or 6% of the included meta-analyses, depending on whether a normal distribution or a t-distribution of the intervention effect estimates was assumed. Large discrepancies were observed for the measures of degree of heterogeneity when comparing DL with each of the four alternative estimators. Estimating the degree (percentage) of heterogeneity on the basis of less biased between-trial variance estimators seems preferable to current practice. Disclosing inferential sensitivity of p-values and CIs may also be necessary when borderline significant results have substantial impact on the conclusion. Copyright © 2012 John Wiley & Sons, Ltd.
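For readers unfamiliar with the DL estimator discussed above, the following sketch shows the standard method-of-moments computation of the between-trial variance, the I² heterogeneity percentage, and the random-effects pooled estimate. It is an illustration only: the function name `dersimonian_laird` and the toy inputs are assumptions, and the four alternative estimators compared in the study are not reproduced here.

```python
import math

def dersimonian_laird(effects, variances):
    """Method-of-moments (DerSimonian-Laird) random-effects summary.

    effects   -- per-trial effect estimates
    variances -- per-trial within-trial variances
    Returns (tau2, i2_percent, pooled_effect, pooled_se).
    """
    w = [1.0 / v for v in variances]           # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)              # truncated at zero
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return tau2, i2, pooled, se

tau2, i2, pooled, se = dersimonian_laird([0.0, 2.0], [1.0, 1.0])
```

Because the estimate is truncated at zero whenever Q falls below its degrees of freedom, small meta-analyses often report no heterogeneity at all, which is one source of the downward bias the abstract mentions.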
NASA Technical Reports Server (NTRS)
da Silva, Arlindo M.; Norris, Peter M.
2013-01-01
Part I presented a Monte Carlo Bayesian method for constraining a complex statistical model of GCM sub-gridcolumn moisture variability using high-resolution MODIS cloud data, thereby permitting large-scale model parameter estimation and cloud data assimilation. This part performs some basic testing of this new approach, verifying that it does indeed significantly reduce mean and standard deviation biases with respect to the assimilated MODIS cloud optical depth, brightness temperature and cloud top pressure, and that it also improves the simulated rotational-Raman scattering cloud optical centroid pressure (OCP) against independent (non-assimilated) retrievals from the OMI instrument. Of particular interest, the Monte Carlo method does show skill in the especially difficult case where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach allows finite jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. This paper also examines a number of algorithmic and physical sensitivities of the new method and provides guidance for its cost-effective implementation. One obvious difficulty for the method, and other cloud data assimilation methods as well, is the lack of information content in the cloud observables on cloud vertical structure, beyond cloud top pressure and optical thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard (1998) provides some help in this respect, by better honoring inversion structures in the background state.
Bayesian Inference: with ecological applications
Link, William A.; Barker, Richard J.
2010-01-01
This text provides a mathematically rigorous yet accessible and engaging introduction to Bayesian inference with relevant examples that will be of interest to biologists working in the fields of ecology, wildlife management and environmental studies as well as students in advanced undergraduate statistics. This text opens the door to Bayesian inference, taking advantage of modern computational efficiencies and easily accessible software to evaluate complex hierarchical models.
Computing contingency statistics in parallel.
Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre
2010-09-01
Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and $\chi^2$ independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
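The map-reduce pattern described above can be illustrated with a small single-process sketch. It is not the authors' open source implementation: each "processor" is simulated by a list of (x, y) observations, and the helper names `local_table`, `merge_tables`, and `derived_statistics` are hypothetical.

```python
from collections import Counter
from math import log

def local_table(pairs):
    """Map step: count (x, y) co-occurrences on one chunk of data."""
    return Counter(pairs)

def merge_tables(tables):
    """Reduce step: sum the partial tables. The size of each partial table
    is what has to be communicated, and it grows with distinct (x, y) pairs."""
    total = Counter()
    for t in tables:
        total.update(t)
    return total

def derived_statistics(table):
    """Chi-square independence statistic and point-wise mutual information."""
    n = sum(table.values())
    row, col = Counter(), Counter()
    for (x, y), c in table.items():
        row[x] += c
        col[y] += c
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = table.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    pmi = {(x, y): log(c * n / (row[x] * col[y])) for (x, y), c in table.items()}
    return chi2, pmi

chunks = [[("a", 0), ("a", 0)], [("b", 1), ("b", 1)]]
table = merge_tables(local_table(chunk) for chunk in chunks)
chi2, pmi = derived_statistics(table)
```

When nearly every observation produces a distinct (x, y) pair, the partial tables are almost as large as the data itself, which is the quasi-diffuse case in which the abstract reports that speed-up suffers.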
NASA Technical Reports Server (NTRS)
Varnai, Tamas; Marshak, Alexander
2000-01-01
This paper presents a simple approach to estimate the uncertainties that arise in satellite retrievals of cloud optical depth when the retrievals use one-dimensional radiative transfer theory for heterogeneous clouds that have variations in all three dimensions. For the first time, preliminary error bounds are set to estimate the uncertainty of cloud optical depth retrievals. These estimates can help us better understand the nature of uncertainties that three-dimensional effects can introduce into retrievals of this important product of the MODIS instrument. The probability distribution of resulting retrieval errors is examined through theoretical simulations of shortwave cloud reflection for a wide variety of cloud fields. The results are used to illustrate how retrieval uncertainties change with observable and known parameters, such as solar elevation or cloud brightness. Furthermore, the results indicate that a tendency observed in an earlier study, clouds appearing thicker for oblique sun, is indeed caused by three-dimensional radiative effects.
NASA Astrophysics Data System (ADS)
Baas, Jaco H.
2000-03-01
EZ-ROSE 1.0 is a computer program for the statistical analysis of populations of two-dimensional vectorial data and their presentation in equal-area rose diagrams. The program is implemented as a Microsoft® Excel workbook containing worksheets for the input of directional (circular) or lineational (semi-circular) data and their automatic processing, which includes the calculation of a frequency distribution for a selected class width, statistical analysis, and the construction of a rose diagram in CorelDraw™. The statistical analysis involves tests of uniformity for the vectorial population distribution, such as the nonparametric Kuiper and Watson tests and the parametric Rayleigh test. The statistics calculated include the vector mean, its magnitude (length) and strength (data concentration); the Batschelet circular standard deviation as an alternative measure of vectorial concentration; and a confidence sector for the vector mean. The statistics together with the frequency data are used to prepare a Corel Script™ file that contains all the necessary instructions to draw automatically an equal-area circular frequency histogram (rose diagram) in CorelDraw™. The advantages of EZ-ROSE, compared to other software for circular statistics, are: (1) the ability to use an equal-area scale in rose diagrams; (2) the wide range of tools for a comprehensive statistical analysis; (3) the ease of use, as Microsoft® Excel and CorelDraw™ are widely known to users of Microsoft® Windows; and (4) the high degree of flexibility due to the application of Microsoft® Excel and CorelDraw™, which offer a whole range of tools for possible addition of other statistical methods and changes of the rose-diagram layout.
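Several of the statistics listed above follow directly from the mean resultant vector. The sketch below is a generic illustration in Python, not EZ-ROSE's Excel implementation: it assumes directional (full-circle) data in degrees, uses the first-order approximation p ≈ exp(-Z) for the Rayleigh test, and uses the angular deviation sqrt(2(1 - r)) as the dispersion measure, which may differ from the exact form used by the program. The function name `circular_summary` is hypothetical.

```python
import math

def circular_summary(angles_deg):
    """Vector mean, mean resultant length, angular deviation, and an
    approximate Rayleigh test for a sample of directions in degrees."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    r = math.hypot(c, s)                        # strength, between 0 and 1
    mean = math.degrees(math.atan2(s, c)) % 360.0
    ang_dev = math.degrees(math.sqrt(2.0 * (1.0 - r)))
    z = n * r * r                               # Rayleigh statistic
    p_approx = math.exp(-z)                     # first-order approximation
    return {"mean": mean, "r": r, "angular_deviation": ang_dev,
            "rayleigh_z": z, "p_approx": p_approx}

summary = circular_summary([80.0, 100.0])
```

For lineational (semi-circular) data, a common convention is to double the angles before the calculation and halve the resulting mean; that step is left out here.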
ERIC Educational Resources Information Center
Douglas, Jeff; Kim, Hae-Rim; Roussos, Louis; Stout, William; Zhang, Jinming
An extensive nonparametric dimensionality analysis of latent structure was conducted on three forms of the Law School Admission Test (LSAT) (December 1991, June 1992, and October 1992) using the DIMTEST model in confirmatory analyses and using DIMTEST, FAC, DETECT, HCA, PROX, and a genetic algorithm in exploratory analyses. Results indicate that…
Using Alien Coins to Test Whether Simple Inference Is Bayesian
ERIC Educational Resources Information Center
Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.
2016-01-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…
Nonparametric inference of network structure and dynamics
NASA Astrophysics Data System (ADS)
Peixoto, Tiago P.
The network structure of complex systems determines their function and serves as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and how to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among
Correlation techniques and measurements of wave-height statistics
NASA Technical Reports Server (NTRS)
Guthart, H.; Taylor, W. C.; Graf, K. A.; Douglas, D. G.
1972-01-01
Statistical measurements of wave height fluctuations have been made in a wind wave tank. The power spectral density function of temporal wave height fluctuations evidenced second-harmonic components and an $f^{-5}$ power-law decay beyond the second harmonic. The observations of second-harmonic effects agreed very well with a theoretical prediction. From the wave statistics, surface drift currents were inferred and compared to experimental measurements with satisfactory agreement. Measurements were made of the two-dimensional correlation coefficient at 15 deg increments in angle with respect to the wind vector. An estimate of the two-dimensional spatial power spectral density function was also made.
Statistical inference for capture-recapture experiments
Pollock, Kenneth H.; Nichols, James D.; Brownie, Cavell; Hines, James E.
1990-01-01
This monograph presents a detailed, practical exposition on the design, analysis, and interpretation of capture-recapture studies. The Lincoln-Petersen model (Chapter 2) and the closed population models (Chapter 3) are presented only briefly because these models have been covered in detail elsewhere. The Jolly-Seber open population model, which is central to the monograph, is covered in detail in Chapter 4. In Chapter 5 we consider the "enumeration" or "calendar of captures" approach, which is widely used by mammalogists and other vertebrate ecologists. We strongly recommend that it be abandoned in favor of analyses based on the Jolly-Seber model. We consider 2 restricted versions of the Jolly-Seber model. We believe the first of these, which allows losses (mortality or emigration) but not additions (births or immigration), is likely to be useful in practice. Another series of restrictive models requires the assumptions of a constant survival rate or a constant survival rate and a constant capture rate for the duration of the study. Detailed examples are given that illustrate the usefulness of these restrictions. There often can be a substantial gain in precision over Jolly-Seber estimates. In Chapter 5 we also consider 2 generalizations of the Jolly-Seber model. The temporary trap response model allows newly marked animals to have different survival and capture rates for 1 period. The other generalization is the cohort Jolly-Seber model. Ideally all animals would be marked as young, and age effects considered by using the Jolly-Seber model on each cohort separately. In Chapter 6 we present a detailed description of an age-dependent Jolly-Seber model, which can be used when 2 or more identifiable age classes are marked. In Chapter 7 we present a detailed description of the "robust" design. Under this design each primary period contains several secondary sampling periods.
We propose an estimation procedure based on closed and open population models that allows for heterogeneity and trap response of capture rates (hence the name robust design). We begin by considering just 1 age class and then extend to 2 age classes. When there are 2 age classes it is possible to distinguish immigrants and births. In Chapter 8 we give a detailed discussion of the design of capture-recapture studies. First, capture-recapture is compared to other possible sampling procedures. Next, the design of capture-recapture studies to minimize assumption violations is considered. Finally, we consider the precision of parameter estimates and present figures on proportional standard errors for a variety of initial parameter values to aid the biologist about to plan a study. A new program, JOLLY, has been written to accompany the material on the Jolly-Seber model (Chapter 4) and its extensions (Chapter 5). Another new program, JOLLYAGE, has been written for a special case of the age-dependent model (Chapter 6) where there are only 2 age classes. In Chapter 9 a brief description of the different versions of the 2 programs is given. Chapter 10 gives a brief description of some alternative approaches that were not considered in this monograph. We believe that an excellent overall view of capture-recapture models may be obtained by reading the monograph by White et al. (1982) emphasizing closed models and then reading this monograph where we concentrate on open models. The important recent monograph by Burnham et al. (1987) could then be read if there were interest in the comparison of different populations.
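As a concrete anchor for the simplest model named in this monograph, the two-sample Lincoln-Petersen estimator can be written in a few lines. This is a textbook sketch, not code from the JOLLY or JOLLYAGE programs; the function names and the example counts are hypothetical, and the Chapman bias-corrected form is included as a commonly used variant.

```python
def lincoln_petersen(n1, n2, m2):
    """Closed-population estimate of abundance.

    n1 -- animals caught and marked in the first sample
    n2 -- animals caught in the second sample
    m2 -- marked animals among the second sample
    """
    if m2 == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return n1 * n2 / m2

def chapman(n1, n2, m2):
    """Chapman's bias-corrected variant, defined even when m2 is zero."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

n_hat = lincoln_petersen(100, 80, 20)      # 100 * 80 / 20
n_hat_chapman = chapman(100, 80, 20)
```

The open-population Jolly-Seber model generalizes this idea to many sampling occasions while also estimating survival and additions between them.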
Statistical inference of static analysis rules
NASA Technical Reports Server (NTRS)
Engler, Dawson Richards (Inventor)
2009-01-01
Various apparatus and methods are disclosed for identifying errors in program code. Respective numbers of observances of at least one correctness rule by different code instances that relate to the at least one correctness rule are counted in the program code. Each code instance has an associated counted number of observances of the correctness rule by the code instance. Also counted are respective numbers of violations of the correctness rule by different code instances that relate to the correctness rule. Each code instance has an associated counted number of violations of the correctness rule by the code instance. A respective likelihood of the validity is determined for each code instance as a function of the counted number of observances and counted number of violations. The likelihood of validity indicates a relative likelihood that a related code instance is required to observe the correctness rule. The violations may be output in order of the likelihood of validity of a violated correctness rule.
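One way to turn counts of observances and violations into a ranking is a z-statistic for proportions. The sketch below is an assumption made for illustration: the abstract only states that the likelihood of validity is a function of the two counts, so the specific z-score formula, the baseline `p0 = 0.9`, and the names `validity_score` and `rank_violated_rules` are all hypothetical.

```python
import math

def validity_score(observances, violations, p0=0.9):
    """z-statistic comparing the observed rate of rule observance with an
    assumed baseline rate p0. Higher means the rule is more likely real."""
    n = observances + violations
    if n == 0:
        return float("-inf")
    return (observances / n - p0) / math.sqrt(p0 * (1.0 - p0) / n)

def rank_violated_rules(counts, p0=0.9):
    """counts maps a candidate rule to (observances, violations).
    Returns violated rules, most plausible rule first."""
    scored = [(validity_score(o, v, p0), rule)
              for rule, (o, v) in counts.items() if v > 0]
    scored.sort(reverse=True)
    return [rule for _, rule in scored]

counts = {
    "lock() is followed by unlock()": (99, 1),   # almost always observed
    "foo() is followed by bar()": (3, 2),        # probably coincidence
    "open() is followed by close()": (10, 0),    # never violated
}
ranking = rank_violated_rules(counts)
```

A rule observed 99 times and violated once ranks above a rule observed 3 times and violated twice, so the single lock/unlock violation is reported first as the more likely genuine error.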
NASA Astrophysics Data System (ADS)
Li, Yinsheng; Niu, Kai; Chen, Guang-Hong
2015-03-01
Time-resolved CT imaging methods play an increasingly important role in clinical practice, particularly, in the diagnosis and treatment of vascular diseases. In a time-resolved CT imaging protocol, it is often necessary to irradiate the patients for an extended period of time. As a result, the cumulative radiation dose in these CT applications is often higher than that of the static CT imaging protocols. Therefore, it is important to develop new means of reducing radiation dose for time-resolved CT imaging. In this paper, we present a novel statistical model based iterative reconstruction method that enables the reconstruction of low noise time-resolved CT images at low radiation exposure levels. Unlike other well known statistical reconstruction methods, this new method primarily exploits the intrinsic low dimensionality of time-resolved CT images to regularize the reconstruction. Numerical simulations were used to validate the proposed method.
NASA Astrophysics Data System (ADS)
Laloy, Eric; Rogiers, Bart; Vrugt, Jasper; Mallants, Dirk; Jacques, Diederik
2013-04-01
This study presents a novel strategy for accelerating posterior exploration of highly parameterized and CPU-demanding hydrogeologic models. The method builds on the stochastic collocation approach of Marzouk and Xiu (2009) and uses the generalized polynomial chaos (gPC) framework to emulate the output of a groundwater flow model. The resulting surrogate model is CPU-efficient and allows for sampling the posterior parameter distribution at a much reduced computational cost. This surrogate distribution is subsequently employed to precondition a state-of-the-art two-stage Markov chain Monte Carlo (MCMC) simulation (Vrugt et al., 2009; Cui et al., 2011) of the original CPU-demanding flow model. Application of the proposed method to the hydrogeological characterization of a three-dimensional multi-layered aquifer shows a 2-5 times speed up in sampling efficiency.
Confidence set inference with a prior quadratic bound
NASA Technical Reports Server (NTRS)
Backus, George E.
1989-01-01
In the uniqueness part of a geophysical inverse problem, the observer wants to predict all likely values of P unknown numerical properties $z=(z_1,\ldots,z_P)$ of the earth from measurement of D other numerical properties $y^{0}=(y_1^{0},\ldots,y_D^{0})$, using full or partial knowledge of the statistical distribution of the random errors in $y^{0}$. The data space Y containing $y^{0}$ is D-dimensional, so when the model space X is infinite-dimensional the linear uniqueness problem usually is insoluble without prior information about the correct earth model x. If that information is a quadratic bound on x, Bayesian inference (BI) and stochastic inversion (SI) inject spurious structure into x, implied by neither the data nor the quadratic bound. Confidence set inference (CSI) provides an alternative inversion technique free of this objection. Confidence set inference is illustrated in the problem of estimating the geomagnetic field B at the core-mantle boundary (CMB) from components of B measured on or above the earth's surface.
Burguet, Jasmine; Andrey, Philippe; Rampin, Olivier; Maurin, Yves
2009-04-10
An algorithm for the three-dimensional statistical representation of neuronal populations was designed and implemented. Using this algorithm a series of 3D models, calculated from repeated histological experiments, can be combined to provide a synthetic vision of a population of neurons taking into account biological and experimental variability. Based on the point process theory, our algorithm allows computation of neuronal density maps from which isodensity surfaces can be readily extracted and visualized as surface models revealing the statistical organization of the neuronal population under study. This algorithm was applied to the spatial distribution of locus coeruleus (LC) neurons of 30- and 90-day-old control and quaking mice. By combining 12 3D models of the LC, a region of the nucleus in which a subpopulation of neurons loses its noradrenergic phenotype between 30 and 90 days postnatally was demonstrated in control mice but not in quaking mice, leading to the hyperplasia previously reported in adult mutants. Altogether, this algorithm allows computation of 3D statistical and graphical models of neuronal populations, providing a contribution to quantitative 3D neuroanatomical modeling. PMID:19226531
Armour, Cherie
2015-01-01
There has been a substantial body of literature devoted to answering one question: Which latent model of posttraumatic stress disorder (PTSD) best represents PTSD's underlying dimensionality? This research summary will, therefore, focus on the literature pertaining to PTSD's latent structure as represented in the fourth (DSM-IV, 1994) to the fifth (DSM-5, 2013) edition of the DSM. This article will begin by providing a clear rationale as to why this is a pertinent research area, then the body of literature pertaining to the DSM-IV and DSM-IV-TR will be summarised, and this will be followed by a summary of the literature pertaining to the recently published DSM-5. To conclude, there will be a discussion with recommendations for future research directions, namely that researchers must investigate the applicability of the new DSM-5 criteria and the newly created DSM-5 symptom sets to trauma survivors. In addition, researchers must continue to endeavour to identify the "correct" constellations of symptoms within symptom sets to ensure that diagnostic algorithms are appropriate and aid in the development of targeted treatment approaches and interventions. In particular, the newly proposed DSM-5 anhedonia model, externalising behaviours model, and hybrid models must be further investigated. It is also important that researchers follow up on the idea that a more parsimonious latent structure of PTSD may exist.
NASA Astrophysics Data System (ADS)
Živić, I.; Elezović-Hadžić, S.; Milošević, S.
2014-11-01
We study the adsorption problem of linear polymers, immersed in a good solvent, when the container of the polymer-solvent system is taken to be a member of the Sierpinski gasket (SG) family of fractals, embedded in the three-dimensional Euclidean space. Members of the SG family are enumerated by an integer b (2≤b<∞), and it is assumed that one side of each SG fractal is impenetrable adsorbing boundary. We calculate the surface critical exponents γ11,γ1, and γs which, within the self-avoiding walk model (SAW) of polymer chain, are associated with the numbers of all possible SAWs with both, one, and no ends grafted to the adsorbing surface (adsorbing boundary), respectively. By applying the exact renormalization group method, for 2≤b≤4, we have obtained specific values for these exponents, for various types of polymer conformations. To extend the obtained sequences of exact values for surface critical exponents, we have applied the Monte Carlo renormalization group method for fractals with 2≤b≤40. The obtained results show that all studied exponents are monotonically increasing functions of the parameter b, for all possible polymer states. We discuss mutual relations between the studied critical exponents, and compare their values with those found for other types of lattices, in order to attain a unified picture of the attacked problem.
Schirillo, James A
2013-10-01
In studies of lightness and color constancy, the terms lightness and brightness refer to the qualia corresponding to perceived surface reflectance and perceived luminance, respectively. However, what has rarely been considered is the fact that the volume of space containing surfaces appears neither empty, void, nor black, but filled with light. Helmholtz (1866/1962) came closest to describing this phenomenon when discussing inferred illumination, but previous theoretical treatments have fallen short by restricting their considerations to the surfaces of objects. The present work is among the first to explore how we infer the light present in empty space. It concludes with several research examples supporting the theory that humans can infer the differential levels and chromaticities of illumination in three-dimensional space. PMID:23435628
Perception, illusions and Bayesian inference.
Nour, Matthew M; Nour, Joseph M
2015-01-01
Descriptive psychopathology makes a distinction between veridical perception and illusory perception. In both cases a perception is tied to a sensory stimulus, but in illusions the perception is of a false object. This article re-examines this distinction in light of new work in theoretical and computational neurobiology, which views all perception as a form of Bayesian statistical inference that combines sensory signals with prior expectations. Bayesian perceptual inference can solve the 'inverse optics' problem of veridical perception and provides a biologically plausible account of a number of illusory phenomena, suggesting that veridical and illusory perceptions are generated by precisely the same inferential mechanisms.
NASA Technical Reports Server (NTRS)
Iacovazzi, Robert A., Jr.; Prabhakara, C.; Lau, William K. M. (Technical Monitor)
2001-01-01
In this study, a model is developed to estimate mesoscale-resolution atmospheric latent heating (ALH) profiles. It utilizes rain statistics deduced from Tropical Rainfall Measuring Mission (TRMM) data, and cloud vertical velocity profiles and regional surface thermodynamic climatologies derived from other available data sources. From several rain events observed over tropical ocean and land, ALH profiles retrieved by this model in convective rain regions reveal strong warming throughout most of the troposphere, while in stratiform rain regions they usually show slight cooling below the freezing level and significant warming above. The mesoscale-average, or total, ALH profiles reveal a dominant stratiform character, because stratiform rain areas are usually much larger than convective rain areas. Sensitivity tests of the model show that total ALH at a given tropospheric level varies by less than +/- 10 % when convective and stratiform rain rates and mesoscale fractional rain areas are perturbed individually by +/- 15 %. This is also found when the non-uniform convective vertical velocity profiles are replaced by one that is uniform. Larger variability of the total ALH profiles arises when climatological ocean- and land-surface temperatures (water vapor mixing ratios) are independently perturbed by +/- 1.0 K (+/- 5 %) and +/- 5.0 K (+/- 15 %), respectively. At a given tropospheric level, such perturbations can cause a +/- 25 % variation of total ALH over ocean, and a factor-of-two sensitivity over land. This sensitivity is reduced substantially if perturbations of surface thermodynamic variables do not change surface relative humidity, or are not extended throughout the entire model evaporation layer. The ALH profiles retrieved in this study agree qualitatively with tropical total diabatic heating profiles deduced in earlier studies. Also, from January and July 1999 ALH-profile climatologies generated separately with TRMM Microwave Imager and Precipitation Radar rain
Terçariol, César Augusto Sangaletti; Martinez, Alexandre Souto
2005-08-01
Consider a medium characterized by N points whose coordinates are randomly generated by a uniform distribution along the edges of a unitary d-dimensional hypercube. A walker leaves from each point of this disordered medium and moves according to the deterministic rule to go to the nearest point which has not been visited in the preceding $\mu$ steps (deterministic tourist walk). Each trajectory generated by this dynamics has an initial nonperiodic part of t steps (transient) and a final periodic part of p steps (attractor). The neighborhood rank probabilities are parametrized by the normalized incomplete beta function $I_d = I_{1/4}[1/2, (d+1)/2]$. The joint distribution $S^{(N)}_{\mu,d}(t,p)$ is relevant, and the marginal distributions previously studied are particular cases. We show that, for the memory-less deterministic tourist walk in the Euclidean space, this distribution is $S^{(\infty)}_{1,d}(t,p) = [\Gamma(1+I_d^{-1})\,(t+I_d^{-1})/\Gamma(t+p+I_d^{-1})]\,\delta_{p,2}$, where $t = 0, 1, 2, \ldots, \infty$, $\Gamma(z)$ is the gamma function and $\delta_{i,j}$ is the Kronecker delta. The mean-field models are the random link models, which correspond to $d \to \infty$, and the random map model which, even for $\mu = 0$, presents nontrivial cycle distribution [$S^{(N)}_{0,rm}(p) \propto p^{-1}$]: $S^{(N)}_{0,rm}(t,p) = \Gamma(N)/\{\Gamma[N+1-(t+p)]\,N^{t+p}\}$. The fundamental quantities are the number of explored points $n_e = t+p$ and $I_d$. Although the obtained distributions are simple, they do not follow straightforwardly and they have been validated by numerical experiments.
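The walk rule described above can be sketched numerically. The following is a minimal illustration, not the authors' code: the function name `tourist_walk`, the reading of the medium as points drawn uniformly in the unit hypercube, and the convention that the memory window includes the current point are all assumptions made for this example. With a memory of one step, every walk should end in a cycle of period 2, in line with the Kronecker delta in p quoted in the abstract.

```python
import math
import random


def tourist_walk(points, start, mu):
    """Return (t, p): transient length and attractor period of one walk.

    Assumed convention: at each step the walker moves to the nearest point
    that is not among the last `mu` points it occupied (current one included).
    """
    n = len(points)
    if mu < 1 or n <= mu:
        raise ValueError("need mu >= 1 and more than mu points")
    window = [start]   # most recently occupied points, newest last
    seen = {}          # window state -> step index at which it first occurred
    step = 0
    while True:
        state = tuple(window)
        if state in seen:
            t = seen[state]          # first step that belongs to the cycle
            return t, step - t       # transient length, period
        seen[state] = step
        current = window[-1]
        forbidden = set(window)
        nxt = min(
            (j for j in range(n) if j not in forbidden),
            key=lambda j: math.dist(points[current], points[j]),
        )
        window.append(nxt)
        if len(window) > mu:
            window.pop(0)
        step += 1


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [[rng.random() for _ in range(2)] for _ in range(100)]
    print([tourist_walk(pts, s, 1) for s in range(5)])
```

Cycle detection here works on the whole memory window rather than on the current point alone, because for a memory longer than one step the next move depends on the recent history as well as the position.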
Pedoia, Valentina; Lansdown, Drew A.; Zaid, Musa; McCulloch, Charles E.; Souza, Richard; Ma, C. Benjamin; Li, Xiaojuan
2016-01-01
Objective: The aim of this study is to develop a novel 3D magnetic resonance imaging (MRI)-based Statistical Shape Modeling (SSM) and apply it in knee MRIs in order to extract and compare relevant shapes of the tibia and femur in patients with and without acute ACL injuries. Methods: Bilateral MR images were acquired and analyzed for 50 patients with acute ACL injuries and for 19 control subjects. A shape model was extracted for the tibia and femur using an SSM algorithm based on a set of matched landmarks that are computed in a fully automatic manner. Results: Shape differences were detected between the knees in the ACL-injury group and control group, suggesting a common shape feature that may predispose these knees to injury. Some of the detected shape features that discriminate between injured and control knees are related to intercondylar width and posterior tibia slope, features that have been suggested in previous studies as ACL morphological risk factors. However, shape modeling has the great potential to quantify these characteristics with a comprehensive description of the surfaces describing complex 3D deformation that cannot be represented with simple geometric indexes. Conclusions: 3D MRI-based bone shape quantification has the ability to identify specific anatomic risk factors for ACL injury. A better understanding of the role of bony shape in ligamentous injuries could help in identifying subjects with an increased risk for an ACL tear and in developing targeted prevention strategies, including education and training. PMID:26050865
BIE: Bayesian Inference Engine
NASA Astrophysics Data System (ADS)
Weinberg, Martin D.
2013-12-01
The Bayesian Inference Engine (BIE) is an object-oriented library of tools written in C++ designed explicitly to enable Bayesian update and model comparison for astronomical problems. To facilitate "what if" exploration, BIE provides a command line interface (written with Bison and Flex) to run input scripts. The output of the code is a simulation of the Bayesian posterior distribution from which summary statistics, e.g. moments or confidence intervals, can be determined. All of these quantities are fundamentally integrals, and the Markov Chain approach produces variates θ distributed according to P(θ|D), so moments are trivially obtained by summing over the ensemble of variates.
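The last point, that posterior moments reduce to sums over the sampled variates, can be illustrated with a toy chain. The sketch below is not BIE code and does not use its C++ interface; it assumes a one-parameter Gaussian likelihood with known width and a flat prior, and uses a plain random-walk Metropolis sampler. All function names are hypothetical.

```python
import math
import random


def log_posterior(theta, data, sigma=1.0):
    """Unnormalised log P(theta | D): flat prior, Gaussian likelihood (assumed toy model)."""
    return -sum((x - theta) ** 2 for x in data) / (2.0 * sigma ** 2)


def metropolis(data, n_samples, step=0.5, theta0=0.0, seed=0):
    """Random-walk Metropolis chain whose stationary distribution is P(theta | D)."""
    rng = random.Random(seed)
    theta, logp = theta0, log_posterior(theta0, data)
    chain = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        logp_new = log_posterior(proposal, data)
        # Accept with probability min(1, P(new) / P(old)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new
        chain.append(theta)
    return chain


def moments(chain, burn_in=0):
    """Posterior mean and variance estimated as plain averages over the variates."""
    kept = chain[burn_in:]
    mean = sum(kept) / len(kept)
    var = sum((x - mean) ** 2 for x in kept) / len(kept)
    return mean, var


if __name__ == "__main__":
    data = [0.8, 1.1, 1.3, 0.9, 1.4]
    chain = metropolis(data, 20000)
    print(moments(chain, burn_in=2000))
```

For this toy model the exact posterior is Gaussian with mean equal to the sample mean of the data and variance sigma squared over the number of data points, which gives a simple check on the ensemble averages.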
Guyonvarch, Estelle; Ramin, Elham; Kulahci, Murat; Plósz, Benedek Gy
2015-10-15
The present study aims at using statistically designed computational fluid dynamics (CFD) simulations as numerical experiments for the identification of one-dimensional (1-D) advection-dispersion models - computationally light tools, used e.g., as sub-models in systems analysis. The objective is to develop a new 1-D framework, referred to as interpreted CFD (iCFD) models, in which statistical meta-models are used to calculate the pseudo-dispersion coefficient (D) as a function of design and flow boundary conditions. The method - presented in a straightforward and transparent way - is illustrated using the example of a circular secondary settling tank (SST). First, the significant design and flow factors are screened out by applying the statistical method of two-level fractional factorial design of experiments. Second, based on the number of significant factors identified through the factor screening study and system understanding, 50 different sets of design and flow conditions are selected using Latin Hypercube Sampling (LHS). The boundary condition sets are imposed on a 2-D axi-symmetrical CFD simulation model of the SST. In the framework, to degenerate the 2-D model structure, CFD model outputs are approximated by the 1-D model through the calibration of three different model structures for D. Correlation equations for the D parameter then are identified as a function of the selected design and flow boundary conditions (meta-models), and their accuracy is evaluated against D values estimated in each numerical experiment. The evaluation and validation of the iCFD model structure is carried out using scenario simulation results obtained with parameters sampled from the corners of the LHS experimental region. For the studied SST, additional iCFD model development was carried out in terms of (i) assessing different density current sub-models; (ii) implementation of a combined flocculation, hindered, transient and compression settling velocity function; and (iii
NASA Astrophysics Data System (ADS)
de Lorenzo, Salvatore; Zollo, Aldo; Mongelli, Francesco
2001-01-01
The three-dimensional P wave attenuation structure of the Campi Flegrei caldera and the estimate of source parameters for 87 local microearthquakes are obtained by the nonlinear inversion of pulse width and rise time measurements by using the method described by Zollo and de Lorenzo (this issue). Source radii represent the better resolved parameters with values ranging from 70 m to 230 m; the dip and strike angles defining fault orientations are usually affected by larger uncertainties and are well constrained only for 11 events. The dip fault is usually confined in the range 30°-60° (with an average uncertainty of 12°); the fault strikes mainly range between -60° and 60° and seem to define preferential directions oriented radially from the symmetry axis of the ground deformation. Stress drop estimates indicate rather low values (0.01-1 MPa) which suggest low strength properties of the incoherent and brittle materials filling the caldera (primarily yellow tuffs). The three-dimensional Qp images obtained from the inversion of P pulse durations show two significant low-Qp anomalies between 0 and 1 km of depth, in the north-eastern sector and at 2-3 km of depth in the central eastern sector of the caldera. The high degree of spatial correlation of the low-Qp zone and low-Vs (as inferred by Aster and Meyer (1988)) at 0-1 km in depth and other geophysical and geochemical observations suggest that this anomaly can be related to the presence of densely fractured, porous, and fluid-filled rocks in the NE sector of the caldera. The deeper low-Qp anomaly is interpreted as being related to a dominant thermal effect. We used the surface and deep borehole temperature measurements available in the area to obtain a local calibration curve to convert Qp into temperature at Campi Flegrei. The retrieved T(Qp) map shows a high thermal deep disturbance (450°-500°C) at depths between 2 and 3 km in the eastern sector of the caldera, where the most recent eruptive activity is
Methods for Bayesian power spectrum inference with galaxy surveys
Jasche, Jens; Wandelt, Benjamin D.
2013-12-10
We derive and implement a full Bayesian large scale structure inference method aiming at precision recovery of the cosmological power spectrum from galaxy redshift surveys. Our approach improves upon previous Bayesian methods by performing a joint inference of the three-dimensional density field, the cosmological power spectrum, luminosity dependent galaxy biases, and corresponding normalizations. We account for all joint and correlated uncertainties between all inferred quantities. Classes of galaxies with different biases are treated as separate subsamples. This method therefore also allows the combined analysis of more than one galaxy survey. In particular, it solves the problem of inferring the power spectrum from galaxy surveys with non-trivial survey geometries by exploring the joint posterior distribution with efficient implementations of multiple block Markov chain and Hybrid Monte Carlo methods. Our Markov sampler achieves high statistical efficiency in low signal-to-noise regimes by using a deterministic reversible jump algorithm. This approach reduces the correlation length of the sampler by several orders of magnitude, turning the otherwise numerically unfeasible problem of joint parameter exploration into a numerically manageable task. We test our method on an artificial mock galaxy survey, emulating characteristic features of the Sloan Digital Sky Survey data release 7, such as its survey geometry and luminosity-dependent biases. These tests demonstrate the numerical feasibility of our large scale Bayesian inference framework when the parameter space has millions of dimensions. This method reveals and correctly treats the anti-correlation between bias amplitudes and power spectrum, which is not taken into account in current approaches to power spectrum estimation, a 20% effect across large ranges in k space. In addition, this method results in constrained realizations of density fields obtained without assuming the power spectrum or bias parameters
Operation of the Bayes Inference Engine
Hanson, K.M.; Cunningham, G.S.
1998-07-27
The authors have developed a computer application, called the Bayes Inference Engine (BIE), to enable one to make inferences about models of a physical object from radiographs taken of it. In the BIE, calculational models are represented by a data-flow diagram that can be manipulated by the analyst in a graphical-programming environment. The authors demonstrate the operation of the BIE through examples of two-dimensional tomographic reconstruction, including uncertainty estimation.
Kauweloa, Kevin I; Gutierrez, Alonso N; Stathakis, Sotirios; Papanikolaou, Niko; Mavroidis, Panayiotis
2016-07-01
A toolkit has been developed for calculating the 3-dimensional biological effective dose (BED) distributions in multi-phase, external beam radiotherapy treatments such as those applied in liver stereotactic body radiation therapy (SBRT) and in multi-prescription treatments. This toolkit also provides a wide range of statistical results related to dose and BED distributions. MATLAB 2010a, version 7.10 was used to create this GUI toolkit. The input data consist of the dose distribution matrices, organ contour coordinates, and treatment planning parameters from the treatment planning system (TPS). The toolkit has the capability of calculating the multi-phase BED distributions using different formulas (denoted as true and approximate). Following the calculations of the BED distributions, the dose and BED distributions can be viewed in different projections (e.g. coronal, sagittal and transverse). The different elements of this toolkit are presented and the important steps for the execution of its calculations are illustrated. The toolkit is applied on brain, head & neck and prostate cancer patients, who received primary and boost phases in order to demonstrate its capability in calculating BED distributions, as well as measuring the inaccuracy and imprecision of the approximate BED distributions. Finally, the clinical situations in which the use of the present toolkit would have a significant clinical impact are indicated.
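The abstract does not state the formulas the toolkit labels "true" and "approximate", so the following Python sketch is only an illustration of a voxel-wise BED calculation under stated assumptions, not the toolkit's MATLAB implementation. It uses the standard linear-quadratic expression BED = D (1 + d / (α/β)), with D the total dose and d the dose per fraction, and contrasts two hypothetical options: summing the BED of each phase, and applying the single-phase formula to the summed dose and summed fraction count. All function names are invented for this example.

```python
def bed(total_dose, n_fractions, alpha_beta):
    """Linear-quadratic BED for one voxel: D * (1 + d / (alpha/beta)), d = D / n."""
    dose_per_fraction = total_dose / n_fractions
    return total_dose * (1.0 + dose_per_fraction / alpha_beta)


def bed_summed_over_phases(phase_doses, phase_fractions, alpha_beta):
    """Per-voxel BED obtained by adding the BED of each treatment phase."""
    n_voxels = len(phase_doses[0])
    return [
        sum(bed(doses[v], n, alpha_beta)
            for doses, n in zip(phase_doses, phase_fractions))
        for v in range(n_voxels)
    ]


def bed_from_total_dose(phase_doses, phase_fractions, alpha_beta):
    """Per-voxel BED from the summed dose, treated as one phase (illustrative shortcut)."""
    n_total = sum(phase_fractions)
    n_voxels = len(phase_doses[0])
    return [
        bed(sum(doses[v] for doses in phase_doses), n_total, alpha_beta)
        for v in range(n_voxels)
    ]


if __name__ == "__main__":
    # Two voxels, primary phase (25 fractions) and boost phase (8 fractions), doses in Gy.
    doses = [[50.0, 40.0], [16.0, 4.0]]
    fractions = [25, 8]
    print(bed_summed_over_phases(doses, fractions, alpha_beta=10.0))
    print(bed_from_total_dose(doses, fractions, alpha_beta=10.0))
```

The two results coincide in a voxel that receives the same dose per fraction in every phase and differ elsewhere, which is the kind of discrepancy a comparison between exact and simplified BED maps would expose.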
NASA Astrophysics Data System (ADS)
Bocaniov, Serghei A.; Scavia, Donald
2016-06-01
Hypoxia or low bottom water dissolved oxygen (DO) is a world-wide problem of management concern requiring an understanding and ability to monitor and predict its spatial and temporal dynamics. However, this is often made difficult in large lakes and coastal oceans because of limited spatial and temporal coverage of field observations. We used a calibrated and validated three-dimensional ecological model of Lake Erie to extend a statistical relationship between hypoxic extent and bottom water DO concentrations to explore implications of the broader temporal and spatial development and dissipation of hypoxia. We provide the first numerical demonstration that hypoxia initiates in the nearshore, not the deep portion of the basin, and that the threshold used to define hypoxia matters in both spatial and temporal dynamics and in its sensitivity to climate. We show that existing monitoring programs likely underestimate both maximum hypoxic extent and the importance of low oxygen in the nearshore, discuss implications for ecosystem and drinking water protection, and recommend how these results could be used to efficiently and economically extend monitoring programs.
Wang, Ting; Ren, Zhao; Ding, Ying; Fang, Zhou; Sun, Zhe; MacDonald, Matthew L; Sweet, Robert A; Wang, Jieru; Chen, Wei
2016-02-01
Biological networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single variables. Gaussian graphical model (GGM), a probability model that characterizes the conditional dependence structure of a set of random variables by a graph, has wide applications in the analysis of biological networks, such as inferring interaction or comparing differential networks. However, existing approaches are either not statistically rigorous or are inefficient for high-dimensional data that include tens of thousands of variables for making inference. In this study, we propose an efficient algorithm to implement the estimation of GGM and obtain p-value and confidence interval for each edge in the graph, based on a recent proposal by Ren et al., 2015. Through simulation studies, we demonstrate that the algorithm is faster by several orders of magnitude than the current implemented algorithm for Ren et al. without losing any accuracy. Then, we apply our algorithm to two real data sets: transcriptomic data from a study of childhood asthma and proteomic data from a study of Alzheimer's disease. We estimate the global gene or protein interaction networks for the disease and healthy samples. The resulting networks reveal interesting interactions and the differential networks between cases and controls show functional relevance to the diseases. In conclusion, we provide a computationally fast algorithm to implement a statistically sound procedure for constructing Gaussian graphical model and making inference with high-dimensional biological data. The algorithm has been implemented in an R package named "FastGGM". PMID:26872036
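To make the graph in a Gaussian graphical model concrete: for jointly Gaussian variables, two variables are conditionally independent given the rest exactly when the corresponding entry of the precision (inverse covariance) matrix is zero, and the partial correlation is -ω_ij / sqrt(ω_ii ω_jj). The sketch below only reads edges off a known precision matrix. It is not the FastGGM estimator or the procedure of Ren et al., which estimate these quantities from data and attach p-values; the function names are hypothetical.

```python
import math


def partial_correlations(precision):
    """Partial correlation matrix from a precision matrix given as nested lists."""
    n = len(precision)
    rho = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                rho[i][j] = -precision[i][j] / math.sqrt(
                    precision[i][i] * precision[j][j]
                )
    return rho


def edges(precision, tol=1e-12):
    """Undirected edges (i, j), i < j, where the partial correlation is nonzero."""
    rho = partial_correlations(precision)
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if abs(rho[i][j]) > tol]


if __name__ == "__main__":
    # Chain 0 - 1 - 2: variables 0 and 2 are conditionally independent given 1.
    omega = [[2.0, -1.0, 0.0],
             [-1.0, 2.0, -1.0],
             [0.0, -1.0, 2.0]]
    print(edges(omega))
```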
Network Plasticity as Bayesian Inference
Legenstein, Robert; Maass, Wolfgang
2015-01-01
General results from statistical learning theory suggest that not only brain computations but also brain plasticity should be understood as probabilistic inference. But a model for that has been missing. We propose that inherently stochastic features of synaptic plasticity and spine motility enable cortical networks of neurons to carry out probabilistic inference by sampling from a posterior distribution of network configurations. This model provides a viable alternative to existing models that propose convergence of parameters to maximum likelihood values. It explains how priors on weight distributions and connection probabilities can be merged optimally with learned experience, how cortical networks can generalize learned information so well to novel experiences, and how they can compensate continuously for unforeseen disturbances of the network. The resulting new theory of network plasticity explains from a functional perspective a number of experimental data on stochastic aspects of synaptic plasticity that previously appeared to be quite puzzling. PMID:26545099
Bayesian Inference on Proportional Elections
Brunello, Gabriel Hideki Vatanabe; Nakano, Eduardo Yoshio
2015-01-01
Polls for majoritarian voting systems usually show estimates of the percentage of votes for each candidate. However, proportional vote systems do not necessarily guarantee that the candidate with the highest percentage of votes will be elected. Thus, traditional methods used in majoritarian elections cannot be applied to proportional elections. In this context, the purpose of this paper was to perform a Bayesian inference on proportional elections considering the Brazilian system of seats distribution. More specifically, a methodology to estimate the probability that a given party will have representation in the chamber of deputies was developed. Inferences were made in a Bayesian scenario using the Monte Carlo simulation technique, and the developed methodology was applied to data from the Brazilian elections for Members of the Legislative Assembly and Federal Chamber of Deputies in 2010. A performance rate was also presented to evaluate the efficiency of the methodology. Calculations and simulations were carried out using the free R statistical software. PMID:25786259
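The general idea of the methodology (draw vote shares from a posterior, apply the seat rule, count how often a party wins a seat) can be sketched as follows. This is an illustrative Python sketch rather than the authors' R code. It assumes a uniform Dirichlet prior on vote shares and, as a simplification, replaces the Brazilian seat distribution rules with a plain D'Hondt highest-averages allocation; the function names and poll numbers are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """One draw from a Dirichlet distribution via normalised gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [g / total for g in draws]


def dhondt(shares, seats):
    """Allocate seats by highest averages (D'Hondt divisors 1, 2, 3, ...)."""
    won = [0] * len(shares)
    for _ in range(seats):
        k = max(range(len(shares)), key=lambda i: shares[i] / (won[i] + 1))
        won[k] += 1
    return won


def prob_representation(poll_counts, seats, n_sims=5000, seed=0):
    """Monte Carlo estimate of P(party wins at least one seat | poll)."""
    rng = random.Random(seed)
    alphas = [c + 1.0 for c in poll_counts]  # posterior under a Dirichlet(1, ..., 1) prior
    hits = [0] * len(poll_counts)
    for _ in range(n_sims):
        shares = sample_dirichlet(alphas, rng)
        for i, s in enumerate(dhondt(shares, seats)):
            if s > 0:
                hits[i] += 1
    return [h / n_sims for h in hits]


if __name__ == "__main__":
    # Hypothetical poll: respondents per party, 10 seats in dispute.
    print(prob_representation([480, 300, 200, 20], seats=10))
```

Swapping in a different seat rule only requires replacing `dhondt`, since the posterior sampling and the counting step do not depend on how seats are assigned.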
Causal inference based on counterfactuals
Höfler, M
2005-01-01
Background: The counterfactual or potential outcome model has become increasingly standard for causal inference in epidemiological and medical studies. Discussion: This paper provides an overview of the counterfactual and related approaches. A variety of conceptual as well as practical issues when estimating causal effects are reviewed. These include causal interactions, imperfect experiments, adjustment for confounding, time-varying exposures, competing risks and the probability of causation. It is argued that the counterfactual model of causal effects captures the main aspects of causality in health sciences and relates to many statistical procedures. Summary: Counterfactuals are the basis of causal inference in medicine and epidemiology. Nevertheless, the estimation of counterfactual differences poses several difficulties, primarily in observational studies. These problems, however, reflect fundamental barriers only when learning from observations, and this does not invalidate the counterfactual concept. PMID:16159397
NASA Astrophysics Data System (ADS)
Goodman, Joseph W.
2000-07-01
The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Robert G. Bartle The Elements of Integration and Lebesgue Measure George E. P. Box & Norman R. Draper Evolutionary Operation: A Statistical Method for Process Improvement George E. P. Box & George C. Tiao Bayesian Inference in Statistical Analysis R. W. Carter Finite Groups of Lie Type: Conjugacy Classes and Complex Characters R. W. Carter Simple Groups of Lie Type William G. Cochran & Gertrude M. Cox Experimental Designs, Second Edition Richard Courant Differential and Integral Calculus, Volume I Richard Courant Differential and Integral Calculus, Volume II Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume I Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume II D. R. Cox Planning of Experiments Harold S. M. Coxeter Introduction to Geometry, Second Edition Charles W. Curtis & Irving Reiner Representation Theory of Finite Groups and Associative Algebras Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume I Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume II Cuthbert Daniel Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition Bruno de Finetti Theory of Probability, Volume I Bruno de Finetti Theory of Probability, Volume 2 W. Edwards Deming Sample Design in Business Research
Parametric inference in the large data limit using maximally informative models.
Kinney, Justin B; Atwal, Gurinder S
2014-04-01
Motivated by data-rich experiments in transcriptional regulation and sensory neuroscience, we consider the following general problem in statistical inference: when exposed to a high-dimensional signal S, a system of interest computes a representation R of that signal, which is then observed through a noisy measurement M. From a large number of signals and measurements, we wish to infer the "filter" that maps S to R. However, the standard method for solving such problems, likelihood-based inference, requires perfect a priori knowledge of the "noise function" mapping R to M. In practice such noise functions are usually known only approximately, if at all, and using an incorrect noise function will typically bias the inferred filter. Here we show that in the large data limit, this need for a precharacterized noise function can be circumvented by searching for filters that instead maximize the mutual information I[M; R] between observed measurements and predicted representations. Moreover, if the correct filter lies within the space of filters being explored, maximizing mutual information becomes equivalent to simultaneously maximizing every dependence measure that satisfies the data processing inequality. It is important to note that maximizing mutual information will typically leave a small number of directions in parameter space unconstrained. We term these directions diffeomorphic modes and present an equation that allows these modes to be derived systematically. The presence of diffeomorphic modes reflects a fundamental and nontrivial substructure within parameter space, one that is obscured by standard likelihood-based inference.
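The mutual-information criterion described above can be illustrated with a small numerical sketch. Everything here is assumed for illustration: the two-dimensional Gaussian signals, the tanh-plus-noise measurement, the rank-based plug-in estimator and all names are hypothetical and are not taken from the paper.

```python
import numpy as np

def quantile_bins(x, n_bins):
    """Map each value to an equal-occupancy bin index via its rank."""
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    return (ranks * n_bins) // len(x)

def mutual_information(a, b, n_bins=16):
    """Plug-in estimate of I[a; b] in nats from a binned contingency table."""
    ia, ib = quantile_bins(a, n_bins), quantile_bins(b, n_bins)
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (ia, ib), 1.0)          # count co-occurrences of bin pairs
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)    # marginal of a, shape (n_bins, 1)
    pb = joint.sum(axis=0, keepdims=True)    # marginal of b, shape (1, n_bins)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))

rng = np.random.default_rng(0)
S = rng.normal(size=(20000, 2))              # hypothetical signals
R = S @ np.array([1.0, 0.0])                 # representation from the "true" filter
M = np.tanh(R) + 0.3 * rng.normal(size=len(R))   # noise function, unknown to the analyst

mi_true = mutual_information(M, R)
mi_wrong = mutual_information(M, S @ np.array([0.0, 1.0]))
mi_scaled = mutual_information(M, 2.0 * R)   # rescaled filter, same ranks as R
```

In this toy setting the correct filter gives a clearly larger estimate than an orthogonal one, without any model of the noise. Rescaling the filter leaves the estimate unchanged, which is one simple example of a direction in parameter space that mutual information cannot constrain.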
Giraud, Philippe; De Rycke, Yann; Lavole, Armelle; Milleron, Bernard; Cosset, Jean-Marc; Rosenzweig, Kenneth E.
2006-01-01
Purpose: Conformal irradiation (3D-CRT) of non-small-cell lung carcinoma (NSCLC) is largely based on precise definition of the nodal clinical target volume (CTVn). A reduction of the number of nodal stations to be irradiated would facilitate tumor dose escalation. The aim of this study was to design a mathematical tool based on documented data to predict the risk of metastatic involvement for each nodal station. Methods and Materials: We reviewed the large surgical series published in the literature to identify the main pretreatment parameters that modify the risk of nodal invasion. The probability of involvement for the 17 nodal stations described by the American Thoracic Society (ATS) was computed from all these publications. Starting with the primary site of the tumor as the main characteristic, we built a probabilistic tree for each nodal station representing the risk distribution as a function of each tumor feature. Statistical analysis used the inversion of probability trees method described by Weinstein and Feinberg. Validation of the software based on 134 patients from two different populations was performed by receiver operator characteristic (ROC) curves and multivariate logistic regression. Results: Analysis of all of the various parameters of pretreatment staging relative to each level of the ATS map results in 20,000 different combinations. The first parameters included in the tree, depending on tumor site, were histologic classification, metastatic stage, nodal stage weighted as a function of the sensitivity and specificity of the diagnostic examination used (positron emission tomography scan, computed tomography scan), and tumor stage. Software is proposed to compute a predicted probability of involvement of each nodal station for any given clinical presentation. Double cross validation confirmed the methodology. A 10% cutoff point was calculated from ROC and logistic model giving the best prediction of mediastinal lymph node involvement. Conclusion
Louarn, Gaëtan; Lecoeur, Jérémie; Lebon, Eric
2008-01-01
Background and Aims In grapevine, canopy-structure-related variations in light interception and distribution affect productivity, yield and the quality of the harvested product. A simple statistical model for reconstructing three-dimensional (3D) canopy structures for various cultivar–training system (C × T) pairs has been implemented with special attention paid to balance the time required for model parameterization and accuracy of the representations from organ to stand scales. Such an approach particularly aims at overcoming the weak integration of interplant variability using the usual direct 3D measurement methods. Model This model is original in combining a turbid-medium-like envelope enclosing the volume occupied by vine shoots with the use of discrete geometric polygons representing leaves randomly located within this volume to represent plant structure. Reconstruction rules were adapted to capture the main determinants of grapevine shoot architecture and their variability. Using a simplified set of parameters, it was possible to describe (1) the 3D path of the main shoot, (2) the volume occupied by the foliage around this path and (3) the orientation of individual leaf surfaces. Model parameterization (estimation of the probability distribution for each parameter) was carried out for eight contrasting C × T pairs. Key Results and Conclusions The parameter values obtained in each situation were consistent with our knowledge of grapevine architecture. Quantitative assessments for the generated virtual scenes were carried out at the canopy and plant scales. Light interception efficiency and local variations of light transmittance within and between experimental plots were correctly simulated for all canopies studied. The approach predicted these key ecophysiological variables significantly more accurately than the classical complete digitization method with a limited number of plants. In addition, this model accurately reproduced the characteristics of a
Dynamical inference of hidden biological populations
NASA Astrophysics Data System (ADS)
Luchinsky, D. G.; Smelyanskiy, V. N.; Millonas, M.; McClintock, P. V. E.
2008-10-01
Population fluctuations in a predator-prey system are analyzed for the case where the number of prey could be determined, subject to measurement noise, but the number of predators was unknown. The problem of how to infer the unmeasured predator dynamics, as well as the model parameters, is addressed. Two solutions are suggested. In the first of these, measurement noise and the dynamical noise in the equation for predator population are neglected; the problem is reduced to a one-dimensional case, and a Bayesian dynamical inference algorithm is employed to reconstruct the model parameters. In the second solution a full-scale Markov Chain Monte Carlo simulation is used to infer both the unknown predator trajectory, and also the model parameters, using the one-dimensional solution as an initial guess.
Inference of Internal Stress in a Cell Monolayer.
Nier, Vincent; Jain, Shreyansh; Lim, Chwee Teck; Ishihara, Shuji; Ladoux, Benoit; Marcq, Philippe
2016-04-12
We combine traction force data with Bayesian inversion to obtain an absolute estimate of the internal stress field of a cell monolayer. The method, Bayesian inversion stress microscopy, is validated using numerical simulations performed in a wide range of conditions. It is robust to changes in each ingredient of the underlying statistical model. Importantly, its accuracy does not depend on the rheology of the tissue. We apply Bayesian inversion stress microscopy to experimental traction force data measured in a narrow ring of cohesive epithelial cells, and check that the inferred stress field coincides with that obtained by direct spatial integration of the traction force data in this quasi one-dimensional geometry. PMID:27074687
Petrov, S.
1996-10-01
Languages with a solvable implication problem but without complete and consistent systems of inference rules ("poor" languages) are considered. The problem of the existence of a finite complete and consistent inference rule system for a "poor" language is stated independently of the language or rule syntax. Several properties of the problem are proved. An application of the results to the language of join dependencies is given.
Confidence set inference with a prior quadratic bound
NASA Technical Reports Server (NTRS)
Backus, George E.
1988-01-01
In the uniqueness part of a geophysical inverse problem, the observer wants to predict all likely values of P unknown numerical properties z = (z_1, ..., z_P) of the earth from measurement of D other numerical properties y(0) = (y_1(0), ..., y_D(0)) and knowledge of the statistical distribution of the random errors in y(0). The data space Y containing y(0) is D-dimensional, so when the model space X is infinite-dimensional the linear uniqueness problem usually is insoluble without prior information about the correct earth model x. If that information is a quadratic bound on x (e.g., energy or dissipation rate), Bayesian inference (BI) and stochastic inversion (SI) inject spurious structure into x, implied by neither the data nor the quadratic bound. Confidence set inference (CSI) provides an alternative inversion technique free of this objection. CSI is illustrated in the problem of estimating the geomagnetic field B at the core-mantle boundary (CMB) from components of B measured on or above the earth's surface. Neither the heat flow nor the energy bound is strong enough to permit estimation of B(r) at single points on the CMB, but the heat flow bound permits estimation of uniform averages of B(r) over discs on the CMB, and both bounds permit weighted disc-averages with continuous weighting kernels. Both bounds also permit estimation of low-degree Gauss coefficients at the CMB. The heat flow bound resolves them up to degree 8 if the crustal field at satellite altitudes must be treated as a systematic error, but can resolve to degree 11 under the most favorable statistical treatment of the crust. These two limits produce circles of confusion on the CMB with diameters of 25 deg and 19 deg respectively.
Solar structure: Models and inferences from helioseismology
Guzik, J.A.
1998-12-31
In this review the author summarizes results published during approximately the last three years concerning the state of one-dimensional solar interior modeling. She discusses the effects of refinements to the input physics, motivated by improving the agreement between calculated and observed solar oscillation frequencies, or between calculated and inferred solar structure. She has omitted two- and three-dimensional aspects of the solar structure, such as the rotation profile, detailed modeling of turbulent convection, and magnetic fields, although further progress in refining solar interior models may require including such two- and three-dimensional dynamical effects.
NASA Astrophysics Data System (ADS)
Balachandran, Prasanna V.; Xue, Dezhen; Lookman, Turab
2016-04-01
One of the key impediments to the development of BaTiO3-based materials as candidates to replace toxic Pb-based solid solutions is their relatively low ferroelectric Curie temperature (TC). Among many potential routes that are available to modify TC, ionic substitutions at the Ba and Ti sites remain the most common approach. Here, we perform density functional theory (DFT) calculations on a series of ATiO3 and BaBO3 perovskites, where A = Ba, Ca, Sr, Pb, Cd, Sn, and Mg and B = Ti, Zr, Hf, and Sn. Our objective is to study the relative role of A and B cations in impacting the TC of the tetragonal (P4mm) and rhombohedral (R3m) ferroelectric phases in BaTiO3-based solid solutions, respectively. Using symmetry-mode analysis, we obtain a quantitative description of the relative contributions of various divalent (A) and tetravalent (B) cations to the ferroelectric distortions. Our results show that Ca, Pb, Cd, Sn, and Mg have large mode amplitudes for ferroelectric distortion in the tetragonal phase relative to Ba, whereas Sr suppresses the distortions. On the other hand, Zr, Hf, and Sn tetravalent cations severely suppress the ferroelectric distortion in the rhombohedral phase relative to Ti. In addition to symmetry modes, our calculated unit-cell volume also agrees with the experimental trends. We subsequently utilize the symmetry modes and unit-cell volumes as features within a machine learning approach to learn TC via an inference model and uncover trends that provide insights into the design of new high-TC BaTiO3-based ferroelectrics. The inference model predicts CdTiO3-BaTiO3 solid solutions to have a higher TC and, therefore, we experimentally synthesized these solid solutions and measured their TC. Although the calculated mode strength for CdTiO3 in the tetragonal phase is even larger than that for PbTiO3, the TC of CdTiO3-BaTiO3 solid solutions in the tetragonal phase does not show any appreciable enhancement. Thus, CdTiO3-BaTiO3 does not follow the
Bayesian inference of the initial conditions from large-scale structure surveys
NASA Astrophysics Data System (ADS)
Leclercq, Florent
2016-10-01
Analysis of three-dimensional cosmological surveys has the potential to answer outstanding questions on the initial conditions from which structure appeared, and therefore on the very high energy physics at play in the early Universe. We report on recently proposed statistical data analysis methods designed to study the primordial large-scale structure via physical inference of the initial conditions in a fully Bayesian framework, and applications to the Sloan Digital Sky Survey data release 7. We illustrate how this approach led to a detailed characterization of the dynamic cosmic web underlying the observed galaxy distribution, based on the tidal environment.
Lessons from Inferentialism for Statistics Education
ERIC Educational Resources Information Center
Bakker, Arthur; Derry, Jan
2011-01-01
This theoretical paper relates recent interest in informal statistical inference (ISI) to the semantic theory termed inferentialism, a significant development in contemporary philosophy, which places inference at the heart of human knowing. This theory assists epistemological reflection on challenges in statistics education encountered when…
A Selective Overview of Variable Selection in High Dimensional Feature Space.
Fan, Jianqing; Lv, Jinchi
2010-01-01
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976
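Independence screening, mentioned at the end of the abstract above, can be sketched in a few lines. This is a generic illustration under assumed conditions (a linear model with independent Gaussian predictors, hypothetical names and sizes), not the authors' implementation: predictors are ranked by the absolute value of their marginal correlation with the response and only the top few are kept for a later, more careful selection step.

```python
import numpy as np

def independence_screen(X, y, keep):
    """Indices of the `keep` predictors with the largest |marginal correlation| with y."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize each column
    yc = (y - y.mean()) / y.std()
    corr = Xc.T @ yc / len(y)                    # marginal correlation per predictor
    return np.argsort(-np.abs(corr))[:keep]

rng = np.random.default_rng(1)
n, p = 200, 2000                                 # far more predictors than samples
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[[3, 57, 910]] = [3.0, -3.0, 2.5]            # hypothetical sparse signal
y = X @ beta + rng.normal(size=n)

selected = independence_screen(X, y, keep=20)
```

Screening reduces 2000 candidates to 20 at the cost of one matrix-vector product, after which a penalized likelihood method can be run on the reduced set.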
Campbell's and Rubin's Perspectives on Causal Inference
ERIC Educational Resources Information Center
West, Stephen G.; Thoemmes, Felix
2010-01-01
Donald Campbell's approach to causal inference (D. T. Campbell, 1957; W. R. Shadish, T. D. Cook, & D. T. Campbell, 2002) is widely used in psychology and education, whereas Donald Rubin's causal model (P. W. Holland, 1986; D. B. Rubin, 1974, 2005) is widely used in economics, statistics, medicine, and public health. Campbell's approach focuses on…
Generalized Fiducial Inference for Binary Logistic Item Response Models.
Liu, Yang; Hannig, Jan
2016-06-01
Generalized fiducial inference (GFI) has been proposed as an alternative to likelihood-based and Bayesian inference in mainstream statistics. Confidence intervals (CIs) can be constructed from a fiducial distribution on the parameter space in a fashion similar to those used with a Bayesian posterior distribution. However, no prior distribution needs to be specified, which renders GFI more suitable when no a priori information about model parameters is available. In the current paper, we apply GFI to a family of binary logistic item response theory models, which includes the two-parameter logistic (2PL), bifactor and exploratory item factor models as special cases. Asymptotic properties of the resulting fiducial distribution are discussed. Random draws from the fiducial distribution can be obtained by the proposed Markov chain Monte Carlo sampling algorithm. We investigate the finite-sample performance of our fiducial percentile CI and two commonly used Wald-type CIs associated with maximum likelihood (ML) estimation via Monte Carlo simulation. The use of GFI in high-dimensional exploratory item factor analysis was illustrated by the analysis of a set of the Eysenck Personality Questionnaire data. PMID:26769340
Topics in inference and decision-making with partial knowledge
NASA Technical Reports Server (NTRS)
Safavian, S. Rasoul; Landgrebe, David
1990-01-01
Two essential elements needed in the process of inference and decision-making are prior probabilities and likelihood functions. When both of these components are known accurately and precisely, the Bayesian approach provides a consistent and coherent solution to the problems of inference and decision-making. In many situations, however, either one or both of the above components may not be known, or at least may not be known precisely. This problem of partial knowledge about prior probabilities and likelihood functions is addressed. There are at least two ways to cope with this lack of precise knowledge: robust methods, and interval-valued methods. First, ways of modeling imprecision and indeterminacies in prior probabilities and likelihood functions are examined; then how imprecision in the above components carries over to the posterior probabilities is examined. Finally, the problem of decision making with imprecise posterior probabilities and the consequences of such actions are addressed. Application areas where the above problems may occur are in statistical pattern recognition problems, for example, the problem of classification of high-dimensional multispectral remote sensing image data.
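How imprecision in a prior carries over to the posterior can be shown with a two-class sketch. It assumes the simplest interval-valued case, in which the likelihoods are known exactly and only the prior probability of class 1 is bounded; the function name and the numbers are hypothetical. Because the posterior is monotonically increasing in the prior, its bounds are attained at the endpoints of the prior interval.

```python
def posterior_interval(prior_lo, prior_hi, lik_1, lik_2):
    """Bounds on P(class 1 | x) when P(class 1) is only known to lie in [prior_lo, prior_hi]."""
    def posterior(prior):
        num = prior * lik_1                      # P(class 1) * p(x | class 1)
        return num / (num + (1.0 - prior) * lik_2)
    # The posterior increases with the prior, so the endpoints give the bounds.
    return posterior(prior_lo), posterior(prior_hi)

lo, hi = posterior_interval(0.2, 0.5, lik_1=0.6, lik_2=0.3)
```

Here the posterior for class 1 ranges from 1/3 to 2/3, so the interval straddles 0.5 and a maximum-posterior decision rule cannot settle on a class without further information.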
Developing Young Students' Informal Inference Skills in Data Analysis
ERIC Educational Resources Information Center
Paparistodemou, Efi; Meletiou-Mavrotheris, Maria
2008-01-01
This paper focuses on developing students' informal inference skills, reporting on how a group of third grade students formulated and evaluated data-based inferences using the dynamic statistics data-visualization environment TinkerPlots[TM] (Konold & Miller, 2005), software specifically designed to meet the learning needs of students in the early…
NASA Technical Reports Server (NTRS)
Stone, Peter H.; Yao, Mao-Sung
1990-01-01
A number of perpetual January simulations are carried out with a two-dimensional zonally averaged model employing various parameterizations of the eddy fluxes of heat (potential temperature) and moisture. The parameterizations are evaluated by comparing these results with the eddy fluxes calculated in a parallel simulation using a three-dimensional general circulation model with zonally symmetric forcing. The three-dimensional model's performance in turn is evaluated by comparing its results using realistic (nonsymmetric) boundary conditions with observations. Branscome's parameterization of the meridional eddy flux of heat and Leovy's parameterization of the meridional eddy flux of moisture simulate the seasonal and latitudinal variations of these fluxes reasonably well, while somewhat underestimating their magnitudes. New parameterizations of the vertical eddy fluxes are developed that take into account the enhancement of the eddy mixing slope in a growing baroclinic wave due to condensation, and also the effect of eddy fluctuations in relative humidity. The new parameterizations, when tested in the two-dimensional model, simulate the seasonal, latitudinal, and vertical variations of the vertical eddy fluxes quite well, when compared with the three-dimensional model, and only underestimate the magnitude of the fluxes by 10 to 20 percent.
NASA Astrophysics Data System (ADS)
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Parameter inference is a fundamental problem in data-driven modeling. Given observed data that is believed to be a realization of some parameterized model, the aim is to find parameter values that are able to explain the observed data. In many situations, the dominant sources of uncertainty must be included into the model for making reliable predictions. This naturally leads to stochastic models. Stochastic models render parameter inference much harder, as the aim then is to find a distribution of likely parameter values. In Bayesian statistics, which is a consistent framework for data-driven learning, this so-called posterior distribution can be used to make probabilistic predictions. We propose a novel, exact, and very efficient approach for generating posterior parameter distributions for stochastic differential equation models calibrated to measured time series. The algorithm is inspired by reinterpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, where the measurements are mapped on heavier beads compared to those of the simulated data. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for one-dimensional problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.
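For orientation, the basic Hamiltonian Monte Carlo step that the abstract above builds on can be sketched as follows. This is plain HMC with a unit mass matrix and a single-time-scale leapfrog integrator, applied to an assumed two-dimensional Gaussian target; it does not implement the authors' polymer mapping or their multiple time-scale integration, and all names are hypothetical.

```python
import numpy as np

def hmc_sample(log_prob_grad, x0, n_samples, step=0.2, n_leapfrog=20, seed=0):
    """Plain HMC; log_prob_grad(x) must return (log p(x), gradient of log p at x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    logp, grad = log_prob_grad(x)
    samples = []
    for _ in range(n_samples):
        p = rng.normal(size=x.shape)             # resample momentum
        h_old = -logp + 0.5 * p @ p              # Hamiltonian at the start
        x_new, logp_new, grad_new = x.copy(), logp, grad
        p = p + 0.5 * step * grad_new            # initial half step in momentum
        for i in range(n_leapfrog):
            x_new = x_new + step * p             # full step in position
            logp_new, grad_new = log_prob_grad(x_new)
            if i < n_leapfrog - 1:
                p = p + step * grad_new          # full step in momentum
        p = p + 0.5 * step * grad_new            # final half step in momentum
        h_new = -logp_new + 0.5 * p @ p
        if np.log(rng.uniform()) < h_old - h_new:    # Metropolis accept/reject
            x, logp, grad = x_new, logp_new, grad_new
        samples.append(x.copy())
    return np.array(samples)

mu = np.array([1.0, -2.0])                       # assumed target mean
sd = np.array([1.0, 0.5])                        # assumed target standard deviations

def gaussian(x):
    z = (x - mu) / sd
    return -0.5 * z @ z, -z / sd

draws = hmc_sample(gaussian, x0=np.zeros(2), n_samples=2000)
```

The accept/reject step corrects for the integration error of the leapfrog scheme, so the draws target the stated distribution for any stable step size.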
Gene-network inference by message passing
NASA Astrophysics Data System (ADS)
Braunstein, A.; Pagnani, A.; Weigt, M.; Zecchina, R.
2008-01-01
The inference of gene-regulatory processes from gene-expression data belongs to the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm which is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.
ERIC Educational Resources Information Center
De Champlain, Andre F.; Gessaroli, Marc E.
A study was conducted to compare, with simulated unidimensional and two-dimensional sets, the Type I error probabilities and rejection rates obtained with two versions of the LISREL computer program, the earlier version PRELIS/LISREL 7 and the later version PRELIS2/LISREL8, a version that corrects the asymptotic covariance matrix. Unidimensional…
sick: The Spectroscopic Inference Crank
NASA Astrophysics Data System (ADS)
Casey, Andrew R.
2016-03-01
There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal
Bayesian Analysis of High Dimensional Classification
NASA Astrophysics Data System (ADS)
Mukhopadhyay, Subhadeep; Liang, Faming
2009-12-01
Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. In these cases, there is a lot of interest in searching for sparse models in the high-dimensional regression (or classification) setup. We first discuss two common challenges in analyzing high-dimensional data. The first is the curse of dimensionality: the complexity of many existing algorithms scales exponentially with the dimensionality of the space, so that these algorithms soon become computationally intractable and therefore inapplicable in many real applications. The second is multicollinearity among the predictors, which severely slows down the algorithm. To make Bayesian analysis operational in high dimensions, we propose a novel 'hierarchical stochastic approximation Monte Carlo' (HSAMC) algorithm, which overcomes the curse of dimensionality and the multicollinearity of predictors in high dimensions, and which possesses a self-adjusting mechanism to avoid local minima separated by high energy barriers. Models and methods are illustrated by simulations inspired by the field of genomics. Numerical results indicate that HSAMC can work as a general model selection sampler in high-dimensional complex model spaces.
Hanson, K.M.; Cunningham, G.S.
1996-04-01
The authors are developing a computer application, called the Bayes Inference Engine, to provide the means to make inferences about models of physical reality within a Bayesian framework. The construction of complex nonlinear models is achieved by a fully object-oriented design. The models are represented by a data-flow diagram that may be manipulated by the analyst through a graphical programming environment. Maximum a posteriori solutions are achieved using a general, gradient-based optimization algorithm. The application incorporates a new technique of estimating and visualizing the uncertainties in specific aspects of the model.
Yang, X.; Juhás, P.; Billinge, S. J. L.
2014-07-19
Optimal methods are explored for obtaining one-dimensional powder pattern intensities from two-dimensional planar detectors with good estimates of their standard deviations. Methods are described to estimate uncertainties when the same image is measured in multiple frames as well as from a single frame. The importance of considering the correlation of diffraction points during the integration and the resampling process of data analysis is shown. It is found that correlations between adjacent pixels in the image can lead to seriously overestimated uncertainties if such correlations are neglected in the integration process. Off-diagonal entries in the variance–covariance (VC) matrix are problematic as virtually all data processing and modeling programs cannot handle the full VC matrix. It is shown that the off-diagonal terms come mainly from the pixel-splitting algorithm used as the default integration algorithm in many popular two-dimensional integration programs, as well as from rebinning and resampling steps later in the processing. When the full VC matrix can be propagated during the data reduction, it is possible to get accurate refined parameters and their uncertainties at the cost of increasing computational complexity. However, as this is not normally possible, the best approximate methods for data processing in order to estimate uncertainties on refined parameters with the greatest accuracy from just the diagonal variance terms in the VC matrix are explored.
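The way pixel splitting creates off-diagonal terms can be seen in a three-pixel sketch. It assumes a linear integration step written as a weight matrix A, so that the VC matrix propagates as A C A^T; the variances and weights are hypothetical and are not taken from the paper.

```python
import numpy as np

# Three detector pixels with independent noise (hypothetical variances).
C_pixels = np.diag([4.0, 9.0, 4.0])

# Pixel splitting: the middle pixel is shared equally between two output bins.
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])

# Linear error propagation of the full variance-covariance matrix.
C_bins = A @ C_pixels @ A.T
```

The result has 6.25 on the diagonal and 2.25 off the diagonal: the two bins are correlated because they share one pixel, even though the pixels themselves were independent. A program that stores only the diagonal discards that 2.25 term.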
Inferring the Galactic gravitational potential with Gaia and friends
NASA Astrophysics Data System (ADS)
Sanderson, Robyn Ellyn; Hartke, Johanna; Helmi, Amina; Hogg, David W.
2015-01-01
In the coming decade the Gaia satellite will measure the positions and velocities of an unprecedented number of stars in our Galaxy, with unprecedented precision. Among many firsts, this revolutionary new data set will include full six-dimensional phase space information for millions of stars in the Galactic halo, including stars in many tidal streams. These streams, the products of hierarchical accretion, can be used to infer the Galactic gravitational potential thanks to the common origin of the stars in each one. We present a method for doing so by maximizing the information content (i.e. clumpiness) of the action space of the stream stars. This statistical approach eliminates the need to assign stars to particular streams. Using a toy model of the stellar halo in a known potential, and including updated error models for Gaia, we show that ground-based spectroscopic follow-up of faint halo stars is essential to complete the six-dimensional Gaia catalog and properly constrain the scale radius of the potential. By fitting a spherical NFW potential to streams in a cosmologically simulated halo, we show how oversimplification of the potential model affects fit results. This material is based upon work supported by the National Science Foundation under Award No. AST-1400989.
Inference of Isoforms from Short Sequence Reads
NASA Astrophysics Data System (ADS)
Feng, Jianxing; Li, Wei; Jiang, Tao
Due to alternative splicing events in eukaryotic species, the identification of mRNA isoforms (or splicing variants) is a difficult problem. Traditional experimental methods for this purpose are time consuming and cost ineffective. The emerging RNA-Seq technology provides a possible effective method to address this problem. Although the advantages of RNA-Seq over traditional methods in transcriptome analysis have been confirmed by many studies, the inference of isoforms from millions of short sequence reads (e.g., Illumina/Solexa reads) has remained computationally challenging. In this work, we propose a method to calculate the expression levels of isoforms and infer isoforms from short RNA-Seq reads using exon-intron boundary, transcription start site (TSS) and poly-A site (PAS) information. We first formulate the relationship among exons, isoforms, and single-end reads as a convex quadratic program, and then use an efficient algorithm (called IsoInfer) to search for isoforms. IsoInfer can calculate the expression levels of isoforms accurately if all the isoforms are known and infer novel isoforms from scratch. Our experimental tests on known mouse isoforms with both simulated expression levels and reads demonstrate that IsoInfer is able to calculate the expression levels of isoforms with an accuracy comparable to the state-of-the-art statistical method and a 60 times faster speed. Moreover, our tests on both simulated and real reads show that it achieves a good precision and sensitivity in inferring isoforms when given accurate exon-intron boundary, TSS and PAS information, especially for isoforms whose expression levels are significantly high.
Reliability of the Granger causality inference
NASA Astrophysics Data System (ADS)
Zhou, Douglas; Zhang, Yaoyu; Xiao, Yanyang; Cai, David
2014-04-01
How to characterize information flows in physical, biological, and social systems remains a major theoretical challenge. Granger causality (GC) analysis has been widely used to investigate information flow through causal interactions. We address one of the central questions in GC analysis, that is, the reliability of the GC evaluation and its implications for the causal structures extracted by this analysis. Our work reveals that the manner in which a continuous dynamical process is projected or coarse-grained to a discrete process has a profound impact on the reliability of the GC inference, and different sampling may potentially yield completely opposite inferences. This inference hazard is present for both linear and nonlinear processes. We emphasize that there is a hazard of reaching incorrect conclusions about network topologies, even including statistical (such as small-world or scale-free) properties of the networks, when GC analysis is blindly applied to infer the network topology. We demonstrate this using a small-world network for which a drastic loss of small-world attributes occurs in the reconstructed network using the standard GC approach. We further show how to resolve the paradox that the GC analysis seemingly becomes less reliable when more information is incorporated using finer and finer sampling. Finally, we present strategies to overcome these inference artifacts in order to obtain a reliable GC result.
Deep Learning for Population Genetic Inference
Sheehan, Sara; Song, Yun S.
2016-01-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908
Deep Learning for Population Genetic Inference.
Sheehan, Sara; Song, Yun S
2016-03-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908
Probability, statistics, and computational science.
Beerenwinkel, Niko; Siebourg, Juliane
2012-01-01
In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.
FastGGM: An Efficient Algorithm for the Inference of Gaussian Graphical Model in Biological Networks
Ding, Ying; Fang, Zhou; Sun, Zhe; MacDonald, Matthew L.; Sweet, Robert A.; Wang, Jieru; Chen, Wei
2016-01-01
Biological networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single variables. Gaussian graphical model (GGM), a probability model that characterizes the conditional dependence structure of a set of random variables by a graph, has wide applications in the analysis of biological networks, such as inferring interaction or comparing differential networks. However, existing approaches are either not statistically rigorous or are inefficient for high-dimensional data that include tens of thousands of variables for making inference. In this study, we propose an efficient algorithm to implement the estimation of GGM and obtain p-value and confidence interval for each edge in the graph, based on a recent proposal by Ren et al., 2015. Through simulation studies, we demonstrate that the algorithm is faster by several orders of magnitude than the current implemented algorithm for Ren et al. without losing any accuracy. Then, we apply our algorithm to two real data sets: transcriptomic data from a study of childhood asthma and proteomic data from a study of Alzheimer’s disease. We estimate the global gene or protein interaction networks for the disease and healthy samples. The resulting networks reveal interesting interactions and the differential networks between cases and controls show functional relevance to the diseases. In conclusion, we provide a computationally fast algorithm to implement a statistically sound procedure for constructing Gaussian graphical model and making inference with high-dimensional biological data. The algorithm has been implemented in an R package named “FastGGM”. PMID:26872036
Cosmic statistics of statistics
NASA Astrophysics Data System (ADS)
Szapudi, István; Colombi, Stéphane; Bernardeau, Francis
1999-12-01
The errors on statistics measured in finite galaxy catalogues are exhaustively investigated. The theory of errors on factorial moments by Szapudi & Colombi is applied to cumulants via a series expansion method. All results are subsequently extended to the weakly non-linear regime. Together with previous investigations this yields an analytic theory of the errors for moments and connected moments of counts in cells from highly non-linear to weakly non-linear scales. For non-linear functions of unbiased estimators, such as the cumulants, the phenomenon of cosmic bias is identified and computed. Since it is subdued by the cosmic errors in the range of applicability of the theory, correction for it is inconsequential. In addition, the method of Colombi, Szapudi & Szalay concerning sampling effects is generalized, adapting the theory for inhomogeneous galaxy catalogues. While previous work focused on the variance only, the present article calculates the cross-correlations between moments and connected moments as well for a statistically complete description. The final analytic formulae representing the full theory are explicit but somewhat complicated. Therefore we have made available a fortran program capable of calculating the described quantities numerically (for further details e-mail SC at colombi@iap.fr). An important special case is the evaluation of the errors on the two-point correlation function, for which this should be more accurate than any method put forward previously. This tool will be immensely useful in the future for assessing the precision of measurements from existing catalogues, as well as aiding the design of new galaxy surveys. To illustrate the applicability of the results and to explore the numerical aspects of the theory qualitatively and quantitatively, the errors and cross-correlations are predicted under a wide range of assumptions for the future Sloan Digital Sky Survey. The principal results concerning the cumulants ξ, Q3 and Q4 is that
Experience and inference: how far will science carry us?
Lichtenberg, Joseph
2004-04-01
This paper begins with a view of the remarkable understanding of infant and child development that has evolved from research and observation. The limitations of this contribution from science to the multi-dimensional context-based individuality of each human in his or her intersubjective realm are then considered. For a contemporary view we must recognize the influence of the variability of experiences and the inferences drawn from them. Inferences involve symbolization and culturally derived archetypes as illustrated in a clinical example.
Data free inference with processed data products
Chowdhary, K.; Najm, H. N.
2014-07-12
Here, we consider the context of probabilistic inference of model parameters given error bars or confidence intervals on model output values, when the data is unavailable. We introduce a class of algorithms in a Bayesian framework, relying on maximum entropy arguments and approximate Bayesian computation methods, to generate consistent data with the given summary statistics. Once we obtain consistent data sets, we pool the respective posteriors, to arrive at a single, averaged density on the parameters. This approach allows us to perform accurate forward uncertainty propagation consistent with the reported statistics.
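The idea of generating data sets consistent with reported summary statistics can be sketched with a simple rejection scheme in the spirit of approximate Bayesian computation. This is only a rough illustration under strong assumptions: the Gaussian proposal, the tolerance, and the function name `consistent_datasets` are all hypothetical and are not taken from the paper, which relies on maximum entropy arguments.

```python
import random
import statistics


def consistent_datasets(reported_mean, reported_sd, n_points, n_keep,
                        tol=0.1, seed=0, max_tries=100000):
    """Keep synthetic data sets whose sample mean and sample standard
    deviation both fall within `tol` of the reported values."""
    rng = random.Random(seed)
    kept = []
    for _ in range(max_tries):
        # Assumed proposal: independent Gaussian draws (an illustration only).
        data = [rng.gauss(reported_mean, reported_sd) for _ in range(n_points)]
        mean_ok = abs(statistics.mean(data) - reported_mean) <= tol
        sd_ok = abs(statistics.stdev(data) - reported_sd) <= tol
        if mean_ok and sd_ok:
            kept.append(data)
            if len(kept) == n_keep:
                break
    return kept


datasets = consistent_datasets(2.0, 0.5, n_points=20, n_keep=5)
```

Each accepted data set could then be used to compute its own posterior, and the posteriors pooled, as the abstract describes.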
NASA Astrophysics Data System (ADS)
Graham, D. B.; Cairns, Iver H.; Skjaeraasen, O.; Robinson, P. A.
2012-02-01
The temperature ratio Ti/Te of ions to electrons affects both the ion-damping rate and the ion-acoustic speed in plasmas. The effects of changing the ion-damping rate and ion-acoustic speed are investigated for electrostatic strong turbulence and electromagnetic strong turbulence in three dimensions. When ion damping is strong, density wells relax in place and act as nucleation sites for the formation of new wave packets. In this case, the density perturbations are primarily density wells supported by the ponderomotive force. For weak ion damping, corresponding to low Ti/Te, ion-acoustic waves are launched radially outwards when wave packets dissipate at burnout, thereby increasing the level of density perturbations in the system and thus raising the level of scattering of Langmuir waves off density perturbations. Density wells no longer relax in place, so renucleation at recent collapse sites no longer occurs; instead, wave packets form in background low density regions, such as superpositions of troughs of propagating ion-acoustic waves. This transition is found to occur at Ti/Te ≈ 0.1. The change in behavior with Ti/Te is shown to change the bulk statistical properties, scaling behavior, spectra, and field statistics of strong turbulence. For Ti/Te ≳ 0.1, the electrostatic results approach the predictions of the two-component model of Robinson and Newman, and good agreement is found for Ti/Te ≳ 0.15.
Zhu, H.; Braun, W.
1999-01-01
A statistical analysis of a representative data set of 169 known protein structures was used to analyze the specificity of residue interactions between spatial neighboring strands in beta-sheets. Pairwise potentials were derived from the frequency of residue pairs in nearest contact, second nearest and third nearest contacts across neighboring beta-strands compared to the expected frequency of residue pairs in a random model. A pseudo-energy function based on these statistical pairwise potentials recognized native beta-sheets among possible alternative pairings. The native pairing was found within the three lowest energies in 73% of the cases in the training data set and in 63% of beta-sheets in a test data set of 67 proteins, which were not part of the training set. The energy function was also used to detect tripeptides, which occur frequently in beta-sheets of native proteins. The majority of native partners of tripeptides were distributed in a low energy range. Self-correcting distance geometry (SECODG) calculations using distance constraints sets derived from possible low energy pairing of beta-strands uniquely identified the native pairing of the beta-sheet in pancreatic trypsin inhibitor (BPTI). These results will be useful for predicting the structure of proteins from their amino acid sequence as well as for the design of proteins containing beta-sheets. PMID:10048326
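A pairwise statistical potential of this kind can be sketched as a log-odds score of observed versus expected pair frequencies. The abstract does not give the exact functional form, so the inverse-Boltzmann form E = -ln(f_obs / f_exp), the independence-based random model, and the name `pair_potentials` below are assumptions made for illustration.

```python
import math
from collections import Counter


def pair_potentials(pairs):
    """Log-odds pseudo-energy for unordered residue pairs.

    Expected frequencies assume residues pair at random according to
    their overall frequencies among the observed contacts.
    """
    pair_counts = Counter(tuple(sorted(p)) for p in pairs)
    residue_counts = Counter(r for p in pairs for r in p)
    n_pairs = len(pairs)
    n_res = 2 * n_pairs
    energies = {}
    for (a, b), count in pair_counts.items():
        observed = count / n_pairs
        fa = residue_counts[a] / n_res
        fb = residue_counts[b] / n_res
        # Unordered pairs of distinct residues can arise in two orders.
        expected = fa * fb if a == b else 2 * fa * fb
        energies[(a, b)] = -math.log(observed / expected)
    return energies


# Made-up cross-strand contacts, for illustration only.
contacts = [("V", "I"), ("V", "I"), ("V", "V"), ("K", "E")]
energies = pair_potentials(contacts)
```

Pairs seen more often than the random model predicts receive negative pseudo-energies, and a candidate strand pairing could be scored by summing these values over its contacts.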
Dynamic colloidal assembly pathways via low dimensional models
NASA Astrophysics Data System (ADS)
Yang, Yuguang; Thyagarajan, Raghuram; Ford, David M.; Bevan, Michael A.
2016-05-01
Here we construct a low-dimensional Smoluchowski model for electric field mediated colloidal crystallization using Brownian dynamic simulations, which were previously matched to experiments. Diffusion mapping is used to infer dimensionality and confirm the use of two order parameters, one for degree of condensation and one for global crystallinity. Free energy and diffusivity landscapes are obtained as the coefficients of a low-dimensional Smoluchowski equation to capture the thermodynamics and kinetics of microstructure evolution. The resulting low-dimensional model quantitatively captures the dynamics of different assembly pathways between fluid, polycrystal, and single crystals states, in agreement with the full N-dimensional data as characterized by first passage time distributions. Numerical solution of the low-dimensional Smoluchowski equation reveals statistical properties of the dynamic evolution of states vs. applied field amplitude and system size. The low-dimensional Smoluchowski equation and associated landscapes calculated here can serve as models for predictive control of electric field mediated assembly of colloidal ensembles into two-dimensional crystalline objects.
Estimating uncertainty of inference for validation
Booker, Jane M; Langenbrunner, James R; Hemez, Francois M; Ross, Timothy J
2010-09-30
first in a series of inference uncertainty estimations. While the methods demonstrated are primarily statistical, these do not preclude the use of nonprobabilistic methods for uncertainty characterization. The methods presented permit accurate determinations for validation and eventual prediction. It is a goal that these methods establish a standard against which best practice may evolve for determining degree of validation.
Towards Context Sensitive Information Inference.
ERIC Educational Resources Information Center
Song, D.; Bruza, P. D.
2003-01-01
Discusses information inference from a psychologistic stance and proposes an information inference mechanism that makes inferences via computations of information flow through an approximation of a conceptual space. Highlights include cognitive economics of information processing; context sensitivity; and query models for information retrieval.…
Multimodel inference and adaptive management
Rehme, S.E.; Powell, L.A.; Allen, C.R.
2011-01-01
Ecology is an inherently complex science coping with correlated variables, nonlinear interactions and multiple scales of pattern and process, making it difficult for experiments to result in clear, strong inference. Natural resource managers, policy makers, and stakeholders rely on science to provide timely and accurate management recommendations. However, the time necessary to untangle the complexities of interactions within ecosystems is often far greater than the time available to make management decisions. One method of coping with this problem is multimodel inference. Multimodel inference assesses uncertainty by calculating likelihoods among multiple competing hypotheses, but multimodel inference results are often equivocal. Despite this, there may be pressure for ecologists to provide management recommendations regardless of the strength of their study’s inference. We reviewed papers in the Journal of Wildlife Management (JWM) and the journal Conservation Biology (CB) to quantify the prevalence of multimodel inference approaches, the resulting inference (weak versus strong), and how authors dealt with the uncertainty. Thirty-eight percent and 14%, respectively, of articles in the JWM and CB used multimodel inference approaches. Strong inference was rarely observed, with only 7% of JWM and 20% of CB articles resulting in strong inference. We found the majority of weak inference papers in both journals (59%) gave specific management recommendations. Model selection uncertainty was ignored in most recommendations for management. We suggest that adaptive management is an ideal method to resolve uncertainty when research results in weak inference.
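One common way to express model selection uncertainty in multimodel inference is through Akaike weights. The sketch below assumes AIC is the selection criterion, which the abstract does not state; the AIC values are made up.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into normalized relative model weights."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best.
    rel = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical AIC scores for three competing models.
weights = akaike_weights([100.0, 101.0, 110.0])
```

With these made-up scores the best model carries roughly 0.62 of the total weight, which illustrates the kind of equivocal result the authors classify as weak inference.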
The NIFTY way of Bayesian signal inference
Selig, Marco
2014-12-05
We introduce NIFTY, 'Numerical Information Field Theory', a software package for the development of Bayesian signal inference algorithms that operate independently from any underlying spatial grid and its resolution. A large number of Bayesian and Maximum Entropy methods for 1D signal reconstruction, 2D imaging, as well as 3D tomography, appear formally similar, but one often finds individualized implementations that are neither flexible nor easily transferable. Signal inference in the framework of NIFTY can be done in an abstract way, such that algorithms, prototyped in 1D, can be applied to real world problems in higher-dimensional settings. NIFTY as a versatile library is applicable and already has been applied in 1D, 2D, 3D and spherical settings. A recent application is the D³PO algorithm targeting the non-trivial task of denoising, deconvolving, and decomposing photon observations in high energy astronomy.
Quantum-Like Representation of Non-Bayesian Inference
NASA Astrophysics Data System (ADS)
Asano, M.; Basieva, I.; Khrennikov, A.; Ohya, M.; Tanaka, Y.
2013-01-01
This research is related to the problem of "irrational decision making or inference" that has been discussed in cognitive psychology. There are some experimental studies, and these statistical data cannot be described by classical probability theory. The process of decision making generating these data cannot be reduced to the classical Bayesian inference. For this problem, a number of quantum-like cognitive models of decision making have been proposed. Our previous work represented in a natural way the classical Bayesian inference in the framework of quantum mechanics. By using this representation, in this paper, we try to discuss the non-Bayesian (irrational) inference that is biased by effects like the quantum interference. Further, we describe "psychological factor" disturbing "rationality" as an "environment" correlating with the "main system" of usual Bayesian inference.
Statistical Inference-Based Cache Management for Mobile Learning
ERIC Educational Resources Information Center
Li, Qing; Zhao, Jianmin; Zhu, Xinzhong
2009-01-01
Supporting efficient data access in the mobile learning environment is becoming a hot research problem in recent years, and the problem becomes tougher when the clients are using light-weight mobile devices such as cell phones whose limited storage space prevents the clients from holding a large cache. A practical solution is to store the cache…
Statistical Inference and Spatial Patterns in Correlates of IQ
ERIC Educational Resources Information Center
Hassall, Christopher; Sherratt, Thomas N.
2011-01-01
Cross-national comparisons of IQ have become common since the release of a large dataset of international IQ scores. However, these studies have consistently failed to consider the potential lack of independence of these scores based on spatial proximity. To demonstrate the importance of this omission, we present a re-evaluation of several…
Testing Manifest Monotonicity Using Order-Constrained Statistical Inference
ERIC Educational Resources Information Center
Tijmstra, Jesper; Hessen, David J.; van der Heijden, Peter G. M.; Sijtsma, Klaas
2013-01-01
Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores,…
Drawing Statistical Inferences from Historical Census Data, 1850–1950
DAVERN, MICHAEL; RUGGLES, STEVEN; SWENSON, TAMI; ALEXANDER, J. TRENT; OAKES, J. MICHAEL
2009-01-01
Virtually all quantitative microdata used by social scientists derive from samples that incorporate clustering, stratification, and weighting adjustments (Kish 1965, 1992). Such data can yield standard error estimates that differ dramatically from those derived from a simple random sample of the same size. Researchers using historical U.S. census microdata, however, usually apply methods designed for simple random samples. The resulting p values and confidence intervals could be inaccurate and could lead to erroneous research conclusions. Because U.S. census microdata samples are among the most widely used sources for social science and policy research, the need for reliable standard error estimation is critical. We evaluate the historical microdata samples of the Integrated Public Use Microdata Series (IPUMS) project from 1850 to 1950 in order to determine (1) the impact of sample design on standard error estimates, and (2) how to apply modern standard error estimation software to historical census samples. We exploit a unique new data source from the 1880 census to validate our methods for standard error estimation, and then we apply this approach to the 1850–1870 and 1900–1950 decennial censuses. We conclude that Taylor series estimation can be used effectively with the historical decennial census microdata samples and should be applied in research analyses that have the potential for substantial clustering effects. PMID:19771946
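The effect of clustering on standard errors can be illustrated with Kish's design effect approximation, deff = 1 + (m - 1)ρ, where m is the average cluster size and ρ the intraclass correlation. This is a simpler device than the Taylor series estimation the authors evaluate, and the function names and inputs below are hypothetical.

```python
import math


def design_effect(avg_cluster_size, intraclass_corr):
    """Kish's approximate variance inflation factor for a clustered sample."""
    return 1.0 + (avg_cluster_size - 1.0) * intraclass_corr


def adjusted_standard_error(srs_se, avg_cluster_size, intraclass_corr):
    """Inflate a simple-random-sample standard error by sqrt(deff)."""
    return srs_se * math.sqrt(design_effect(avg_cluster_size, intraclass_corr))


# Hypothetical inputs: households of 5 persons, intraclass correlation 0.25.
deff = design_effect(5, 0.25)
se = adjusted_standard_error(1.0, 5, 0.25)
```

With these made-up inputs the variance doubles relative to a simple random sample of the same size, so a standard error computed under the simple-random-sample assumption would be too small by a factor of about 1.41.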
NASA Technical Reports Server (NTRS)
Wheeler, Kevin; Timucin, Dogan; Rabbette, Maura; Curry, Charles; Allan, Mark; Lvov, Nikolay; Clanton, Sam; Pilewskie, Peter
2002-01-01
The goal of visual inference programming is to develop a software framework for data analysis and to provide machine learning algorithms for interactive data exploration and visualization. The topics include: 1) Intelligent Data Understanding (IDU) framework; 2) Challenge problems; 3) What's new here; 4) Framework features; 5) Wiring diagram; 6) Generated script; 7) Results of script; 8) Initial algorithms; 9) Independent Component Analysis for instrument diagnosis; 10) Output sensory mapping virtual joystick; 11) Output sensory mapping typing; 12) Closed-loop feedback mu-rhythm control; 13) Closed-loop training; 14) Data sources; and 15) Algorithms. This paper is in viewgraph form.
Evolutionary inference via the Poisson Indel Process.
Bouchard-Côté, Alexandre; Jordan, Michael I
2013-01-22
We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.
Permutation inference for the general linear model
Winkler, Anderson M.; Ridgway, Gerard R.; Webster, Matthew A.; Smith, Stephen M.; Nichols, Thomas E.
2014-01-01
Permutation methods can provide exact control of false positives and allow the use of non-standard statistics, making only weak assumptions about the data. With the availability of fast and inexpensive computing, their main limitation would be some lack of flexibility to work with arbitrary experimental designs. In this paper we report on results on approximate permutation methods that are more flexible with respect to the experimental design and nuisance variables, and conduct detailed simulations to identify the best method for settings that are typical for imaging research scenarios. We present a generic framework for permutation inference for complex general linear models (glms) when the errors are exchangeable and/or have a symmetric distribution, and show that, even in the presence of nuisance effects, these permutation inferences are powerful while providing excellent control of false positives in a wide range of common and relevant imaging research scenarios. We also demonstrate how the inference on glm parameters, originally intended for independent data, can be used in certain special but useful cases in which independence is violated. Detailed examples of common neuroimaging applications are provided, as well as a complete algorithm – the “randomise” algorithm – for permutation inference with the glm. PMID:24530839
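The basic mechanism of permutation inference can be sketched for the simplest special case of the general linear model, a two-group comparison of means with exchangeable errors. This is not the "randomise" algorithm described in the paper; the function name, the Monte Carlo permutation count, and the data are assumptions for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(pooled)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel observations under the null hypothesis
        if mean_diff(pooled) >= observed - 1e-12:
            exceed += 1
    # Count the observed labelling as one of the permutations.
    return (exceed + 1) / (n_perm + 1)


# Made-up measurements for two groups.
p = permutation_p_value([5.1, 4.9, 5.3, 5.0, 5.2], [3.0, 3.2, 2.9, 3.1, 3.3])
```

Handling nuisance variables and non-exchangeable designs, which is the focus of the paper, requires permuting residuals or restricting the permutation set rather than shuffling raw observations as done here.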
Evolutionary inferences from the analysis of exchangeability
Hendry, Andrew P.; Kaeuffer, Renaud; Crispo, Erika; Peichel, Catherine L.; Bolnick, Daniel I.
2013-01-01
Evolutionary inferences are usually based on statistical models that compare mean genotypes and phenotypes (or their frequencies) among populations. An alternative is to use the actual distribution of genotypes and phenotypes to infer the “exchangeability” of individuals among populations. We illustrate this approach by using discriminant functions on principal components to classify individuals among paired lake and stream populations of threespine stickleback in each of six independent watersheds. Classification based on neutral and non-neutral microsatellite markers was highest to the population of origin and next-highest to populations in the same watershed. These patterns are consistent with the influence of historical contingency (separate colonization of each watershed) and subsequent gene flow (within but not between watersheds). In comparison to this low genetic exchangeability, ecological (diet) and morphological (trophic and armor traits) exchangeability was relatively high – particularly among populations from similar habitats. These patterns reflect the role of natural selection in driving parallel adaptive changes when independent populations colonize similar habitats. Importantly, however, substantial non-parallelism was also evident. Our results show that analyses based on exchangeability can confirm inferences based on statistical analyses of means or frequencies, while also refining insights into the drivers of – and constraints on – evolutionary diversification. PMID:24299398
Inferred Lunar Boulder Distributions at Decimeter Scales
NASA Technical Reports Server (NTRS)
Baloga, S. M.; Glaze, L. S.; Spudis, P. D.
2012-01-01
Block size distributions of impact deposits on the Moon are diagnostic of the impact process and environmental effects, such as target lithology and weathering. Block size distributions are also important factors in trafficability, habitability, and possibly the identification of indigenous resources. Lunar block sizes have been investigated for many years for many purposes [e.g., 1-3]. An unresolved issue is the extent to which lunar block size distributions can be extrapolated to scales smaller than limits of resolution of direct measurement. This would seem to be a straightforward statistical application, but it is complicated by two issues. First, the cumulative size frequency distribution of observable boulders rolls over due to resolution limitations at the small end. Second, statistical regression provides the best fit only around the centroid of the data [4]. Confidence and prediction limits splay away from the best fit at the endpoints resulting in inferences in the boulder density at the CPR scale that can differ by many orders of magnitude [4]. These issues were originally investigated by Cintala and McBride [2] using Surveyor data. The objective of this study was to determine whether the measured block size distributions from Lunar Reconnaissance Orbiter Camera - Narrow Angle Camera (LROC-NAC) images (m-scale resolution) can be used to infer the block size distribution at length scales comparable to Mini-RF Circular Polarization Ratio (CPR) scales, nominally taken as 10 cm. This would set the stage for assessing correlations of inferred block size distributions with CPR returns [6].
Circular inferences in schizophrenia.
Jardri, Renaud; Denève, Sophie
2013-11-01
A considerable number of recent experimental and computational studies suggest that subtle impairments of excitatory to inhibitory balance or regulation are involved in many neurological and psychiatric conditions. The current paper aims to relate, specifically and quantitatively, excitatory to inhibitory imbalance with psychotic symptoms in schizophrenia. Considering that the brain constructs hierarchical causal models of the external world, we show that the failure to maintain the excitatory to inhibitory balance results in hallucinations as well as in the formation and subsequent consolidation of delusional beliefs. Indeed, the consequence of excitatory to inhibitory imbalance in a hierarchical neural network is equated to a pathological form of causal inference called 'circular belief propagation'. In circular belief propagation, bottom-up sensory information and top-down predictions are reverberated, i.e. prior beliefs are misinterpreted as sensory observations and vice versa. As a result, these predictions are counted multiple times. Circular inference explains the emergence of erroneous percepts, the patient's overconfidence when facing probabilistic choices, the learning of 'unshakable' causal relationships between unrelated events and a paradoxical immunity to perceptual illusions, which are all known to be associated with schizophrenia. PMID:24065721
Inferring horizontal gene transfer.
Ravenhall, Matt; Škunca, Nives; Lassalle, Florent; Dessimoz, Christophe
2015-05-01
Horizontal or Lateral Gene Transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate the investigations of evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages. Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average, whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events. PMID:26020646
Moment inference from tomograms
Day-Lewis, F. D.; Chen, Y.; Singha, K.
2007-01-01
Time-lapse geophysical tomography can provide valuable qualitative insights into hydrologic transport phenomena associated with aquifer dynamics, tracer experiments, and engineered remediation. Increasingly, tomograms are used to infer the spatial and/or temporal moments of solute plumes; these moments provide quantitative information about transport processes (e.g., advection, dispersion, and rate-limited mass transfer) and controlling parameters (e.g., permeability, dispersivity, and rate coefficients). The reliability of moments calculated from tomograms is, however, poorly understood because classic approaches to image appraisal (e.g., the model resolution matrix) are not directly applicable to moment inference. Here, we present a semi-analytical approach to construct a moment resolution matrix based on (1) the classic model resolution matrix and (2) image reconstruction from orthogonal moments. Numerical results for radar and electrical-resistivity imaging of solute plumes demonstrate that moment values calculated from tomograms depend strongly on plume location within the tomogram, survey geometry, regularization criteria, and measurement error. Copyright 2007 by the American Geophysical Union.
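For readers unfamiliar with plume moments, the following sketch computes the zeroth moment (total mass), the first moment (centre of mass), and the second central moment (spread) of a concentration profile. It assumes a one-dimensional profile sampled on a uniform grid and is only meant to show what "moments" refers to; it does not implement the moment resolution matrix of the abstract, and the function name is hypothetical.

```python
def spatial_moments(xs, conc):
    """Return (mass, centre of mass, variance) of a 1-D concentration profile.

    xs   : grid coordinates, assumed uniformly spaced
    conc : concentration at each grid coordinate
    """
    dx = xs[1] - xs[0]  # uniform grid spacing (assumption)
    mass = sum(conc) * dx  # zeroth moment
    centre = sum(x * c for x, c in zip(xs, conc)) * dx / mass  # first moment
    variance = sum((x - centre) ** 2 * c for x, c in zip(xs, conc)) * dx / mass
    return mass, centre, variance
```

Applied to a tomogram instead of the true concentration field, the same sums inherit the blurring and regularization of the image, which is the reliability question the abstract raises.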
Inferring Horizontal Gene Transfer
Lassalle, Florent; Dessimoz, Christophe
2015-01-01
Horizontal or Lateral Gene Transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate the investigations of evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages [1]. Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average, whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events. PMID:26020646
Fast inference of ill-posed problems within a convex space
NASA Astrophysics Data System (ADS)
Fernandez-de-Cossio-Diaz, J.; Mulet, R.
2016-07-01
In multiple scientific and technological applications we face the problem of having low dimensional data to be justified by a linear model defined in a high dimensional parameter space. The difference in dimensionality makes the problem ill-defined: the model is consistent with the data for many values of its parameters. The objective is to find the probability distribution of parameter values consistent with the data, a problem that can be cast as the exploration of a high dimensional convex polytope. In this work we introduce a novel algorithm to solve this problem efficiently. It provides results that are statistically indistinguishable from currently used numerical techniques while its running time scales linearly with the system size. We show that the algorithm performs robustly in many abstract and practical applications. As working examples we simulate the effects of restricting reaction fluxes on the space of feasible phenotypes of a genome scale Escherichia coli metabolic network and infer the traffic flow between origin and destination nodes in a real communication network.
Another Look At The Canon of Plausible Inference
NASA Astrophysics Data System (ADS)
Solana-Ortega, Alberto; Solana, Vicente
2005-11-01
Systematic study of plausible inference is very recent. Axiomatics have been traditionally limited to the development of uninterpreted pure calculi for comparing individual inferences, ignoring the need of formalisms to solve each of these inferences and leaving the interpretation and application of such calculi to ad hoc statistical criteria which are open to inconsistencies. Here we defend a different viewpoint, regarding plausible inference in a holistic manner. Specifically we consider that all tasks involved in it, including the formalization of languages in which to pose problems, the definitions and axiomatics leading to calculation rules and those for deriving inference procedures or assignment rules, ought to be based on common grounds. For this purpose a set of elementary requirements establishing desirable properties so fundamental any theory of scientific inference should satisfy is proposed under the name of plausible inference canon. Its logical status as an extramathematical foundation is investigated, together with the different roles it plays as constructive guideline, standard for contrasting frameworks or normative stipulation. We also highlight the novelties it introduces with respect to similar proposals by other authors. In particular we concentrate on those aspects of the canon related to the critical issue of adequately incorporating basic evidential knowledge to inference.
Causal Inference in Public Health
Glass, Thomas A.; Goodman, Steven N.; Hernán, Miguel A.; Samet, Jonathan M.
2014-01-01
Causal inference has a central role in public health; the determination that an association is causal indicates the possibility for intervention. We review and comment on the long-used guidelines for interpreting evidence as supporting a causal association and contrast them with the potential outcomes framework that encourages thinking in terms of causes that are interventions. We argue that in public health this framework is more suitable, providing an estimate of an action’s consequences rather than the less precise notion of a risk factor’s causal effect. A variety of modern statistical methods adopt this approach. When an intervention cannot be specified, causal relations can still exist, but how to intervene to change the outcome will be unclear. In application, the often-complex structure of causal processes needs to be acknowledged and appropriate data collected to study them. These newer approaches need to be brought to bear on the increasingly complex public health challenges of our globalized world. PMID:23297653
Developing Young Children's Emergent Inferential Practices in Statistics
ERIC Educational Resources Information Center
Makar, Katie
2016-01-01
Informal statistical inference has now been researched at all levels of schooling and initial tertiary study. Work in informal statistical inference is least understood in the early years, where children have had little if any exposure to data handling. A qualitative study in Australia was carried out through a series of teaching experiments with…
Bayesian inference in geomagnetism
NASA Technical Reports Server (NTRS)
Backus, George E.
1988-01-01
The inverse problem in empirical geomagnetic modeling is investigated, with critical examination of recently published studies. Particular attention is given to the use of Bayesian inference (BI) to select the damping parameter lambda in the uniqueness portion of the inverse problem. The mathematical bases of BI and stochastic inversion are explored, with consideration of bound-softening problems and resolution in linear Gaussian BI. The problem of estimating the radial magnetic field B(r) at the earth core-mantle boundary from surface and satellite measurements is then analyzed in detail, with specific attention to the selection of lambda in the studies of Gubbins (1983) and Gubbins and Bloxham (1985). It is argued that the selection method is inappropriate and leads to lambda values much larger than those that would result if a reasonable bound on the heat flow at the CMB were assumed.
Statistical modeling of software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1992-01-01
This working paper discusses the statistical simulation part of a controlled software development experiment being conducted under the direction of the System Validation Methods Branch, Information Systems Division, NASA Langley Research Center. The experiment uses guidance and control software (GCS) aboard a fictitious planetary landing spacecraft: real-time control software operating on a transient mission. Software execution is simulated to study the statistical aspects of reliability and other failure characteristics of the software during development, testing, and random usage. Quantification of software reliability is a major goal. Various reliability concepts are discussed. Experiments are described for performing simulations and collecting appropriate simulated software performance and failure data. This data is then used to make statistical inferences about the quality of the software development and verification processes as well as inferences about the reliability of software versions and reliability growth under random testing and debugging.
Bayes factors and multimodel inference
Link, W.A.; Barker, R.J.; Thomson, David L.; Cooch, Evan G.; Conroy, Michael J.
2009-01-01
Multimodel inference has two main themes: model selection, and model averaging. Model averaging is a means of making inference conditional on a model set, rather than on a selected model, allowing formal recognition of the uncertainty associated with model choice. The Bayesian paradigm provides a natural framework for model averaging, and provides a context for evaluation of the commonly used AIC weights. We review Bayesian multimodel inference, noting the importance of Bayes factors. Noting the sensitivity of Bayes factors to the choice of priors on parameters, we define and propose nonpreferential priors as offering a reasonable standard for objective multimodel inference.
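The AIC weights mentioned above follow a standard formula: each model's weight is proportional to exp(-Δ/2), where Δ is its AIC minus the smallest AIC in the model set. The sketch below computes these weights and a weighted average of per-model estimates; the function names are illustrative, and the nonpreferential priors proposed in the abstract are not implemented here.

```python
import math


def akaike_weights(aic_values):
    """Normalized AIC weights: w_i proportional to exp(-0.5 * (AIC_i - AIC_min))."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def model_average(estimates, weights):
    """Weighted average of per-model estimates of the same quantity."""
    return sum(w * e for w, e in zip(weights, estimates))
```

Subtracting the minimum AIC before exponentiating avoids underflow when AIC values are large; it does not change the normalized weights.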
Prado, R A; Santos, C R; Kato, D I; Murakami, M T; Viviani, V R
2016-05-11
Beetle luciferases, the enzymes responsible for bioluminescence, are special cases of CoA-ligases which have acquired a novel oxygenase activity, offering elegant models to investigate the structural origin of novel catalytic functions in enzymes. What the original function of their ancestors was, and how the new oxygenase function emerged leading to bioluminescence remains unclear. To address these questions, we solved the crystal structure of a recently cloned Malpighian luciferase-like enzyme of unknown function from Zophobas morio mealworms, which displays weak luminescence with ATP and the xenobiotic firefly d-luciferin. The three dimensional structure of the N-terminal domain showed the expected general fold of CoA-ligases, with a unique carboxylic substrate binding pocket, permitting the binding and CoA-thioesterification activity with a broad range of carboxylic substrates, including short-, medium-chain and aromatic acids, indicating a generalist function consistent with a xenobiotic-ligase. The thioesterification activity with l-luciferin, but not with the d-enantiomer, confirms that the oxygenase activity emerged from a stereoselective impediment of the thioesterification reaction with the latter, favoring the alternative chemiluminescence oxidative reaction. The structure and site-directed mutagenesis support the involvement of the main-chain amide carbonyl of the invariant glycine G323 as the catalytic base for luciferin C4 proton abstraction during the oxygenase activity in this enzyme and in beetle luciferases (G343). PMID:27101527
Inferring sparse networks for noisy transient processes.
Tran, Hoang M; Bukkapatnam, Satish T S
2016-01-01
Inferring causal structures of real world complex networks from measured time series signals remains an open issue. The current approaches are inadequate to discern between direct versus indirect influences (i.e., the presence or absence of a directed arc connecting two nodes) in the presence of noise, sparse interactions, as well as nonlinear and transient dynamics of real world processes. We report a sparse regression (referred to as the l1-min) approach with theoretical bounds on the constraints on the allowable perturbation to recover the network structure that guarantees sparsity and robustness to noise. We also introduce averaging and perturbation procedures to further enhance prediction scores (i.e., reduce inference errors), and the numerical stability of l1-min approach. Extensive investigations have been conducted with multiple benchmark simulated genetic regulatory network and Michaelis-Menten dynamics, as well as real world data sets from DREAM5 challenge. These investigations suggest that our approach can significantly improve, oftentimes by 5 orders of magnitude over the methods reported previously for inferring the structure of dynamic networks, such as Bayesian network, network deconvolution, silencing and modular response analysis methods based on optimizing for sparsity, transients, noise and high dimensionality issues. PMID:26916813
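To make the notion of sparse regression concrete, here is a minimal sketch of l1-penalized least squares (the lasso) solved by cyclic coordinate descent. This is a generic textbook formulation and is not claimed to match the authors' l1-min approach or their averaging and perturbation procedures; the function names, the objective scaling, and the iteration count are assumptions.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent.

    X is a list of rows, y a list of responses, lam the l1 penalty weight.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw for the starting point w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w
```

In a network-inference setting, each node's signal would be regressed on the others in this way, with nonzero coefficients read as candidate direct influences; that mapping is an interpretation for illustration, not a detail taken from the abstract.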
Inferring sparse networks for noisy transient processes
NASA Astrophysics Data System (ADS)
Tran, Hoang M.; Bukkapatnam, Satish T. S.
2016-02-01
Inferring causal structures of real world complex networks from measured time series signals remains an open issue. The current approaches are inadequate to discern between direct versus indirect influences (i.e., the presence or absence of a directed arc connecting two nodes) in the presence of noise, sparse interactions, as well as nonlinear and transient dynamics of real world processes. We report a sparse regression (referred to as the l1-min) approach with theoretical bounds on the constraints on the allowable perturbation to recover the network structure that guarantees sparsity and robustness to noise. We also introduce averaging and perturbation procedures to further enhance prediction scores (i.e., reduce inference errors), and the numerical stability of l1-min approach. Extensive investigations have been conducted with multiple benchmark simulated genetic regulatory network and Michaelis-Menten dynamics, as well as real world data sets from DREAM5 challenge. These investigations suggest that our approach can significantly improve, oftentimes by 5 orders of magnitude over the methods reported previously for inferring the structure of dynamic networks, such as Bayesian network, network deconvolution, silencing and modular response analysis methods based on optimizing for sparsity, transients, noise and high dimensionality issues.
Smith, Alwyn
1969-01-01
This paper is based on an analysis of questionnaires sent to the health ministries of Member States of WHO asking for information about the extent, nature, and scope of morbidity statistical information. It is clear that most countries collect some statistics of morbidity and many countries collect extensive data. However, few countries relate their collection to the needs of health administrators for information, and many countries collect statistics principally for publication in annual volumes which may appear anything up to 3 years after the year to which they refer. The desiderata of morbidity statistics may be summarized as reliability, representativeness, and relevance to current health problems. PMID:5306722
Learning to Observe "and" Infer
ERIC Educational Resources Information Center
Hanuscin, Deborah L.; Park Rogers, Meredith A.
2008-01-01
Researchers describe the need for students to have multiple opportunities and social interaction to learn about the differences between observation and inference and their role in developing scientific explanations (Harlen 2001; Simpson 2000). Helping children develop their skills of observation and inference in science while emphasizing the…
Feature Inference Learning and Eyetracking
ERIC Educational Resources Information Center
Rehder, Bob; Colner, Robert M.; Hoffman, Aaron B.
2009-01-01
Besides traditional supervised classification learning, people can learn categories by inferring the missing features of category members. It has been proposed that feature inference learning promotes learning a category's internal structure (e.g., its typical features and interfeature correlations) whereas classification promotes the learning of…
Improving Inferences from Multiple Methods.
ERIC Educational Resources Information Center
Shotland, R. Lance; Mark, Melvin M.
1987-01-01
Multiple evaluation methods (MEMs) can cause an inferential challenge, although there are strategies to strengthen inferences. Practical and theoretical issues involved in the use by social scientists of MEMs, three potential problems in drawing inferences from MEMs, and short- and long-term strategies for alleviating these problems are outlined.…
Causal Inference in Retrospective Studies.
ERIC Educational Resources Information Center
Holland, Paul W.; Rubin, Donald B.
1988-01-01
The problem of drawing causal inferences from retrospective case-controlled studies is considered. A model for causal inference in prospective studies is applied to retrospective studies. Limitations of case-controlled studies are formulated concerning relevant parameters that can be estimated in such studies. A coffee-drinking/myocardial…
Causal Inference and Developmental Psychology
ERIC Educational Resources Information Center
Foster, E. Michael
2010-01-01
Causal inference is of central importance to developmental psychology. Many key questions in the field revolve around improving the lives of children and their families. These include identifying risk factors that if manipulated in some way would foster child development. Such a task inherently involves causal inference: One wants to know whether…
ERIC Educational Resources Information Center
Petocz, Peter; Sowey, Eric
2012-01-01
The term "data snooping" refers to the practice of choosing which statistical analyses to apply to a set of data after having first looked at those data. Data snooping contradicts a fundamental precept of applied statistics, that the scheme of analysis is to be planned in advance. In this column, the authors shall elucidate the statistical…
Causal inference in biology networks with integrated belief propagation.
Chang, Rui; Karr, Jonathan R; Schadt, Eric E
2015-01-01
Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the form of the functional interactions that are possible are assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in real metabolic network with v-structure and feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop only from steady-state data which conventional method cannot. PMID:25592596
Human brain lesion-deficit inference remapped
Mah, Yee-Haur; Husain, Masud; Rees, Geraint
2014-01-01
Our knowledge of the anatomical organization of the human brain in health and disease draws heavily on the study of patients with focal brain lesions. Historically the first method of mapping brain function, it is still potentially the most powerful, establishing the necessity of any putative neural substrate for a given function or deficit. Great inferential power, however, carries a crucial vulnerability: without stronger alternatives any consistent error cannot be easily detected. A hitherto unexamined source of such error is the structure of the high-dimensional distribution of patterns of focal damage, especially in ischaemic injury—the commonest aetiology in lesion-deficit studies—where the anatomy is naturally shaped by the architecture of the vascular tree. This distribution is so complex that analysis of lesion data sets of conventional size cannot illuminate its structure, leaving us in the dark about the presence or absence of such error. To examine this crucial question we assembled the largest known set of focal brain lesions (n = 581), derived from unselected patients with acute ischaemic injury (mean age = 62.3 years, standard deviation = 17.8, male:female ratio = 0.547), visualized with diffusion-weighted magnetic resonance imaging, and processed with validated automated lesion segmentation routines. High-dimensional analysis of this data revealed a hidden bias within the multivariate patterns of damage that will consistently distort lesion-deficit maps, displacing inferred critical regions from their true locations, in a manner opaque to replication. Quantifying the size of this mislocalization demonstrates that past lesion-deficit relationships estimated with conventional inferential methodology are likely to be significantly displaced, by a magnitude dependent on the unknown underlying lesion-deficit relationship itself. Past studies therefore cannot be retrospectively corrected, except by new knowledge that would render them redundant
Human brain lesion-deficit inference remapped.
Mah, Yee-Haur; Husain, Masud; Rees, Geraint; Nachev, Parashkev
2014-09-01
Our knowledge of the anatomical organization of the human brain in health and disease draws heavily on the study of patients with focal brain lesions. Historically the first method of mapping brain function, it is still potentially the most powerful, establishing the necessity of any putative neural substrate for a given function or deficit. Great inferential power, however, carries a crucial vulnerability: without stronger alternatives any consistent error cannot be easily detected. A hitherto unexamined source of such error is the structure of the high-dimensional distribution of patterns of focal damage, especially in ischaemic injury-the commonest aetiology in lesion-deficit studies-where the anatomy is naturally shaped by the architecture of the vascular tree. This distribution is so complex that analysis of lesion data sets of conventional size cannot illuminate its structure, leaving us in the dark about the presence or absence of such error. To examine this crucial question we assembled the largest known set of focal brain lesions (n = 581), derived from unselected patients with acute ischaemic injury (mean age = 62.3 years, standard deviation = 17.8, male:female ratio = 0.547), visualized with diffusion-weighted magnetic resonance imaging, and processed with validated automated lesion segmentation routines. High-dimensional analysis of this data revealed a hidden bias within the multivariate patterns of damage that will consistently distort lesion-deficit maps, displacing inferred critical regions from their true locations, in a manner opaque to replication. Quantifying the size of this mislocalization demonstrates that past lesion-deficit relationships estimated with conventional inferential methodology are likely to be significantly displaced, by a magnitude dependent on the unknown underlying lesion-deficit relationship itself. Past studies therefore cannot be retrospectively corrected, except by new knowledge that would render them redundant
Active Inference for Binary Symmetric Hidden Markov Models
NASA Astrophysics Data System (ADS)
Allahverdyan, Armen E.; Galstyan, Aram
2015-10-01
We consider the active maximum a posteriori (MAP) inference problem for hidden Markov models (HMMs), where, given an initial MAP estimate of the hidden sequence, we choose certain states in the sequence to label in order to improve the estimation accuracy of the remaining states. We focus on the binary symmetric HMM, and employ its known mapping to the 1d Ising model in random fields. From the statistical physics viewpoint, the active MAP inference problem reduces to analyzing the ground state of the 1d Ising model under modified external fields. We develop an analytical approach and obtain a closed-form solution that relates the expected error reduction to model parameters under the specified active inference scheme. We then use this solution to determine the optimal active inference scheme in terms of error reduction, and examine the relation of those schemes to heuristic principles of uncertainty reduction and solution unicity.
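As a point of reference for the abstract above, the initial MAP estimate of a binary symmetric HMM can be computed with the standard Viterbi algorithm. The sketch below is a generic illustration, not the authors' analytical Ising-model solution; the parameter names `q` (probability that the hidden state flips between steps) and `eps` (probability that an observation differs from its hidden state) are assumptions introduced here, and both must lie strictly between 0 and 1.

```python
import math


def map_sequence(obs, q, eps):
    """MAP hidden sequence of a binary symmetric HMM (Viterbi algorithm).

    obs: non-empty list of observed symbols, each 0 or 1
    q:   hidden-state flip probability (hypothetical name), 0 < q < 1
    eps: observation error probability (hypothetical name), 0 < eps < 1
    """
    log_stay, log_flip = math.log(1 - q), math.log(q)
    log_ok, log_err = math.log(1 - eps), math.log(eps)

    def emit(state, symbol):
        # Log-probability of seeing `symbol` when the hidden state is `state`.
        return log_ok if state == symbol else log_err

    # Uniform prior over the first hidden state.
    score = [math.log(0.5) + emit(s, obs[0]) for s in (0, 1)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for symbol in obs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [
                score[p] + (log_stay if p == s else log_flip) for p in (0, 1)
            ]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_score.append(candidates[best_prev] + emit(s, symbol))
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With a small flip probability and noisy observations, an isolated observed symbol tends to be smoothed away; with nearly noise-free observations, the MAP path follows the data. Labeling a state, as in the active scheme, would correspond to fixing that state before re-running the estimate.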
NASA Astrophysics Data System (ADS)
Raiber, Matthias; White, Paul A.; Daughney, Christopher J.; Tschritter, Constanze; Davidson, Peter; Bainbridge, Sophie E.
2012-05-01
Concerns regarding groundwater contamination with nitrate and the long-term sustainability of groundwater resources have prompted the development of a multi-layered three-dimensional (3D) geological model to characterise the aquifer geometry of the Wairau Plain, Marlborough District, New Zealand. The 3D geological model which consists of eight litho-stratigraphic units has been subsequently used to synthesise hydrogeological and hydrogeochemical data for different aquifers in an approach that aims to demonstrate how integration of water chemistry data within the physical framework of a 3D geological model can help to better understand and conceptualise groundwater systems in complex geological settings. Multivariate statistical techniques (e.g. Principal Component Analysis and Hierarchical Cluster Analysis) were applied to groundwater chemistry data to identify hydrochemical facies which are characteristic of distinct evolutionary pathways and a common hydrologic history of groundwaters. Principal Component Analysis on hydrochemical data demonstrated that natural water-rock interactions, redox potential and human agricultural impact are the key controls of groundwater quality in the Wairau Plain. Hierarchical Cluster Analysis revealed distinct hydrochemical water quality groups in the Wairau Plain groundwater system. Visualisation of the results of the multivariate statistical analyses and distribution of groundwater nitrate concentrations in the context of aquifer lithology highlighted the link between groundwater chemistry and the lithology of host aquifers. The methodology followed in this study can be applied in a variety of hydrogeological settings to synthesise geological, hydrogeological and hydrochemical data and present them in a format readily understood by a wide range of stakeholders. This enables a more efficient communication of the results of scientific studies to the wider community.
Methods for causal inference from gene perturbation experiments and validation
Meinshausen, Nicolai; Hauser, Alain; Mooij, Joris M.; Peters, Jonas; Versteeg, Philip; Bühlmann, Peter
2016-01-01
Inferring causal effects from observational and interventional data is a highly desirable but ambitious goal. Many of the computational and statistical methods are plagued by fundamental identifiability issues, instability, and unreliable performance, especially for large-scale systems with many measured variables. We present software and provide some validation of a recently developed methodology based on an invariance principle, called invariant causal prediction (ICP). The ICP method quantifies confidence probabilities for inferring causal structures and thus leads to more reliable and confirmatory statements for causal relations and predictions of external intervention effects. We validate the ICP method and some other procedures using large-scale genome-wide gene perturbation experiments in Saccharomyces cerevisiae. The results suggest that prediction and prioritization of future experimental interventions, such as gene deletions, can be improved by using our statistical inference techniques. PMID:27382150
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
Inference---A Python Package for Astrostatistics
NASA Astrophysics Data System (ADS)
Loredo, T. J.; Connors, A.; Oliphant, T. E.
2004-08-01
Python is an object-oriented "very high level language" that is easy to learn, actively supported, and freely available for a large variety of computing platforms. It possesses sophisticated scientific computing capabilities thanks to ongoing work by a community of scientists and engineers who maintain a suite of open source scientific packages. Key contributions come from the STScI group maintaining PyRAF, a Python environment for running IRAF tasks. Python's main scientific computing packages are the Numeric and numarray packages implementing efficient array and image processing, and the SciPy package implementing a wide variety of general-use algorithms including optimization, root finding, special functions, numerical integration, and basic statistical tasks. We describe the Inference package, a collection of tools for carrying out advanced astrostatistical analyses that is about to be released as a supplement to SciPy. The Inference package has two main parts. First is a Parametric Inference Engine that offers a unified environment for analysis of parametric models with a variety of methods, including minimum χ², maximum likelihood, and Bayesian methods. Several common analysis tasks are available with simple syntax (e.g., optimization, multidimensional exploration and integration, simulation); its parameter syntax is reminiscent of that of SHERPA. Second, the package includes a growing library of diverse, specialized astrostatistical methods in a variety of domains including time series, spectrum and survey analysis, and basic image analysis. Where possible, a variety of methods are available for a given problem, enabling users to explore alternative methods in a unified environment, with the guidance of significant documentation. The Inference project is supported by NASA AISRP grant NAG5-12082.
Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization
ERIC Educational Resources Information Center
Gelman, Andrew; Lee, Daniel; Guo, Jiqiang
2015-01-01
Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…
From Blickets to Synapses: Inferring Temporal Causal Networks by Observation
ERIC Educational Resources Information Center
Fernando, Chrisantha
2013-01-01
How do human infants learn the causal dependencies between events? Evidence suggests that this remarkable feat can be achieved by observation of only a handful of examples. Many computational models have been produced to explain how infants perform causal inference without explicit teaching about statistics or the scientific method. Here, we…
Direct Evidence for a Dual Process Model of Deductive Inference
ERIC Educational Resources Information Center
Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie
2013-01-01
In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences…
The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute works to provide information on cancer statistics in an effort to reduce the burden of cancer among the U.S. population.
... cancer statistics across the world. U.S. Cancer Mortality Trends The best indicator of progress against cancer is ... the number of cancer survivors has increased. These trends show that progress is being made against the ...
On the scientific inference from clinical trials.
Holmberg, L; Baum, M; Adami, H O
1999-05-01
We have not been able to describe clearly how we generalize findings from a study to our own 'everyday patients'. This difficulty is not surprising, since generalization deals with how empirical observations are related to the growth of scientific knowledge, which is a major philosophical problem. An argument, sometimes used to discard evidence from a trial, is that the patient sample was too selected and therefore not 'representative' enough for the results to be meaningful for generalization. In this paper, we discuss issues of representativeness and generalizability. Other authors have shown that generalization cannot depend only on statistical inference. Then, how do randomized clinical trials contribute to the growth of knowledge? We discuss three aspects of the randomized clinical trial (Mant 1999). First, the trial is an empirical experiment set up to study the intervention on the question as specifically and as much in isolation from other -- biasing and confounding -- factors as possible (Rothman & Greenland 1998). Second, the trial is set up to challenge our prevailing hypotheses (or prejudices) and the trial is above all a help in error elimination (Popper 1992). Third, we need to learn to see new, unexpected and thought-provoking patterns in the data from a trial. Point one -- and partly point two -- refers to the paradigm of the controlled experiment in scientific method. How much a study contributes to our knowledge, with respect to points two and three, relates to its originality. In none of these respects is the representativeness of the patients, or the clinical situations, crucial for judging the study and its possible inferences. However, we also discuss that the biological domain of disease that was studied in a particular trial has to be taken into account. Thus, the inference drawn from a clinical study is not only a question of statistical generalization, but must include a jump from the world of experiences into the world of reason.
Models for inference in dynamic metacommunity systems
Dorazio, Robert M.; Kery, Marc; Royle, J. Andrew; Plattner, Matthias
2010-01-01
A variety of processes are thought to be involved in the formation and dynamics of species assemblages. For example, various metacommunity theories are based on differences in the relative contributions of dispersal of species among local communities and interactions of species within local communities. Interestingly, metacommunity theories continue to be advanced without much empirical validation. Part of the problem is that statistical models used to analyze typical survey data either fail to specify ecological processes with sufficient complexity or they fail to account for errors in detection of species during sampling. In this paper, we describe a statistical modeling framework for the analysis of metacommunity dynamics that is based on the idea of adopting a unified approach, multispecies occupancy modeling, for computing inferences about individual species, local communities of species, or the entire metacommunity of species. This approach accounts for errors in detection of species during sampling and also allows different metacommunity paradigms to be specified in terms of species- and location-specific probabilities of occurrence, extinction, and colonization: all of which are estimable. In addition, this approach can be used to address inference problems that arise in conservation ecology, such as predicting temporal and spatial changes in biodiversity for use in making conservation decisions. To illustrate, we estimate changes in species composition associated with the species-specific phenologies of flight patterns of butterflies in Switzerland for the purpose of estimating regional differences in biodiversity.
Watanabe, Hiroshi
2012-01-01
Procedures of statistical analysis are reviewed to provide an overview of applications of statistics for general use. The topics covered are inference on a population, comparison of two populations with respect to means and probabilities, and multiple comparisons. This study is the second part of a series in which we survey medical statistics. Statistical associations and regressions will be discussed in subsequent papers.
NASA Astrophysics Data System (ADS)
Hermann, Claudine
Statistical Physics bridges the properties of a macroscopic system and the microscopic behavior of its constituting particles, otherwise impossible due to the giant magnitude of Avogadro's number. Numerous systems of today's key technologies - such as semiconductors or lasers - are macroscopic quantum objects; only statistical physics allows for understanding their fundamentals. Therefore, this graduate text also focuses on particular applications such as the properties of electrons in solids with applications, and radiation thermodynamics and the greenhouse effect.
Three-dimensional analysis of magnetometer array data
NASA Technical Reports Server (NTRS)
Richmond, A. D.; Baumjohann, W.
1984-01-01
A technique is developed for mapping magnetic variation fields in three dimensions using data from an array of magnetometers, based on the theory of optimal linear estimation. The technique is applied to data from the Scandinavian Magnetometer Array. Estimates of the spatial power spectra for the internal and external magnetic variations are derived, which in turn provide estimates of the spatial autocorrelation functions of the three magnetic variation components. Statistical errors involved in mapping the external and internal fields are quantified and displayed over the mapping region. Examples of field mapping and of separation into external and internal components are presented. A comparison between the three-dimensional field separation and a two-dimensional separation from a single chain of stations shows that significant differences can arise in the inferred internal component.
Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models
NASA Astrophysics Data System (ADS)
Ekeberg, Magnus; Lövkvist, Cecilia; Lan, Yueheng; Weigt, Martin; Aurell, Erik
2013-01-01
Spatially proximate amino acids in a protein tend to coevolve. A protein's three-dimensional (3D) structure hence leaves an echo of correlations in the evolutionary record. Reverse engineering 3D structures from such correlations is an open problem in structural biology, pursued with increasing vigor as more and more protein sequences continue to fill the data banks. Within this task lies a statistical inference problem, rooted in the following: correlation between two sites in a protein sequence can arise from firsthand interaction but can also be network-propagated via intermediate sites; observed correlation is not enough to guarantee proximity. To separate direct from indirect interactions is an instance of the general problem of inverse statistical mechanics, where the task is to learn model parameters (fields, couplings) from observables (magnetizations, correlations, samples) in large systems. In the context of protein sequences, the approach has been referred to as direct-coupling analysis. Here we show that the pseudolikelihood method, applied to 21-state Potts models describing the statistical properties of families of evolutionarily related proteins, significantly outperforms existing approaches to the direct-coupling analysis, the latter being based on standard mean-field techniques. This improved performance also relies on a modified score for the coupling strength. The results are verified using known crystal structures of specific sequence instances of various protein families. Code implementing the new method can be found at http://plmdca.csc.kth.se/.
The Role of Probability-Based Inference in an Intelligent Tutoring System.
ERIC Educational Resources Information Center
Mislevy, Robert J.; Gitomer, Drew H.
Probability-based inference in complex networks of interdependent variables is an active topic in statistical research, spurred by such diverse applications as forecasting, pedigree analysis, troubleshooting, and medical diagnosis. This paper concerns the role of Bayesian inference networks for updating student models in intelligent tutoring…
Probability-Based Inference in a Domain of Proportional Reasoning Tasks.
ERIC Educational Resources Information Center
Beland, Anne; Mislevy, Robert J.
Probability-based inference is described in the context of test theory for cognitive assessment. Its application is illustrated with an example concerning proportional reasoning. The statistical framework is that of inference networks. Ideas are demonstrated with data from a test of proportional reasoning based on work by G. Noelting (1980). The…
Active inference, communication and hermeneutics.
Friston, Karl J; Frith, Christopher D
2015-07-01
Hermeneutics refers to interpretation and translation of text (typically ancient scriptures) but also applies to verbal and non-verbal communication. In a psychological setting it nicely frames the problem of inferring the intended content of a communication. In this paper, we offer a solution to the problem of neural hermeneutics based upon active inference. In active inference, action fulfils predictions about how we will behave (e.g., predicting we will speak). Crucially, these predictions can be used to predict both self and others--during speaking and listening respectively. Active inference mandates the suppression of prediction errors by updating an internal model that generates predictions--both at fast timescales (through perceptual inference) and slower timescales (through perceptual learning). If two agents adopt the same model, then--in principle--they can predict each other and minimise their mutual prediction errors. Heuristically, this ensures they are singing from the same hymn sheet. This paper builds upon recent work on active inference and communication to illustrate perceptual learning using simulated birdsongs. Our focus here is the neural hermeneutics implicit in learning, where communication facilitates long-term changes in generative models that are trying to predict each other. In other words, communication induces perceptual learning and enables others to (literally) change our minds and vice versa. PMID:25957007
Optimal inference with suboptimal models: Addiction and active Bayesian inference
Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl
2015-01-01
When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321
Linking numbers, spin, and statistics of solitons
NASA Technical Reports Server (NTRS)
Wilczek, F.; Zee, A.
1983-01-01
The spin and statistics of solitons in the (2 + 1)- and (3 + 1)-dimensional nonlinear sigma models is considered. For the (2 + 1)-dimensional case, there is the possibility of fractional spin and exotic statistics; for 3 + 1 dimensions, the usual spin-statistics relation is demonstrated. The linking-number interpretation of the Hopf invariant and the use of suspension considerably simplify the analysis.
Recent statistical methods for orientation data
NASA Technical Reports Server (NTRS)
Batschelet, E.
1972-01-01
The application of statistical methods for determining the areas of animal orientation and navigation is discussed. The method employed is limited to the two-dimensional case. Various tests for determining the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations, and tables of data are developed to show the value of information obtained by statistical analysis.
Children's Category-Based Inferences Affect Classification
ERIC Educational Resources Information Center
Ross, Brian H.; Gelman, Susan A.; Rosengren, Karl S.
2005-01-01
Children learn many new categories and make inferences about these categories. Much work has examined how children make inferences on the basis of category knowledge. However, inferences may also affect what is learned about a category. Four experiments examine whether category-based inferences during category learning influence category knowledge…
Statistical analysis and interpolation of compositional data in materials science.
Pesenson, Misha Z; Suram, Santosh K; Gregoire, John M
2015-02-01
Compositional data are ubiquitous in chemistry and materials science: analysis of elements in multicomponent systems, combinatorial problems, etc., lead to data that are non-negative and sum to a constant (for example, atomic concentrations). The constant sum constraint restricts the sampling space to a simplex instead of the usual Euclidean space. Since statistical measures such as mean and standard deviation are defined for the Euclidean space, traditional correlation studies, multivariate analysis, and hypothesis testing may lead to erroneous dependencies and incorrect inferences when applied to compositional data. Furthermore, composition measurements that are used for data analytics may not include all of the elements contained in the material; that is, the measurements may be subcompositions of a higher-dimensional parent composition. Physically meaningful statistical analysis must yield results that are invariant under the number of composition elements, requiring the application of specialized statistical tools. We present specifics and subtleties of compositional data processing through discussion of illustrative examples. We introduce basic concepts, terminology, and methods required for the analysis of compositional data and utilize them for the spatial interpolation of composition in a sputtered thin film. The results demonstrate the importance of this mathematical framework for compositional data analysis (CDA) in the fields of materials science and chemistry.
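To make the simplex constraint in the abstract above concrete, the following sketch shows two standard tools from log-ratio compositional data analysis: closure (rescaling parts to sum to one) and the centered log-ratio (clr) transform, together with the Aitchison distance built on it. The abstract does not say which transforms the authors use, so the choice of clr here is an assumption for illustration only; all parts are assumed strictly positive, since the logarithm is undefined at zero.

```python
import math


def closure(parts):
    """Rescale positive parts so they sum to 1 (a point on the simplex)."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(a, b):
    """Euclidean distance between clr-transformed compositions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr(a), clr(b))))
```

Because clr depends only on ratios between parts, the distance is unchanged when either composition is multiplied by a constant, which is the kind of scale invariance that ordinary Euclidean statistics on raw concentrations lack.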
Causal inference from observational data.
Listl, Stefan; Jürges, Hendrik; Watt, Richard G
2016-10-01
Randomized controlled trials have long been considered the 'gold standard' for causal inference in clinical research. In the absence of randomized experiments, identification of reliable intervention points to improve oral health is often perceived as a challenge. But other fields of science, such as social science, have always been challenged by ethical constraints to conducting randomized controlled trials. Methods have been established to make causal inference using observational data, and these methods are becoming increasingly relevant in clinical medicine, health policy and public health research. This study provides an overview of state-of-the-art methods specifically designed for causal inference in observational data, including difference-in-differences (DiD) analyses, instrumental variables (IV), regression discontinuity designs (RDD) and fixed-effects panel data analysis. The described methods may be particularly useful in dental research, not least because of the increasing availability of routinely collected administrative data and electronic health records ('big data'). PMID:27111146
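Of the methods listed in the abstract above, difference-in-differences is the simplest to write down: the change in the treated group minus the change in the control group. The sketch below is a minimal illustration with hypothetical argument names, not code from the study. The estimate has a causal reading only under the usual parallel-trends assumption, namely that the two groups would have changed by the same amount without the intervention.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate from four lists of outcomes."""

    def mean(values):
        return sum(values) / len(values)

    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # The control group's change stands in for the treated group's
    # counterfactual trend (parallel-trends assumption).
    return treated_change - control_change
```

In practice the same quantity is usually obtained from a regression with a group-by-period interaction term, which also yields standard errors; the four-means form is shown here only because it makes the logic visible.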
Inferring Diversity: Life After Shannon
NASA Astrophysics Data System (ADS)
Giffin, Adom
The diversity of a community that cannot be fully counted must be inferred. The two preeminent inference methods are the MaxEnt method, which uses information in the form of constraints, and Bayes' rule, which uses information in the form of data. It has been shown that these two methods are special cases of the method of Maximum (relative) Entropy (ME). We demonstrate how this method can be used as a measure of diversity that not only reproduces the features of Shannon's index but exceeds them by allowing more types of information to be included in the inference. A specific example is solved in detail. Additionally, the entropy that is found has the same form as the thermodynamic entropy.
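For orientation, the classical Shannon index that the abstract above takes as its baseline is H = -Σ p_i ln p_i, where p_i is the observed proportion of species i. The sketch below computes only this baseline from raw counts; it does not implement the Maximum relative Entropy method described by the author, and the function name is hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity index (natural log) from per-species counts."""
    total = sum(counts)
    # Species with zero count contribute nothing (p * ln p -> 0 as p -> 0).
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

The index is largest when all species are equally abundant (ln S for S species) and zero when a single species accounts for every individual.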
Using a Five-Step Procedure for Inferential Statistical Analyses
ERIC Educational Resources Information Center
Kamin, Lawrence F.
2010-01-01
Many statistics texts pose inferential statistical problems in a disjointed way. By using a simple five-step procedure as a template for statistical inference problems, the student can solve problems in an organized fashion. The problem and its solution will thus be a stand-by-itself organic whole and a single unit of thought and effort. The…
On uncertain sightings and inference about extinction.
Solow, Andrew R; Beet, Andrew R
2014-08-01
The extinction of many species can only be inferred from the record of sightings of individuals. Solow et al. (2012, Uncertain sightings and the extinction of the Ivory-billed Woodpecker. Conservation Biology 26:180-184) describe a Bayesian approach to such inference and apply it to a sighting record of the Ivory-billed Woodpecker (Campephilus principalis). A feature of this sighting record is that all uncertain sightings occurred after the most recent certain sighting. However, this appears to be an artifact. We extended this earlier work in 2 ways. First, we allowed for overlap in time between certain and uncertain sightings. Second, we considered 2 plausible statistical models of a sighting record. In one of these models, certain and uncertain sightings that are valid arise from the same process whereas in the other they arise from independent processes. We applied both models to the case of the Ivory-billed Woodpecker. The result from the first model did not favor extinction, whereas the result for the second model did. This underscores the importance, in applying tests for extinction, of understanding what could be called the natural history of the sighting record.
Measuring statistical evidence using relative belief
Evans, Michael
2016-01-01
A fundamental concern of a theory of statistical inference is how one should measure statistical evidence. Certainly the words “statistical evidence,” or perhaps just “evidence,” are much used in statistical contexts. It is fair to say, however, that the precise characterization of this concept is somewhat elusive. Our goal here is to provide a definition of how to measure statistical evidence for any particular statistical problem. Since evidence is what causes beliefs to change, it is proposed to measure evidence by the amount beliefs change from a priori to a posteriori. As such, our definition involves prior beliefs and this raises issues of subjectivity versus objectivity in statistical analyses. This is dealt with through a principle requiring the falsifiability of any ingredients to a statistical analysis. These concerns lead to checking for prior-data conflict and measuring the a priori bias in a prior. PMID:26925207
Properties of Rasch residual fit statistics.
Wu, Margaret; Adams, Richard J
2013-01-01
This paper examines the residual-based fit statistics commonly used in Rasch measurement. In particular, the paper analytically examines some of the theoretical properties of the residual-based fit statistics with a view to establishing the inferences that can be made using these fit statistics. More specifically, the relationships between the distributional properties of the fit statistics and sample size are discussed; some research that erroneously concludes that residual-based fit statistics are unstable is reviewed; and finally, it is analytically illustrated that, for dichotomous items, residual-based fit statistics provide a measure of the relative slope of empirical item characteristic curves. With a clear understanding of the theoretical properties of the fit statistics, the use and limitations of these statistics can be placed in the right light.
Multi-Agent Inference in Social Networks: A Finite Population Learning Approach
Tong, Xin; Zeng, Yao
2016-01-01
When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people’s incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make “good” inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows. PMID:27076691
Inferring biotic interactions from proxies.
Morales-Castilla, Ignacio; Matias, Miguel G; Gravel, Dominique; Araújo, Miguel B
2015-06-01
Inferring biotic interactions from functional, phylogenetic and geographical proxies remains one great challenge in ecology. We propose a conceptual framework to infer the backbone of biotic interaction networks within regional species pools. First, interacting groups are identified to order links and remove forbidden interactions between species. Second, additional links are removed by examination of the geographical context in which species co-occur. Third, hypotheses are proposed to establish interaction probabilities between species. We illustrate the framework using published food-webs in terrestrial and marine systems. We conclude that preliminary descriptions of the web of life can be made by careful integration of data with theory.
Bayesian multimodel inference for dose-response studies
Link, W.A.; Albers, P.H.
2007-01-01
Statistical inference in dose-response studies is model-based: The analyst posits a mathematical model of the relation between exposure and response, estimates parameters of the model, and reports conclusions conditional on the model. Such analyses rarely include any accounting for the uncertainties associated with model selection. The Bayesian inferential system provides a convenient framework for model selection and multimodel inference. In this paper we briefly describe the Bayesian paradigm and Bayesian multimodel inference. We then present a family of models for multinomial dose-response data and apply Bayesian multimodel inferential methods to the analysis of data on the reproductive success of American kestrels (Falco sparverius) exposed to various sublethal dietary concentrations of methylmercury.
Statistical Mechanics of Transcription-Factor Binding Site Discovery Using Hidden Markov Models
Mehta, Pankaj; Schwab, David J.; Sengupta, Anirvan M.
2011-01-01
Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the “inverse” statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it. PMID:22851788
An overview of recent developments in genomics and associated statistical methods.
Bickel, Peter J; Brown, James B; Huang, Haiyan; Li, Qunhua
2009-11-13
The landscape of genomics has changed drastically in the last two decades. Increasingly inexpensive sequencing has shifted the primary focus from the acquisition of biological sequences to the study of biological function. Assays have been developed to study many intricacies of biological systems, and publicly available databases have given rise to integrative analyses that combine information from many sources to draw complex conclusions. Such research was the focus of the recent workshop at the Isaac Newton Institute, 'High dimensional statistics in biology'. Many computational methods from modern genomics and related disciplines were presented and discussed. Using, as much as possible, the material from these talks, we give an overview of modern genomics: from the essential assays that make data-generation possible, to the statistical methods that yield meaningful inference. We point to current analytical challenges, where novel methods, or novel applications of extant methods, are presently needed.
Design-based and model-based inference in surveys of freshwater mollusks
Dorazio, R.M.
1999-01-01
Well-known concepts in statistical inference and sampling theory are used to develop recommendations for planning and analyzing the results of quantitative surveys of freshwater mollusks. Two methods of inference commonly used in survey sampling (design-based and model-based) are described and illustrated using examples relevant in surveys of freshwater mollusks. The particular objectives of a survey and the type of information observed in each unit of sampling can be used to help select the sampling design and the method of inference. For example, the mean density of a sparsely distributed population of mollusks can be estimated with higher precision by using model-based inference or by using design-based inference with adaptive cluster sampling than by using design-based inference with conventional sampling. More experience with quantitative surveys of natural assemblages of freshwater mollusks is needed to determine the actual benefits of different sampling designs and inferential procedures.
1986-01-01
Official population data for the USSR are presented for 1985 and 1986. Part 1 (pp. 65-72) contains data on capitals of union republics and cities with over one million inhabitants, including population estimates for 1986 and vital statistics for 1985. Part 2 (p. 72) presents population estimates by sex and union republic, 1986. Part 3 (pp. 73-6) presents data on population growth, including birth, death, and natural increase rates, 1984-1985; seasonal distribution of births and deaths; birth order; age-specific birth rates in urban and rural areas and by union republic; marriages; age at marriage; and divorces. PMID:12178831
Predictive statistical models of baseline variations in 3-D femoral cortex morphology.
Zhang, Ju; Hislop-Jambrich, Jacqui; Besier, Thor F
2016-05-01
Quantifying human femoral cortex morphology is important for forensic science, surgical planning, prosthesis design and musculoskeletal modeling. Previous studies have been restricted by traditional zero or one dimensional morphometric measurements at discrete locations. We have used automatic image segmentation and statistical shape modeling methods to create predictive models of baseline 3-D femoral cortex morphology on a statistically significant population. A total of 204 femurs were automatically segmented and measured to obtain 3-D shape, whole-surface cortical thickness, and morphometric measurements. Principal components of shape and cortical thickness were correlated to anthropological data (age, sex, height and body mass) to produce predictive statistical models. We show that predictions of an individual's age, height, and sex can be improved by using 3-D shape and cortical thickness when compared with traditional morphometric measurements. We also show that femoral cortex geometry can be predicted from anthropological data combined with femoral measurements with less than 2.3 mm root mean square error, and cortical thickness with less than 0.5 mm root mean square error. The predictive models presented offer new ways to infer subject-specific 3-D femur morphology from sparse subject data for biomechanical simulations, and inversely infer subject data from femur morphology for anthropological and forensic studies. PMID:26972387
Statistical analysis of nonlinear dynamical systems using differential geometric sampling methods.
Calderhead, Ben; Girolami, Mark
2011-12-01
Mechanistic models based on systems of nonlinear differential equations can help provide a quantitative understanding of complex physical or biological phenomena. The use of such models to describe nonlinear interactions in molecular biology has a long history; however, it is only recently that advances in computing have allowed these models to be set within a statistical framework, further increasing their usefulness and binding modelling and experimental approaches more tightly together. A probabilistic approach to modelling allows us to quantify uncertainty in both the model parameters and the model predictions, as well as in the model hypotheses themselves. In this paper, the Bayesian approach to statistical inference is adopted and we examine the significant challenges that arise when performing inference over nonlinear ordinary differential equation models describing cell signalling pathways and enzymatic circadian control; in particular, we address the difficulties arising owing to strong nonlinear correlation structures, high dimensionality and non-identifiability of parameters. We demonstrate how recently introduced differential geometric Markov chain Monte Carlo methodology alleviates many of these issues by making proposals based on local sensitivity information, which ultimately allows us to perform effective statistical analysis. Along the way, we highlight the deep link between the sensitivity analysis of such dynamic system models and the underlying Riemannian geometry of the induced posterior probability distributions. PMID:23226584
Perceptual Inference and Autistic Traits
ERIC Educational Resources Information Center
Skewes, Joshua C; Jegindø, Else-Marie; Gebauer, Line
2015-01-01
Autistic people are better at perceiving details. Major theories explain this in terms of bottom-up sensory mechanisms or in terms of top-down cognitive biases. Recently, it has become possible to link these theories within a common framework. This framework assumes that perception is implicit neural inference, combining sensory evidence with…
Science Shorts: Observation versus Inference
ERIC Educational Resources Information Center
Leager, Craig R.
2008-01-01
When you observe something, how do you know for sure what you are seeing, feeling, smelling, or hearing? Asking students to think critically about their encounters with the natural world will help to strengthen their understanding and application of the science-process skills of observation and inference. In the following lesson, students make…
Sample Size and Correlational Inference
ERIC Educational Resources Information Center
Anderson, Richard B.; Doherty, Michael E.; Friedrich, Jeff C.
2008-01-01
In 4 studies, the authors examined the hypothesis that the structure of the informational environment makes small samples more informative than large ones for drawing inferences about population correlations. The specific purpose of the studies was to test predictions arising from the signal detection simulations of R. B. Anderson, M. E. Doherty,…
Improving Explanatory Inferences from Assessments
ERIC Educational Resources Information Center
Diakow, Ronli Phyllis
2013-01-01
This dissertation comprises three papers that propose, discuss, and illustrate models to make improved inferences about research questions regarding student achievement in education. Addressing the types of questions common in educational research today requires three different "extensions" to traditional educational assessment: (1)…
NASA Technical Reports Server (NTRS)
Lee, Mun Wai
2015-01-01
Crew exercise is important during long-duration space flight not only for maintaining health and fitness but also for preventing adverse health problems, such as losses in muscle strength and bone density. Monitoring crew exercise via motion capture and kinematic analysis aids understanding of the effects of microgravity on exercise and helps ensure that exercise prescriptions are effective. Intelligent Automation, Inc., has developed ESPRIT to monitor exercise activities, detect body markers, extract image features, and recover three-dimensional (3D) kinematic body poses. The system relies on prior knowledge and modeling of the human body and on advanced statistical inference techniques to achieve robust and accurate motion capture. In Phase I, the company demonstrated motion capture of several exercises, including walking, curling, and dead lifting. Phase II efforts focused on enhancing algorithms and delivering an ESPRIT prototype for testing and demonstration.
Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates
Eklund, Anders; Nichols, Thomas E.; Knutsson, Hans
2016-01-01
The most widely used task functional magnetic resonance imaging (fMRI) analyses use parametric statistical methods that depend on a variety of assumptions. In this work, we use real resting-state data and a total of 3 million random task group analyses to compute empirical familywise error rates for the fMRI software packages SPM, FSL, and AFNI, as well as a nonparametric permutation method. For a nominal familywise error rate of 5%, the parametric statistical methods are shown to be conservative for voxelwise inference and invalid for clusterwise inference. Our results suggest that the principal cause of the invalid cluster inferences is spatial autocorrelation functions that do not follow the assumed Gaussian shape. By comparison, the nonparametric permutation test is found to produce nominal results for voxelwise as well as clusterwise inference. These findings speak to the need of validating the statistical methods being used in the field of neuroimaging. PMID:27357684
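The nonparametric permutation idea referred to in the abstract above can be sketched on a toy problem: a two-sample test on the difference in means. This is not the fMRI cluster-wise procedure evaluated in the paper; the function name, the data, and the number of permutations are hypothetical choices for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one
    (with the usual +1 correction so the estimate is never exactly zero).
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs_mean_diff(pooled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical measurements from two clearly separated groups.
low = [0.0, 0.1, 0.2, 0.1, 0.05, 0.15]
high = [10.0, 10.1, 10.2, 10.1, 10.05, 10.15]
p_separated = permutation_p_value(low, high)

# Identical groups: every shuffle is at least as extreme as the observed 0.
p_identical = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])

print(p_separated, p_identical)
```

The test builds its null distribution from the data themselves rather than from an assumed parametric form, which is the property the abstract contrasts with the Gaussian-shape assumption of the parametric methods.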
Graffelman, Jan; Nelson, S.; Gogarten, S. M.; Weir, B. S.
2015-01-01
This paper addresses the issue of exact-test based statistical inference for Hardy-Weinberg equilibrium in the presence of missing genotype data. Missing genotypes often are discarded when markers are tested for Hardy-Weinberg equilibrium, which can lead to bias in the statistical inference about equilibrium. Single and multiple imputation can improve inference on equilibrium. We develop tests for equilibrium in the presence of missingness by using both inbreeding coefficients (or, equivalently, χ² statistics) and exact p-values. The analysis of a set of markers with a high missing rate from the GENEVA project on prematurity shows that exact inference on equilibrium can be altered considerably when missingness is taken into account. For markers with a high missing rate (>5%), we found that both single and multiple imputation tend to diminish evidence for Hardy-Weinberg disequilibrium. Depending on the imputation method used, 6-13% of the test results changed qualitatively at the 5% level. PMID:26377959
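The equivalence between inbreeding coefficients and χ² statistics mentioned in the abstract above can be illustrated for a single biallelic marker with complete data. The sketch below is the textbook chi-square test without continuity correction, not the exact or imputation-based tests developed in the paper; the function name and the genotype counts are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic and inbreeding coefficient for one marker.

    Takes genotype counts for AA, AB and BB and returns (chi2, f), where
    f = 1 - observed heterozygotes / expected heterozygotes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # sample frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic marker: test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    f = 1.0 - n_ab / (2 * n * p * q)
    return chi2, f


# Hypothetical genotype counts with a deficit of heterozygotes.
chi2, f = hwe_chi_square(30, 40, 30)
print(chi2, f)
```

With the allele frequency estimated from the same sample, the statistic satisfies χ² = n·f², so testing f = 0 and testing the χ² statistic are the same test in this simple setting.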
Sparse and compositionally robust inference of microbial ecological networks.
Kurtz, Zachary D; Müller, Christian L; Miraldi, Emily R; Littman, Dan R; Blaser, Martin J; Bonneau, Richard A
2015-05-01
16S ribosomal RNA (rRNA) gene and other environmental sequencing techniques provide snapshots of microbial communities, revealing phylogeny and the abundances of microbial populations across diverse ecosystems. While changes in microbial community structure are demonstrably associated with certain environmental conditions (from metabolic and immunological health in mammals to ecological stability in soils and oceans), identification of underlying mechanisms requires new statistical tools, as these datasets present several technical challenges. First, the abundances of microbial operational taxonomic units (OTUs) from amplicon-based datasets are compositional. Counts are normalized to the total number of counts in the sample. Thus, microbial abundances are not independent, and traditional statistical metrics (e.g., correlation) for the detection of OTU-OTU relationships can lead to spurious results. Secondly, microbial sequencing-based studies typically measure hundreds of OTUs on only tens to hundreds of samples; thus, inference of OTU-OTU association networks is severely under-powered, and additional information (or assumptions) are required for accurate inference. Here, we present SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that addresses both of these issues. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological association network is sparse. To reconstruct the network, SPIEC-EASI relies on algorithms for sparse neighborhood and inverse covariance selection. To provide a synthetic benchmark in the absence of an experimentally validated gold-standard network, SPIEC-EASI is accompanied by a set of computational tools to generate OTU count data from a set of diverse underlying network topologies. SPIEC
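The abstract above does not name the compositional data transformation it uses. One transformation commonly used in compositional data analysis is the centered log-ratio (clr), sketched below as an assumption rather than as the SPIEC-EASI implementation; the function name, the pseudocount default, and the OTU counts are hypothetical.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's counts.

    Each log count is centered by the mean log count of the sample, so the
    result does not depend on the sample's total. The pseudocount avoids
    log(0) for unobserved taxa; pass 0.0 only when all counts are positive.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical OTU counts for the same community at two sequencing depths.
shallow = [1, 2, 4]
deep = [10, 20, 40]

clr_shallow = clr(shallow, pseudocount=0.0)
clr_deep = clr(deep, pseudocount=0.0)
print(clr_shallow, clr_deep)
```

Without a pseudocount the two samples map to the same vector, which illustrates why such transforms are used before computing associations on data normalized to a sample total. A nonzero pseudocount breaks this exact invariance slightly.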
Denoising and dimensionality reduction of genomic data
NASA Astrophysics Data System (ADS)
Capobianco, Enrico
2005-05-01
Genomics represents a challenging research field for many quantitative scientists, and recently a vast variety of statistical techniques and machine learning algorithms have been proposed and inspired by cross-disciplinary work with computational and systems biologists. In genomic applications, the researcher deals with noisy and complex high-dimensional feature spaces; a wealth of genes whose expression levels are experimentally measured can often be observed for just a few time points, thus limiting the available samples. This unbalanced combination suggests that it might be hard for standard statistical inference techniques to come up with good general solutions, and likewise for machine learning algorithms to avoid heavy computational work. Thus, one naturally turns to two major aspects of the problem: sparsity and intrinsic dimensionality. These two aspects are studied in this paper, where for both denoising and dimensionality reduction, a very efficient technique, i.e., Independent Component Analysis, is used. The numerical results are very promising, and lead to a very good quality of gene feature selection, due to the signal separation power enabled by the decomposition technique. We investigate how the use of replicates can improve these results, and deal with noise through a stabilization strategy which combines the estimated components and extracts the most informative biological information from them. Exploiting the inherent level of sparsity is a key issue in genetic regulatory networks, where the connectivity matrix needs to account for the real links among genes and discard many redundancies. Most experimental evidence suggests that real gene-gene connections represent indeed a subset of what is usually mapped onto either a huge gene vector or a typically dense and highly structured network. Inferring gene network connectivity from the expression levels represents a challenging inverse problem that is at present stimulating key research in biomedical
Parentage and sibship inference from multilocus genotype data under polygamy.
Wang, J; Santure, A W
2009-04-01
Likelihood methods have been developed to partition individuals in a sample into sibling clusters using genetic marker data without parental information. Most of these methods assume either both sexes are monogamous to infer full sibships only or only one sex is polygamous to infer full sibships and paternal or maternal (but not both) half sibships. We extend our previous method to the more general case of both sexes being polygamous to infer full sibships, paternal half sibships, and maternal half sibships and to the case of a two-generation sample of individuals to infer parentage jointly with sibships. The extension not only expands enormously the scope of application of the method, but also increases its statistical power. The method is implemented for both diploid and haplodiploid species and for codominant and dominant markers, with mutations and genotyping errors accommodated. The performance and robustness of the method are evaluated by analyzing both simulated and empirical data sets. Our method is shown to be much more powerful than pairwise methods in both parentage and sibship assignments because of the more efficient use of marker information. It is little affected by inbreeding in parents and is moderately robust to nonrandom mating and linkage of markers. We also show that individually much less informative markers, such as SNPs or AFLPs, can reach the same power for parentage and sibship inferences as the highly informative marker simple sequence repeats (SSRs), as long as a sufficient number of loci are employed in the analysis.
Phylogeographic ancestral inference using the coalescent model on haplotype trees.
Manolopoulou, Ioanna; Emerson, Brent C
2012-06-01
Phylogeographic ancestral inference is an issue frequently arising in population ecology that aims to understand the geographical roots and structure of species. Here, we specifically address relatively small scale mtDNA datasets (typically less than 500 sequences with fewer than 1000 nucleotides), focusing on ancestral location inference. Our approach uses a coalescent modelling framework projected onto haplotype trees in order to reduce computational complexity, at the same time adhering to complex evolutionary processes. Statistical innovations of the last few years have allowed for computationally feasible yet accurate inferences in phylogenetic frameworks. We implement our methods on a set of synthetic datasets and show how, despite high uncertainty in terms of identifying the root haplotype, estimation of the ancestral location naturally encompasses lower uncertainty, allowing us to pinpoint the Maximum A Posteriori estimates for ancestral locations. We exemplify our methods on a set of synthetic datasets and then combine our inference methods with the phylogeographic clustering approach presented in Manolopoulou et al. (2011) on a real dataset from weevils in the Iberian peninsula in order to infer ancestral locations as well as population substructure.
Cortical hierarchies perform Bayesian causal inference in multisensory perception.
Rohe, Tim; Noppeney, Uta
2015-02-01
To form a veridical percept of the environment, the brain needs to integrate sensory signals from a common source but segregate those from independent sources. Thus, perception inherently relies on solving the "causal inference problem." Behaviorally, humans solve this problem optimally as predicted by Bayesian Causal Inference; yet, the underlying neural mechanisms are unexplored. Combining psychophysics, Bayesian modeling, functional magnetic resonance imaging (fMRI), and multivariate decoding in an audiovisual spatial localization task, we demonstrate that Bayesian Causal Inference is performed by a hierarchy of multisensory processes in the human brain. At the bottom of the hierarchy, in auditory and visual areas, location is represented on the basis that the two signals are generated by independent sources (= segregation). At the next stage, in posterior intraparietal sulcus, location is estimated under the assumption that the two signals are from a common source (= forced fusion). Only at the top of the hierarchy, in anterior intraparietal sulcus, the uncertainty about the causal structure of the world is taken into account and sensory signals are combined as predicted by Bayesian Causal Inference. Characterizing the computational operations of signal interactions reveals the hierarchical nature of multisensory perception in human neocortex. It unravels how the brain accomplishes Bayesian Causal Inference, a statistical computation fundamental for perception and cognition. Our results demonstrate how the brain combines information in the face of uncertainty about the underlying causal structure of the world.
Inference for reaction networks using the linear noise approximation.
Fearnhead, Paul; Giagos, Vasilieos; Sherlock, Chris
2014-06-01
We consider inference for the reaction rates in discretely observed networks such as those found in models for systems biology, population ecology, and epidemics. Most such networks are neither slow enough nor small enough for inference via the true state-dependent Markov jump process to be feasible. Typically, inference is conducted by approximating the dynamics through an ordinary differential equation (ODE) or a stochastic differential equation (SDE). The former ignores the stochasticity in the true model and can lead to inaccurate inferences. The latter is more accurate but is harder to implement as the transition density of the SDE model is generally unknown. The linear noise approximation (LNA) arises from a first-order Taylor expansion of the approximating SDE about a deterministic solution and can be viewed as a compromise between the ODE and SDE models. It is a stochastic model, but discrete time transition probabilities for the LNA are available through the solution of a series of ordinary differential equations. We describe how a restarting LNA can be efficiently used to perform inference for a general class of reaction networks; evaluate the accuracy of such an approach; and show how and when this approach is either statistically or computationally more efficient than ODE or SDE methods. We apply the LNA to analyze Google Flu Trends data from the North and South Islands of New Zealand, and are able to obtain more accurate short-term forecasts of new flu cases than another recently proposed method, although at a greater computational cost.
Stern, Adi; Doron-Faigenboim, Adi; Erez, Elana; Martz, Eric; Bacharach, Eran; Pupko, Tal
2007-07-01
Biologically significant sites in a protein may be identified by contrasting the rates of synonymous (K(s)) and non-synonymous (K(a)) substitutions. This enables the inference of site-specific positive Darwinian selection and purifying selection. We present here Selecton version 2.2 (http://selecton.bioinfo.tau.ac.il), a web server which automatically calculates the ratio between K(a) and K(s) (omega) at each site of the protein. This ratio is graphically displayed on each site using a color-coding scheme, indicating either positive selection, purifying selection or lack of selection. Selecton implements an assembly of different evolutionary models, which allow for statistical testing of the hypothesis that a protein has undergone positive selection. Specifically, the recently developed mechanistic-empirical model is introduced, which takes into account the physicochemical properties of amino acids. Advanced options were introduced to allow maximal fine tuning of the server to the user's specific needs, including calculation of statistical support of the omega values, an advanced graphic display of the protein's 3-dimensional structure, use of different genetic codes and inputting of a pre-built phylogenetic tree. Selecton version 2.2 is an effective, user-friendly and freely available web server which implements up-to-date methods for computing site-specific selection forces, and the visualization of these forces on the protein's sequence and structure.
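The site-wise reading of the K(a)/K(s) ratio described above can be illustrated with a minimal sketch. It is not Selecton's method, which relies on evolutionary models and statistical testing; the function name `classify_site` and the neutral band `tol` are hypothetical and chosen only for illustration.

```python
def classify_site(ka, ks, tol=0.1):
    """Label one site from its Ka/Ks ratio (omega).

    ka, ks: non-synonymous and synonymous substitution rates at the site.
    tol: hypothetical half-width of the band around omega = 1 that is
         treated as "no detectable selection" (an assumption, not a
         Selecton parameter).
    """
    if ks <= 0:
        return "undefined"  # omega cannot be computed without synonymous changes
    omega = ka / ks
    if omega > 1 + tol:
        return "positive selection"
    if omega < 1 - tol:
        return "purifying selection"
    return "no detectable selection"


if __name__ == "__main__":
    for ka, ks in [(2.0, 1.0), (0.2, 1.0), (1.0, 1.0)]:
        print(ka, ks, classify_site(ka, ks))
```

A naive threshold like this ignores sampling noise, which is why a server such as the one described reports statistical support for the omega values rather than the raw ratio alone.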
Inferring ancestral sequences in taxon-rich phylogenies.
Gascuel, Olivier; Steel, Mike
2010-10-01
Statistical consistency in phylogenetics has traditionally referred to the accuracy of estimating phylogenetic parameters for a fixed number of species as we increase the number of characters. However, it is also useful to consider a dual type of statistical consistency where we increase the number of species, rather than characters. This raises some basic questions: what can we learn about the evolutionary process as we increase the number of species? In particular, does having more species allow us to infer the ancestral state of characters accurately? This question is particularly important when sequence evolution varies in a complex way from character to character, as methods applicable for i.i.d. models may no longer be valid. In this paper, we assemble a collection of results to analyse various approaches for inferring ancestral information with increasing accuracy as the number of taxa increases.
Advanced Algorithms and Statistics for MOS Surveys
NASA Astrophysics Data System (ADS)
Bolton, A. S.
2016-10-01
This paper presents an individual view on the current state of computational data processing and statistics for inference and discovery in multi-object spectroscopic surveys, supplemented by a historical perspective and a few present-day applications. It is more op-ed than review, and hopefully more readable as a result.
Relationship inference based on DNA mixtures.
Kaur, Navreet; Bouzga, Mariam M; Dørum, Guro; Egeland, Thore
2016-03-01
Today, there exist a number of tools for solving kinship cases. But what happens when information comes from a mixture? DNA mixtures are in general rarely seen in kinship cases, but in a case presented to the Norwegian Institute of Public Health, sample DNA was obtained after a rape case that resulted in an unwanted pregnancy and abortion. The only available DNA from the fetus came in the form of a mixture with the mother, and it was of interest to find the father of the fetus. The mother (the victim), however, refused to give her reference data and so commonly used methods for paternity testing were no longer applicable. As this case illustrates, kinship cases involving mixtures and missing reference profiles do occur and make the use of existing methods rather inconvenient. We here present statistical methods that may handle general relationship inference based on DNA mixtures. The basic idea is that likelihood calculations for mixtures can be decomposed into a series of kinship problems. This formulation of the problem facilitates the use of kinship software. We present the freely available R package relMix, which extends the R version of Familias. Complicating factors like mutations, silent alleles, and θ-correction are then easily handled for quite general family relationships, and are included in the statistical methods we develop in this paper. The methods and their implementations are exemplified on the data from the rape case.
Cortical circuits for perceptual inference.
Friston, Karl; Kiebel, Stefan
2009-10-01
This paper assumes that cortical circuits have evolved to enable inference about the causes of sensory input received by the brain. This provides a principled specification of what neural circuits have to achieve. Here, we attempt to address how the brain makes inferences by casting inference as an optimisation problem. We look at how the ensuing recognition dynamics could be supported by directed connections and message-passing among neuronal populations, given our knowledge of intrinsic and extrinsic neuronal connections. We assume that the brain models the world as a dynamic system, which imposes causal structure on the sensorium. Perception is equated with the optimisation or inversion of this internal model, to explain sensory input. Given a model of how sensory data are generated, we use a generic variational approach to model inversion to furnish equations that prescribe recognition; i.e., the dynamics of neuronal activity that represents the causes of sensory input. Here, we focus on a model whose hierarchical and dynamical structure enables simulated brains to recognise and predict sequences of sensory states. We first review these models and their inversion under a variational free-energy formulation. We then show that the brain has the necessary infrastructure to implement this inversion and present simulations using synthetic birds that generate and recognise birdsongs. PMID:19635656
NASA Astrophysics Data System (ADS)
Dettmer, Jan; Molnar, Sheri; Steininger, Gavin; Dosso, Stan E.; Cassidy, John F.
2012-02-01
This paper applies a general trans-dimensional Bayesian inference methodology and hierarchical autoregressive data-error models to the inversion of microtremor array dispersion data for shear wave velocity (vs) structure. This approach accounts for the limited knowledge of the optimal earth model parametrization (e.g. the number of layers in the vs profile) and of the data-error statistics in the resulting vs parameter uncertainty estimates. The assumed earth model parametrization influences estimates of parameter values and uncertainties due to different parametrizations leading to different ranges of data predictions. The support of the data for a particular model is often non-unique and several parametrizations may be supported. A trans-dimensional formulation accounts for this non-uniqueness by including a model-indexing parameter as an unknown so that groups of models (identified by the indexing parameter) are considered in the results. The earth model is parametrized in terms of a partition model with interfaces given over a depth-range of interest. In this work, the number of interfaces (layers) in the partition model represents the trans-dimensional model indexing. In addition, serial data-error correlations are addressed by augmenting the geophysical forward model with a hierarchical autoregressive error model that can account for a wide range of error processes with a small number of parameters. Hence, the limited knowledge about the true statistical distribution of data errors is also accounted for in the earth model parameter estimates, resulting in more realistic uncertainties and parameter values. Hierarchical autoregressive error models do not rely on point estimates of the model vector to estimate data-error statistics, and have no requirement for computing the inverse or determinant of a data-error covariance matrix. This approach is particularly useful for trans-dimensional inverse problems, as point estimates may not be representative of the
Phylodynamic inference for structured epidemiological models.
Rasmussen, David A; Volz, Erik M; Koelle, Katia
2014-04-01
Coalescent theory is routinely used to estimate past population dynamics and demographic parameters from genealogies. While early work in coalescent theory only considered simple demographic models, advances in theory have allowed for increasingly complex demographic scenarios to be considered. The success of this approach has led to coalescent-based inference methods being applied to populations with rapidly changing population dynamics, including pathogens like RNA viruses. However, fitting epidemiological models to genealogies via coalescent models remains a challenging task, because pathogen populations often exhibit complex, nonlinear dynamics and are structured by multiple factors. Moreover, it often becomes necessary to consider stochastic variation in population dynamics when fitting such complex models to real data. Using recently developed structured coalescent models that accommodate complex population dynamics and population structure, we develop a statistical framework for fitting stochastic epidemiological models to genealogies. By combining particle filtering methods with Bayesian Markov chain Monte Carlo methods, we are able to fit a wide class of stochastic, nonlinear epidemiological models with different forms of population structure to genealogies. We demonstrate our framework using two structured epidemiological models: a model with disease progression between multiple stages of infection and a two-population model reflecting spatial structure. We apply the multi-stage model to HIV genealogies and show that the proposed method can be used to estimate the stage-specific transmission rates and prevalence of HIV. Finally, using the two-population model we explore how much information about population structure is contained in genealogies and what sample sizes are necessary to reliably infer parameters like migration rates. PMID:24743590
Inferring Protein Associations Using Protein Pulldown Assays
Sharp, Julia L.; Anderson, Kevin K.; Daly, Don S.; Auberry, Deanna L.; Borkowski, John J.; Cannon, William R.
2007-02-01
Background: One method to infer protein-protein associations is through a “bait-prey pulldown” assay using a protein affinity agent and an LC-MS (liquid chromatography-mass spectrometry)-based protein identification method. False positive and negative protein identifications are not uncommon, however, leading to incorrect inferences. Methods: A pulldown experiment generates a protein association matrix wherein each column represents a sample from one bait protein, each row represents one prey protein and each cell contains a presence/absence association indicator. Our method evaluates the presence/absence pattern across a prey protein (row) with a Likelihood Ratio Test (LRT), computing its p-value with simulated LRT test statistic distributions after a check with simulated binomial random variates disqualified the large-sample χ² test. A pulldown experiment often involves hundreds of tests so we apply the false discovery rate method to control the false positive rate. Based on the p-value, each prey protein is assigned a category (specific association, non-specific association, or not associated) and appraised with respect to the pulldown experiment’s goal and design. The method is illustrated using a pulldown experiment investigating the protein complexes of Shewanella oneidensis MR-1. Results: The Monte Carlo simulated LRT p-values objectively reveal specific and ubiquitous prey, as well as potential systematic errors. The example analysis shows the results to be biologically sensible and more realistic than the ad hoc screening methods previously utilized. Conclusions: The method presented appears to be informative for screening for protein-protein associations.
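The multiple-testing step mentioned above can be sketched as follows. The abstract does not say which false discovery rate procedure was used; this example assumes the Benjamini-Hochberg step-up rule, and the function name `benjamini_hochberg` and the level `q` are illustrative choices.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Assumes the Benjamini-Hochberg step-up procedure at FDR level q
    (an assumption; the abstract does not name the procedure).
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, q=0.05))
```

In a pulldown setting each p-value would come from one prey protein's test, so the returned flags indicate which prey rows remain significant after the correction.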
Category Representation for Classification and Feature Inference
ERIC Educational Resources Information Center
Johansen, Mark K.; Kruschke, John K.
2005-01-01
This research's purpose was to contrast the representations resulting from learning of the same categories by either classifying instances or inferring instance features. Prior inference learning research, particularly T. Yamauchi and A. B. Markman (1998), has suggested that feature inference learning fosters prototype representation, whereas…
Inferring connectivity in networked dynamical systems: Challenges using Granger causality
NASA Astrophysics Data System (ADS)
Lusch, Bethany; Maia, Pedro D.; Kutz, J. Nathan
2016-09-01
Determining the interactions and causal relationships between nodes in an unknown networked dynamical system from measurement data alone is a challenging, contemporary task across the physical, biological, and engineering sciences. Statistical methods, such as the increasingly popular Granger causality, are being broadly applied for data-driven discovery of connectivity in fields from economics to neuroscience. A common version of the algorithm is called pairwise-conditional Granger causality, which we systematically test on data generated from a nonlinear model with known causal network structure. Specifically, we simulate networked systems of Kuramoto oscillators and use the Multivariate Granger Causality Toolbox to discover the underlying coupling structure of the system. We compare the inferred results to the original connectivity for a wide range of parameters such as initial conditions, connection strengths, community structures, and natural frequencies. Our results show a significant systematic disparity between the original and inferred network, unless the true structure is extremely sparse or dense. Specifically, the inferred networks have significant discrepancies in the number of edges and the eigenvalues of the connectivity matrix, demonstrating that they typically generate dynamics which are inconsistent with the ground truth. We provide a detailed account of the dynamics for the Erdős-Rényi network model due to its importance in random graph theory and network science. We conclude that Granger causal methods for inferring network structure are highly suspect and should always be checked against a ground truth model. The results also advocate the need to perform such comparisons with any network inference method since the inferred connectivity results appear to have very little to do with the ground truth system.
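For readers unfamiliar with the test system, a networked Kuramoto model of the kind mentioned above can be sketched in a few lines. This is a minimal illustration and not the authors' simulation code: it assumes the common form dθ_i/dt = ω_i + (K/N) Σ_j A_ij sin(θ_j − θ_i) with forward-Euler integration, and all function names and parameter values are hypothetical.

```python
import cmath
import math


def kuramoto_step(theta, omega, adjacency, coupling, dt):
    """Advance all phases by one forward-Euler step."""
    n = len(theta)
    new_theta = []
    for i in range(n):
        # Sum the sine of the phase differences over the neighbours of node i.
        interaction = sum(
            adjacency[i][j] * math.sin(theta[j] - theta[i]) for j in range(n)
        )
        new_theta.append(theta[i] + dt * (omega[i] + coupling / n * interaction))
    return new_theta


def order_parameter(theta):
    """Magnitude of the mean phase vector: 1 means full synchrony."""
    return abs(sum(cmath.exp(1j * t) for t in theta)) / len(theta)


def simulate(theta0, omega, adjacency, coupling=1.0, dt=0.01, n_steps=5000):
    """Integrate the network from the initial phases theta0."""
    theta = list(theta0)
    for _ in range(n_steps):
        theta = kuramoto_step(theta, omega, adjacency, coupling, dt)
    return theta


if __name__ == "__main__":
    n = 4
    all_to_all = [[0 if i == j else 1 for j in range(n)] for i in range(n)]
    start = [0.0, 0.5, 1.0, 1.5]
    final = simulate(start, [1.0] * n, all_to_all)
    print(order_parameter(start), order_parameter(final))
```

Because the adjacency matrix is known here, time series generated this way provide the kind of ground truth against which an inferred network can be compared.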
Physics-based statistical learning approach to mesoscopic model selection.
Taverniers, Søren; Haut, Terry S; Barros, Kipton; Alexander, Francis J; Lookman, Turab
2015-11-01
In materials science and many other research areas, models are frequently inferred without considering their generalization to unseen data. We apply statistical learning using cross-validation to obtain an optimally predictive coarse-grained description of a two-dimensional kinetic nearest-neighbor Ising model with Glauber dynamics (GD) based on the stochastic Ginzburg-Landau equation (sGLE). The latter is learned from GD "training" data using a log-likelihood analysis, and its predictive ability for various complexities of the model is tested on GD "test" data independent of the data used to train the model. Using two different error metrics, we perform a detailed analysis of the error between magnetization time trajectories simulated using the learned sGLE coarse-grained description and those obtained using the GD model. We show that both for equilibrium and out-of-equilibrium GD training trajectories, the standard phenomenological description using a quartic free energy does not always yield the most predictive coarse-grained model. Moreover, increasing the amount of training data can shift the optimal model complexity to higher values. Our results are promising in that they pave the way for the use of statistical learning as a general tool for materials modeling and discovery.
Physics-based statistical learning approach to mesoscopic model selection
NASA Astrophysics Data System (ADS)
Taverniers, Søren; Haut, Terry S.; Barros, Kipton; Alexander, Francis J.; Lookman, Turab
2015-11-01
In materials science and many other research areas, models are frequently inferred without considering their generalization to unseen data. We apply statistical learning using cross-validation to obtain an optimally predictive coarse-grained description of a two-dimensional kinetic nearest-neighbor Ising model with Glauber dynamics (GD) based on the stochastic Ginzburg-Landau equation (sGLE). The latter is learned from GD "training" data using a log-likelihood analysis, and its predictive ability for various complexities of the model is tested on GD "test" data independent of the data used to train the model. Using two different error metrics, we perform a detailed analysis of the error between magnetization time trajectories simulated using the learned sGLE coarse-grained description and those obtained using the GD model. We show that both for equilibrium and out-of-equilibrium GD training trajectories, the standard phenomenological description using a quartic free energy does not always yield the most predictive coarse-grained model. Moreover, increasing the amount of training data can shift the optimal model complexity to higher values. Our results are promising in that they pave the way for the use of statistical learning as a general tool for materials modeling and discovery.
Statistical methods for material characterization and qualification
Hunn, John D; Kercher, Andrew K
2005-01-01
This document describes a suite of statistical methods that can be used to infer lot parameters from the data obtained from inspection/testing of random samples taken from that lot. Some of these methods will be needed to perform the statistical acceptance tests required by the Advanced Gas Reactor Fuel Development and Qualification (AGR) Program. Special focus has been placed on proper interpretation of acceptance criteria and unambiguous methods of reporting the statistical results. In addition, modified statistical methods are described that can provide valuable measures of quality for different lots of material. This document has been written for use as a reference and a guide for performing these statistical calculations. Examples of each method are provided. Uncertainty analysis (e.g., measurement uncertainty due to instrumental bias) is not included in this document, but should be considered when reporting statistical results.
Statistical Methods for Material Characterization and Qualification
Kercher, A.K.
2005-04-01
This document describes a suite of statistical methods that can be used to infer lot parameters from the data obtained from inspection/testing of random samples taken from that lot. Some of these methods will be needed to perform the statistical acceptance tests required by the Advanced Gas Reactor Fuel Development and Qualification (AGR) Program. Special focus has been placed on proper interpretation of acceptance criteria and unambiguous methods of reporting the statistical results. In addition, modified statistical methods are described that can provide valuable measures of quality for different lots of material. This document has been written for use as a reference and a guide for performing these statistical calculations. Examples of each method are provided. Uncertainty analysis (e.g., measurement uncertainty due to instrumental bias) is not included in this document, but should be considered when reporting statistical results.
Statistical Methods in Cosmology
NASA Astrophysics Data System (ADS)
Verde, L.
2010-03-01
The advent of large data sets in cosmology has meant that in the past 10 or 20 years our knowledge and understanding of the Universe has changed not only quantitatively but also, and most importantly, qualitatively. Cosmologists rely on data where a host of useful information is enclosed, but is encoded in a non-trivial way. The challenges in extracting this information must be overcome to make the most of a large experimental effort. Even after having converged to a standard cosmological model (the LCDM model) we should keep in mind that this model is described by 10 or more physical parameters and if we want to study deviations from it, the number of parameters is even larger. Dealing with such a high dimensional parameter space and finding parameter constraints is a challenge in itself. Cosmologists want to be able to compare and combine different data sets both for testing for possible disagreements (which could indicate new physics) and for improving parameter determinations. Finally, cosmologists in many cases want to find out, before actually doing the experiment, how much one would be able to learn from it. For all these reasons, sophisticated statistical techniques are being employed in cosmology, and it has become crucial to know some statistical background to understand recent literature in the field. I will introduce some statistical tools that any cosmologist should know about in order to be able to understand recently published results from the analysis of cosmological data sets. I will not present a complete and rigorous introduction to statistics as there are several good books which are reported in the references. The reader should refer to those.
Universum Inference and Corpus Homogeneity
NASA Astrophysics Data System (ADS)
Vogel, Carl; Lynch, Gerard; Janssen, Jerom
Universum Inference is re-interpreted for assessment of corpus homogeneity in computational stylometry. Recent stylometric research quantifies strength of characterization within dramatic works by assessing the homogeneity of corpora associated with dramatic personas. A methodological advance is suggested to mitigate the potential for the assessment of homogeneity to be achieved by chance. Baseline comparison analysis is constructed for contributions to debates by nonfictional participants: the corpus analyzed consists of transcripts of US Presidential and Vice-Presidential debates from the 2000 election cycle. The corpus is also analyzed in translation to Italian, Spanish and Portuguese. Adding randomized categories makes assessments of homogeneity more conservative.
Teaching Classical Statistical Mechanics: A Simulation Approach.
ERIC Educational Resources Information Center
Sauer, G.
1981-01-01
Describes a one-dimensional model for an ideal gas to study development of disordered motion in Newtonian mechanics. A Monte Carlo procedure for simulation of the statistical ensemble of an ideal gas with fixed total energy is developed. Compares both approaches for a pseudoexperimental foundation of statistical mechanics. (Author/JN)
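A Monte Carlo procedure that samples at fixed total energy can be sketched as below. This is not the procedure of the article, which the abstract does not specify; it assumes a simple pairwise exchange move in which two randomly chosen particles redistribute their combined energy, so that the total is conserved at every step. The names `exchange_step` and `simulate_ideal_gas` are hypothetical.

```python
import random


def exchange_step(energies, rng):
    """Pick two distinct particles and split their combined energy at random."""
    i, j = rng.sample(range(len(energies)), 2)
    total = energies[i] + energies[j]
    share = rng.random()              # fraction of the pair energy given to i
    energies[i] = share * total
    energies[j] = total - energies[i]  # the remainder keeps the sum unchanged


def simulate_ideal_gas(n_particles=100, n_steps=10000,
                       energy_per_particle=1.0, seed=0):
    """Return particle energies after n_steps exchange moves.

    Every particle starts with the same energy; the moves then spread
    the energies out while the total stays fixed.
    """
    rng = random.Random(seed)
    energies = [energy_per_particle] * n_particles
    for _ in range(n_steps):
        exchange_step(energies, rng)
    return energies


if __name__ == "__main__":
    e = simulate_ideal_gas()
    print(sum(e), min(e), max(e))
```

Starting from identical energies makes the growth of disorder visible: the spread between the smallest and largest energy increases with the number of moves while the sum stays constant up to rounding.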
Network inference with confidence from multivariate time series.
Kramer, Mark A; Eden, Uri T; Cash, Sydney S; Kolaczyk, Eric D
2009-06-01
Networks--collections of interacting elements or nodes--abound in the natural and manmade worlds. For many networks, complex spatiotemporal dynamics stem from patterns of physical interactions unknown to us. To infer these interactions, it is common to include edges between those nodes whose time series exhibit sufficient functional connectivity, typically defined as a measure of coupling exceeding a predetermined threshold. However, when uncertainty exists in the original network measurements, uncertainty in the inferred network is likely, and hence a statistical propagation of error is needed. In this manuscript, we describe a principled and systematic procedure for the inference of functional connectivity networks from multivariate time series data. Our procedure yields as output both the inferred network and a quantification of uncertainty of the most fundamental interest: uncertainty in the number of edges. To illustrate this approach, we apply a measure of linear coupling to simulated data and electrocorticogram data recorded from a human subject during an epileptic seizure. We demonstrate that the procedure is accurate and robust in both the determination of edges and the reporting of uncertainty associated with that determination. PMID:19658533
Network inference with confidence from multivariate time series
NASA Astrophysics Data System (ADS)
Kramer, Mark A.; Eden, Uri T.; Cash, Sydney S.; Kolaczyk, Eric D.
2009-06-01
Networks—collections of interacting elements or nodes—abound in the natural and manmade worlds. For many networks, complex spatiotemporal dynamics stem from patterns of physical interactions unknown to us. To infer these interactions, it is common to include edges between those nodes whose time series exhibit sufficient functional connectivity, typically defined as a measure of coupling exceeding a predetermined threshold. However, when uncertainty exists in the original network measurements, uncertainty in the inferred network is likely, and hence a statistical propagation of error is needed. In this manuscript, we describe a principled and systematic procedure for the inference of functional connectivity networks from multivariate time series data. Our procedure yields as output both the inferred network and a quantification of uncertainty of the most fundamental interest: uncertainty in the number of edges. To illustrate this approach, we apply a measure of linear coupling to simulated data and electrocorticogram data recorded from a human subject during an epileptic seizure. We demonstrate that the procedure is accurate and robust in both the determination of edges and the reporting of uncertainty associated with that determination.
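The conventional thresholding step that the abstract takes as its starting point can be sketched as follows. This shows only the baseline idea (an edge wherever a linear coupling measure exceeds a fixed threshold) and not the authors' procedure, which additionally quantifies uncertainty in the number of edges. Pearson correlation is assumed as the coupling measure, and the names `pearson` and `infer_edges` and the threshold value are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def infer_edges(series, threshold=0.8):
    """Return node pairs whose absolute correlation exceeds the threshold.

    series: dict mapping a node name to its time series.
    """
    names = sorted(series)
    edges = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            r = pearson(series[names[a]], series[names[b]])
            if abs(r) > threshold:
                edges.append((names[a], names[b]))
    return edges


if __name__ == "__main__":
    data = {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10.5],
        "c": [1, -1, 1, -1, 1],
    }
    print(infer_edges(data))
```

A fixed threshold gives no indication of how many of the reported edges could be spurious, which is the gap the procedure in the abstract is meant to address.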
Statistical Estimation of Orbital Debris Populations with a Spectrum of Object Size
NASA Astrophysics Data System (ADS)
Xu, Yu-Lin; Horstman, Matthew; Krisko, Paula; Liou, J.-C.; Matney, Mark; Stansbery, Eugene; Stokely, Christopher; Whitlock, David
Orbital debris is a real concern for the safe operations of satellites. In general, the hazard of debris impact is a function of the size and spatial distributions of the debris populations. To describe and characterize the debris environment as reliably as possible, the current NASA Orbital Debris Engineering Model (ORDEM2000) is being upgraded to a new version based on new and better-quality data. The data-driven ORDEM model covers a wide range of object sizes from 10 microns to greater than 1 meter. This paper reviews the statistical process for the estimation of the debris populations in the new ORDEM upgrade, and discusses the representation of large-size (≥1 m and ≥10 cm) populations by SSN catalog objects and the validation of the statistical approach. Also, it presents results for the populations with sizes of ≥3.3 cm, ≥1 cm, ≥100 µm, and ≥10 µm. The orbital debris populations used in the new version of ORDEM are inferred from data based upon appropriate reference (or benchmark) populations instead of the binning of the multi-dimensional orbital-element space. This paper describes all of the major steps used in the population-inference procedure for each size-range. Detailed discussions on data analysis, parameter definition, the correlation between parameters and data, and uncertainty assessment are included.
Statistical Estimation of Orbital Debris Populations with a Spectrum of Object Size
NASA Technical Reports Server (NTRS)
Xu, Y. -l; Horstman, M.; Krisko, P. H.; Liou, J. -C; Matney, M.; Stansbery, E. G.; Stokely, C. L.; Whitlock, D.
2008-01-01
Orbital debris is a real concern for the safe operations of satellites. In general, the hazard of debris impact is a function of the size and spatial distributions of the debris populations. To describe and characterize the debris environment as reliably as possible, the current NASA Orbital Debris Engineering Model (ORDEM2000) is being upgraded to a new version based on new and better quality data. The data-driven ORDEM model covers a wide range of object sizes from 10 microns to greater than 1 meter. This paper reviews the statistical process for the estimation of the debris populations in the new ORDEM upgrade, and discusses the representation of large-size (greater than or equal to 1 m and greater than or equal to 10 cm) populations by SSN catalog objects and the validation of the statistical approach. Also, it presents results for the populations with sizes of greater than or equal to 3.3 cm, greater than or equal to 1 cm, greater than or equal to 100 micrometers, and greater than or equal to 10 micrometers. The orbital debris populations used in the new version of ORDEM are inferred from data based upon appropriate reference (or benchmark) populations instead of the binning of the multi-dimensional orbital-element space. This paper describes all of the major steps used in the population-inference procedure for each size-range. Detailed discussions on data analysis, parameter definition, the correlation between parameters and data, and uncertainty assessment are included.
A perceptual space of local image statistics.
Victor, Jonathan D; Thengone, Daniel J; Rizvi, Syed M; Conte, Mary M
2015-12-01
Local image statistics are important for visual analysis of textures, surfaces, and form. There are many kinds of local statistics, including those that capture luminance distributions, spatial contrast, oriented segments, and corners. While sensitivity to each of these kinds of statistics has been well studied, much less is known about visual processing when multiple kinds of statistics are relevant, in large part because the dimensionality of the problem is high and different kinds of statistics interact. To approach this problem, we focused on binary images on a square lattice - a reduced set of stimuli which nevertheless taps many kinds of local statistics. In this 10-parameter space, we determined psychophysical thresholds to each kind of statistic (16 observers) and all of their pairwise combinations (4 observers). Sensitivities and isodiscrimination contours were consistent across observers. Isodiscrimination contours were elliptical, implying a quadratic interaction rule, which in turn determined ellipsoidal isodiscrimination surfaces in the full 10-dimensional space, and made predictions for sensitivities to complex combinations of statistics. These predictions, including the prediction of a combination of statistics that was metameric to random, were verified experimentally. Finally, check size had only a mild effect on sensitivities over the range from 2.8 to 14 min, but sensitivities to second- and higher-order statistics were substantially lower at 1.4 min. In sum, local image statistics form a perceptual space that is highly stereotyped across observers, in which different kinds of statistics interact according to simple rules.
Inferring Phylogenetic Networks with Maximum Pseudolikelihood under Incomplete Lineage Sorting
Solís-Lemus, Claudia; Ané, Cécile
2016-01-01
Phylogenetic networks are necessary to represent the tree of life expanded by edges to represent events such as horizontal gene transfers, hybridizations or gene flow. Not all species follow the paradigm of vertical inheritance of their genetic material. While a great deal of research has flourished into the inference of phylogenetic trees, statistical methods to infer phylogenetic networks are still limited and under development. The main disadvantage of existing methods is a lack of scalability. Here, we present a statistical method to infer phylogenetic networks from multi-locus genetic data in a pseudolikelihood framework. Our model accounts for incomplete lineage sorting through the coalescent model, and for horizontal inheritance of genes through reticulation nodes in the network. Computation of the pseudolikelihood is fast and simple, and it avoids the burdensome calculation of the full likelihood which can be intractable with many species. Moreover, estimation at the quartet-level has the added computational benefit that it is easily parallelizable. Simulation studies comparing our method to a full likelihood approach show that our pseudolikelihood approach is much faster without compromising accuracy. We applied our method to reconstruct the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), which is characterized by widespread hybridizations. PMID:26950302
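The quartet-level pseudolikelihood described above can be illustrated with a minimal sketch. It assumes the simplest case only: a tree-like quartet with no reticulation, for which the coalescent gives expected concordance factors of 1 - (2/3)exp(-t) for the resolution matching the species tree and (1/3)exp(-t) for each of the other two, where t is the internal branch length in coalescent units. The function names and the input layout are hypothetical and are not taken from the authors' software.

```python
import math

def quartet_cfs(t):
    """Expected concordance factors (major, minor, minor) for a tree-like
    quartet whose internal branch has length t in coalescent units."""
    minor = math.exp(-t) / 3.0
    return (1.0 - 2.0 * minor, minor, minor)

def log_pseudolikelihood(quartets):
    """Sum of multinomial log-terms over quartets.

    quartets: list of (t, counts), where counts holds the number of gene
    trees supporting each of the three resolutions, major one first.
    """
    total = 0.0
    for t, counts in quartets:
        for count, cf in zip(counts, quartet_cfs(t)):
            total += count * math.log(cf)
    return total
```

Because each quartet contributes an independent term to the sum, the terms could be evaluated in parallel, which is the computational benefit the abstract points to.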
Statistical mechanics of prion diseases.
Slepoy, A; Singh, R R; Pázmándi, F; Kulkarni, R V; Cox, D L
2001-07-30
We present a two-dimensional, lattice-based, protein-level statistical mechanical model for prion diseases (e.g., mad cow disease) with concomitant prion protein misfolding and aggregation. Our studies lead us to the hypothesis that the observed broad incubation time distribution in epidemiological data reflects fluctuation-dominated growth seeded by a few nanometer-scale aggregates, while much narrower incubation time distributions for inoculated lab animals arise from statistical self-averaging. We model "species barriers" to prion infection and assess a related treatment protocol. PMID:11497806
Statistical Mechanics of Prion Diseases
NASA Astrophysics Data System (ADS)
Slepoy, A.; Singh, R. R.; Pázmándi, F.; Kulkarni, R. V.; Cox, D. L.
2001-07-01
We present a two-dimensional, lattice-based, protein-level statistical mechanical model for prion diseases (e.g., mad cow disease) with concomitant prion protein misfolding and aggregation. Our studies lead us to the hypothesis that the observed broad incubation time distribution in epidemiological data reflects fluctuation-dominated growth seeded by a few nanometer-scale aggregates, while much narrower incubation time distributions for inoculated lab animals arise from statistical self-averaging. We model "species barriers" to prion infection and assess a related treatment protocol.
SYMBOLIC INFERENCE OF XENOBIOTIC METABOLISM
MCSHAN, D.C.; UPDADHAYAYA, M.; SHAH, I.
2009-01-01
We present a new symbolic computational approach to elucidate the biochemical networks of living systems de novo and we apply it to an important biomedical problem: xenobiotic metabolism. A crucial issue in analyzing and modeling a living organism is understanding its biochemical network beyond what is already known. Our objective is to use the available metabolic information in a representational framework that enables the inference of novel biochemical knowledge and whose results can be validated experimentally. We describe a symbolic computational approach consisting of two parts. First, biotransformation rules are inferred from the molecular graphs of compounds in enzyme-catalyzed reactions. Second, these rules are recursively applied to different compounds to generate novel metabolic networks, containing new biotransformations and new metabolites. Using data for 456 generic reactions and 825 generic compounds from KEGG we were able to extract 110 biotransformation rules, which generalize a subset of known biocatalytic functions. We tested our approach by applying these rules to ethanol, a common substance of abuse and to furfuryl alcohol, a xenobiotic organic solvent, which is absent in metabolic databases. In both cases our predictions on the fate of ethanol and furfuryl alcohol are consistent with the literature on the metabolism of these compounds. PMID:14992532
Bayesian inference for OPC modeling
NASA Astrophysics Data System (ADS)
Burbine, Andrew; Sturtevant, John; Fryer, David; Smith, Bruce W.
2016-03-01
The use of optical proximity correction (OPC) demands increasingly accurate models of the photolithographic process. Model building and inference techniques in the data science community have seen great strides in the past two decades which make better use of available information. This paper aims to demonstrate the predictive power of Bayesian inference as a method for parameter selection in lithographic models by quantifying the uncertainty associated with model inputs and wafer data. Specifically, the method combines the model builder's prior information about each modelling assumption with the maximization of each observation's likelihood as a Student's t-distributed random variable. Through the use of a Markov chain Monte Carlo (MCMC) algorithm, a model's parameter space is explored to find the most credible parameter values. During parameter exploration, the parameters' posterior distributions are generated by applying Bayes' rule, using a likelihood function and the a priori knowledge supplied. The MCMC algorithm used, an affine invariant ensemble sampler (AIES), is implemented by initializing many walkers which semi-independently explore the space. The convergence of these walkers to global maxima of the likelihood volume determines the parameter values' highest density intervals (HDI) to reveal champion models. We show that this method of parameter selection provides insights into the data that traditional methods do not and outline continued experiments to vet the method.
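As a rough illustration of how an affine invariant ensemble sampler with a Student's t likelihood might look, the sketch below implements the Goodman-Weare stretch move in plain Python for a single location parameter. Everything specific here is an assumption made for the example: the flat prior, the degrees of freedom nu, the scale, the walker count, and all function names. It is not the lithographic model or the code used in the paper.

```python
import math
import random

def log_student_t(residual, nu=4.0, scale=1.0):
    """Unnormalised Student's t log-density of one residual (nu, scale assumed)."""
    return -0.5 * (nu + 1.0) * math.log1p((residual / scale) ** 2 / nu)

def log_posterior(theta, data):
    """Assumed flat prior on the location theta[0] plus t-distributed residuals."""
    return sum(log_student_t(y - theta[0]) for y in data)

def stretch_move_sweep(walkers, log_prob, rng, a=2.0):
    """Update every walker once, in place, with the stretch move."""
    dim = len(walkers[0])
    for k in range(len(walkers)):
        j = rng.randrange(len(walkers) - 1)
        if j >= k:
            j += 1  # partner walker, guaranteed different from k
        # z has density proportional to 1/sqrt(z) on [1/a, a]
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        proposal = [xj + z * (xk - xj) for xk, xj in zip(walkers[k], walkers[j])]
        log_accept = ((dim - 1) * math.log(z)
                      + log_prob(proposal) - log_prob(walkers[k]))
        if math.log(rng.random() + 1e-300) < log_accept:
            walkers[k] = proposal
    return walkers

def run_sampler(data, n_walkers=20, n_steps=500, seed=0):
    """Return posterior samples of the location, discarding the first half."""
    rng = random.Random(seed)
    walkers = [[rng.uniform(-5.0, 5.0)] for _ in range(n_walkers)]
    samples = []
    for step in range(n_steps):
        stretch_move_sweep(walkers, lambda th: log_posterior(th, data), rng)
        if step >= n_steps // 2:
            samples.extend(w[0] for w in walkers)
    return samples
```

The heavy tails of the Student's t density mean that a single outlying observation pulls the sampled location far less than it would under a Gaussian likelihood, which is one common reason for choosing it.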
Dopamine, affordance and active inference.
Friston, Karl J; Shiner, Tamara; FitzGerald, Thomas; Galea, Joseph M; Adams, Rick; Brown, Harriet; Dolan, Raymond J; Moran, Rosalyn; Stephan, Klaas Enno; Bestmann, Sven
2012-01-01
The role of dopamine in behaviour and decision-making is often cast in terms of reinforcement learning and optimal decision theory. Here, we present an alternative view that frames the physiology of dopamine in terms of Bayes-optimal behaviour. In this account, dopamine controls the precision or salience of (external or internal) cues that engender action. In other words, dopamine balances bottom-up sensory information and top-down prior beliefs when making hierarchical inferences (predictions) about cues that have affordance. In this paper, we focus on the consequences of changing tonic levels of dopamine firing using simulations of cued sequential movements. Crucially, the predictions driving movements are based upon a hierarchical generative model that infers the context in which movements are made. This means that we can confuse agents by changing the context (order) in which cues are presented. These simulations provide a (Bayes-optimal) model of contextual uncertainty and set switching that can be quantified in terms of behavioural and electrophysiological responses. Furthermore, one can simulate dopaminergic lesions (by changing the precision of prediction errors) to produce pathological behaviours that are reminiscent of those seen in neurological disorders such as Parkinson's disease. We use these simulations to demonstrate how a single functional role for dopamine at the synaptic level can manifest in different ways at the behavioural level.
Transdimensional Inference in the Geosciences
NASA Astrophysics Data System (ADS)
Bodin, Thomas
2016-04-01
An inverse problem, which arises in many branches of the Earth sciences, is the task of obtaining the values of model parameters describing the Earth from noisy observations made at the surface. In all applications of inversion, assumptions are made about the nature of the model parametrisation and data noise characteristics, and results can depend significantly on those assumptions. These quantities are often manually 'tuned' by means of subjective trial-and-error procedures, which prevents accurate quantification of uncertainties in the solution. A Bayesian approach allows these assumptions to be relaxed by incorporating relevant parameters as unknowns in the inference problem. Rather than being forced to make decisions on parametrisation, the level of data noise and the weights between data types in advance, as is often the case in an optimization framework, the choice can be informed by the data themselves. Probabilistic sampling techniques, such as transdimensional Markov chain Monte Carlo, allow sampling over complex posterior probability density functions, thus providing information on constraints, trade-offs and uncertainty in the unknowns. This presentation reviews transdimensional inference and its application to different problems, ranging from geochemistry to solid Earth geophysics.
Quantum Inference on Bayesian Networks
NASA Astrophysics Data System (ADS)
Yoder, Theodore; Low, Guang Hao; Chuang, Isaac
2014-03-01
Because quantum physics is naturally probabilistic, it seems reasonable to expect physical systems to describe probabilities and their evolution in a natural fashion. Here, we use quantum computation to speed up sampling from a graphical probability model, the Bayesian network. A specialization of this sampling problem is approximate Bayesian inference, where the distribution on query variables is sampled given the values e of evidence variables. Inference is a key part of modern machine learning and artificial intelligence tasks, but is known to be NP-hard. Classically, a single unbiased sample is obtained from a Bayesian network on n variables with at most m parents per node in time O(n m P(e)^{-1}), depending critically on P(e), the probability the evidence might occur in the first place. However, by implementing a quantum version of rejection sampling, we obtain a square-root speedup, taking O(n 2^m P(e)^{-1/2}) time per sample. The speedup is the result of amplitude amplification, which is proving to be broadly applicable in sampling and machine learning tasks. In particular, we provide an explicit and efficient circuit construction that implements the algorithm without the need for oracle access.
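The dependence of classical rejection sampling on P(e) can be seen in a small sketch. The two-node network A -> E and its probabilities are invented for illustration, and the function names are hypothetical. Each accepted sample costs on average 1/P(e) forward draws, which is the factor that the quantum algorithm improves to a square root.

```python
import random

# Hypothetical two-node network A -> E with made-up probabilities.
P_A = 0.3
P_E_GIVEN_A = {True: 0.9, False: 0.1}

def forward_sample(rng):
    """Draw (a, e) by sampling each node given its parent."""
    a = rng.random() < P_A
    e = rng.random() < P_E_GIVEN_A[a]
    return a, e

def rejection_sample(rng, evidence=True):
    """Return (a, draws): one sample of A given E == evidence,
    and the number of forward draws it took."""
    draws = 0
    while True:
        draws += 1
        a, e = forward_sample(rng)
        if e == evidence:
            return a, draws

def average_draws(n_samples=20000, seed=1):
    """Average number of forward draws per accepted sample."""
    rng = random.Random(seed)
    total = sum(rejection_sample(rng)[1] for _ in range(n_samples))
    return total / n_samples
```

For this toy network P(e) = 0.3 * 0.9 + 0.7 * 0.1 = 0.34, so the average returned by `average_draws()` should settle near 1/0.34, or roughly 2.9 draws per sample.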
Statistics for characterizing data on the periphery
Theiler, James P; Hush, Donald R
2010-01-01
We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
Inference is bliss: using evolutionary relationship to guide categorical inferences.
Novick, Laura R; Catley, Kefyn M; Funk, Daniel J
2011-01-01
Three experiments, adopting an evolutionary biology perspective, investigated subjects' inferences about living things. Subjects were told that different enzymes help regulate cell function in two taxa and asked which enzyme a third taxon most likely uses. Experiment 1 and its follow-up, with college students, used triads involving amphibians, reptiles, and mammals (reptiles and mammals are most closely related evolutionarily) and plants, fungi, and animals (fungi are more closely related to animals than to plants). Experiment 2, with 10th graders, also included triads involving mammals, birds, and snakes/crocodilians (birds and snakes/crocodilians are most closely related). Some subjects received cladograms (hierarchical diagrams) depicting the evolutionary relationships among the taxa. The effect of providing cladograms depended on students' background in biology. The results illuminate students' misconceptions concerning common taxa and constraints on their willingness to override faulty knowledge when given appropriate evolutionary evidence. Implications for introducing tree thinking into biology curricula are discussed. PMID:21463358
Cosmetic Plastic Surgery Statistics
2014 Cosmetic Plastic Surgery Statistics Cosmetic Procedure Trends 2014 Plastic Surgery Statistics Report Please credit the AMERICAN SOCIETY OF PLASTIC SURGEONS when citing statistical data or using ...
Simulated tornado debris tracks: implications for inferring corner flow structure
NASA Astrophysics Data System (ADS)
Zimmerman, Michael; Lewellen, David
2011-11-01
A large collection of three-dimensional large eddy simulations of tornadoes with fine debris has recently been performed as part of a longstanding effort at West Virginia University to understand tornado corner flow structure and dynamics. Debris removal and deposition are accounted for at the surface, in effect simulating formation of tornado surface marks. Physical origins and properties of the most prominent marks will be presented, and the possibility of inferring tornado corner flow structure from real marks in the field will be discussed. This material is based upon work supported by the National Science Foundation under Grants No. 0635681 and AGS-1013154.
NASA Astrophysics Data System (ADS)
Zubov, V. I.; Banyeretse, F.
The correlative unsymmetrized self-consistent field method is used to study surface properties of the two-dimensional model of an anharmonic crystal with square lattice having various Miller indices. The lattice relaxation, the amplitudes of atomic vibrations and the thermodynamic surface functions are calculated. The typical nonsingular and vicinal surfaces are considered. The dependence of thermodynamic surface functions on the surface orientation is obtained.
Virtual reality and consciousness inference in dreaming.
Hobson, J Allan; Hong, Charles C-H; Friston, Karl J
2014-01-01
This article explores the notion that the brain is genetically endowed with an innate virtual reality generator that - through experience-dependent plasticity - becomes a generative or predictive model of the world. This model, which is most clearly revealed in rapid eye movement (REM) sleep dreaming, may provide the theater for conscious experience. Functional neuroimaging evidence for brain activations that are time-locked to rapid eye movements (REMs) endorses the view that waking consciousness emerges from REM sleep - and dreaming lays the foundations for waking perception. In this view, the brain is equipped with a virtual model of the world that generates predictions of its sensations. This model is continually updated and entrained by sensory prediction errors in wakefulness to ensure veridical perception, but not in dreaming. In contrast, dreaming plays an essential role in maintaining and enhancing the capacity to model the world by minimizing model complexity and thereby maximizing both statistical and thermodynamic efficiency. This perspective suggests that consciousness corresponds to the embodied process of inference, realized through the generation of virtual realities (in both sleep and wakefulness). In short, our premise or hypothesis is that the waking brain engages with the world to predict the causes of sensations, while in sleep the brain's generative model is actively refined so that it generates more efficient predictions during waking. We review the evidence in support of this hypothesis - evidence that grounds consciousness in biophysical computations whose neuronal and neurochemical infrastructure has been disclosed by sleep research.
Inferring unstable equilibrium configurations from experimental data
NASA Astrophysics Data System (ADS)
Virgin, L. N.; Wiebe, R.; Spottswood, S. M.; Beberniss, T.
2016-09-01
This research considers the structural behavior of slender, mechanically buckled beams and panels of the type commonly found in aerospace structures. The specimens were deflected and then clamped in a rigid frame in order to exhibit snap-through. That is, the initial equilibrium and the buckled (snapped-through) equilibrium configurations both co-existed for the given clamped conditions. In order to transit between these two stable equilibrium configurations (for example, under the action of an externally applied load), it is necessary for the structural component to pass through an intermediate unstable equilibrium configuration. A sequence of sudden impacts was imparted to the system, of various strengths and at various locations. The goal of this impact force was to induce intermediate-sized transients that effectively slowed down in the vicinity of the unstable equilibrium configuration. Thus, monitoring the velocity of the motion, and specifically its slowing down, should give an indication of the presence of an equilibrium configuration, even though it is unstable and not amenable to direct experimental observation. A digital image correlation (DIC) system was used in conjunction with an instrumented impact hammer to track trajectories, and statistical methods were used to infer the presence of unstable equilibria in both a beam and a panel.
Virtual reality and consciousness inference in dreaming
Hobson, J. Allan; Hong, Charles C.-H.; Friston, Karl J.
2014-01-01
This article explores the notion that the brain is genetically endowed with an innate virtual reality generator that – through experience-dependent plasticity – becomes a generative or predictive model of the world. This model, which is most clearly revealed in rapid eye movement (REM) sleep dreaming, may provide the theater for conscious experience. Functional neuroimaging evidence for brain activations that are time-locked to rapid eye movements (REMs) endorses the view that waking consciousness emerges from REM sleep – and dreaming lays the foundations for waking perception. In this view, the brain is equipped with a virtual model of the world that generates predictions of its sensations. This model is continually updated and entrained by sensory prediction errors in wakefulness to ensure veridical perception, but not in dreaming. In contrast, dreaming plays an essential role in maintaining and enhancing the capacity to model the world by minimizing model complexity and thereby maximizing both statistical and thermodynamic efficiency. This perspective suggests that consciousness corresponds to the embodied process of inference, realized through the generation of virtual realities (in both sleep and wakefulness). In short, our premise or hypothesis is that the waking brain engages with the world to predict the causes of sensations, while in sleep the brain’s generative model is actively refined so that it generates more efficient predictions during waking. We review the evidence in support of this hypothesis – evidence that grounds consciousness in biophysical computations whose neuronal and neurochemical infrastructure has been disclosed by sleep research. PMID:25346710
Functional network inference of the suprachiasmatic nucleus.
Abel, John H; Meeker, Kirsten; Granados-Fuentes, Daniel; St John, Peter C; Wang, Thomas J; Bales, Benjamin B; Doyle, Francis J; Herzog, Erik D; Petzold, Linda R
2016-04-19
In the mammalian suprachiasmatic nucleus (SCN), noisy cellular oscillators communicate within a neuronal network to generate precise system-wide circadian rhythms. Although the intracellular genetic oscillator and intercellular biochemical coupling mechanisms have been examined previously, the network topology driving synchronization of the SCN has not been elucidated. This network has been particularly challenging to probe, due to its oscillatory components and slow coupling timescale. In this work, we investigated the SCN network at a single-cell resolution through a chemically induced desynchronization. We then inferred functional connections in the SCN by applying the maximal information coefficient statistic to bioluminescence reporter data from individual neurons while they resynchronized their circadian cycling. Our results demonstrate that the functional network of circadian cells associated with resynchronization has small-world characteristics, with a node degree distribution that is exponential. We show that hubs of this small-world network are preferentially located in the central SCN, with sparsely connected shells surrounding these cores. Finally, we used two computational models of circadian neurons to validate our predictions of network structure. PMID:27044085
Mistaking geography for biology: inferring processes from species distributions.
Warren, Dan L; Cardillo, Marcel; Rosauer, Dan F; Bolnick, Daniel I
2014-10-01
Over the past few decades, there has been a rapid proliferation of statistical methods that infer evolutionary and ecological processes from data on species distributions. These methods have led to considerable new insights, but they often fail to account for the effects of historical biogeography on present-day species distributions. Because the geography of speciation can lead to patterns of spatial and temporal autocorrelation in the distributions of species within a clade, this can result in misleading inferences about the importance of deterministic processes in generating spatial patterns of biodiversity. In this opinion article, we discuss ways in which patterns of species distributions driven by historical biogeography are often interpreted as evidence of particular evolutionary or ecological processes. We focus on three areas that are especially prone to such misinterpretations: community phylogenetics, environmental niche modelling, and analyses of beta diversity (compositional turnover of biodiversity).
Automatic inference of sulcus patterns using 3D moment invariants.
Sun, Z Y; Rivière, D; Poupon, F; Régis, J; Mangin, J F
2007-01-01
The goal of this work is the automatic inference of frequent patterns of the cortical sulci, namely patterns that can be observed only for a subset of the population. The sulci are detected and identified using brainVISA open software. Then, each sulcus is represented by a set of shape descriptors called the 3D moment invariants. Unsupervised agglomerative clustering is performed to define the patterns. A ratio between compactness and contrast among clusters is used to select the best patterns. A pattern is considered significant when this ratio is statistically better than the ratios obtained for clouds of points following a Gaussian distribution. The patterns inferred for the left cingulate sulcus are consistent with the patterns described in the atlas of Ono.
Owl's behavior and neural representation predicted by Bayesian inference.
Fischer, Brian J; Peña, José Luis
2011-08-01
The owl captures prey using sound localization. In the classical model, the owl infers sound direction from the position of greatest activity in a brain map of auditory space. However, this model fails to describe the actual behavior. Although owls accurately localize sources near the center of gaze, they systematically underestimate peripheral source directions. We found that this behavior is predicted by statistical inference, formulated as a Bayesian model that emphasizes central directions. We propose that there is a bias in the neural coding of auditory space, which, at the expense of inducing errors in the periphery, achieves high behavioral accuracy at the ethologically relevant range. We found that the owl's map of auditory space decoded by a population vector is consistent with the behavioral model. Thus, a probabilistic model describes both how the map of auditory space supports behavior and why this representation is optimal. PMID:21725311
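A prior that emphasizes central directions produces exactly this kind of peripheral underestimation, as a simplified sketch shows. It assumes a zero-centred Gaussian prior and Gaussian sensory noise, which is a textbook simplification and not the model fitted in the paper. The standard deviations and the function name are invented for the example.

```python
def bayes_direction_estimate(observed_deg, prior_sd=20.0, noise_sd=10.0):
    """Posterior-mean direction for a Gaussian prior centred on 0 degrees
    and Gaussian sensory noise; both widths are illustrative assumptions."""
    shrink = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    return shrink * observed_deg
```

With these assumed widths the shrinkage factor is 400 / 500 = 0.8, so a source sensed at 10 degrees is estimated at 8 degrees while one sensed at 60 degrees is estimated at 48 degrees. The absolute error grows toward the periphery and vanishes at the center of gaze.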
Structural inference for uncertain networks
NASA Astrophysics Data System (ADS)
Martin, Travis; Ball, Brian; Newman, M. E. J.
2016-01-01
In the study of networked systems such as biological, technological, and social networks the available data are often uncertain. Rather than knowing the structure of a network exactly, we know the connections between nodes only with a certain probability. In this paper we develop methods for the analysis of such uncertain data, focusing particularly on the problem of community detection. We give a principled maximum-likelihood method for inferring community structure and demonstrate how the results can be used to make improved estimates of the true structure of the network. Using computer-generated benchmark networks we demonstrate that our methods are able to reconstruct known communities more accurately than previous approaches based on data thresholding. We also give an example application to the detection of communities in a protein-protein interaction network.
Transdimensional inference in the geosciences.
Sambridge, M; Bodin, T; Gallagher, K; Tkalcic, H
2013-02-13
Seismologists construct images of the Earth's interior structure using observations, derived from seismograms, collected at the surface. A common approach to such inverse problems is to build a single 'best' Earth model, in some sense. This is despite the fact that the observations by themselves often do not require, or even allow, a single best-fit Earth model to exist. Interpretation of optimal models can be fraught with difficulties, particularly when formal uncertainty estimates become heavily dependent on the regularization imposed. Similar issues occur across the physical sciences with model construction in ill-posed problems. An alternative approach is to embrace the non-uniqueness directly and employ an inference process based on parameter space sampling. Instead of seeking a best model within an optimization framework, one seeks an ensemble of solutions and derives properties of that ensemble for inspection. While this idea has itself been employed for more than 30 years, it is now receiving increasing attention in the geosciences. Recently, it has been shown that transdimensional and hierarchical sampling methods have some considerable benefits for problems involving multiple parameter types, uncertain data errors and/or uncertain model parametrizations, as are common in seismology. Rather than being forced to make decisions on parametrization, the level of data noise and the weights between data types in advance, as is often the case in an optimization framework, the choice can be informed by the data themselves. Despite the relatively high computational burden involved, the number of areas where sampling methods are now feasible is growing rapidly. The intention of this article is to introduce concepts of transdimensional inference to a general readership and illustrate with particular seismological examples. A growing body of references provides necessary detail. PMID:23277604
Bayesian Nonparametric Inference – Why and How
Müller, Peter; Mitra, Riten
2013-01-01
We review inference under models with nonparametric Bayesian (BNP) priors. The discussion follows a set of examples for some common inference problems. The examples are chosen to highlight problems that are challenging for standard parametric inference. We discuss inference for density estimation, clustering, regression and for mixed effects models with random effects distributions. While we focus on arguing for the need for the flexibility of BNP models, we also review some of the more commonly used BNP models, thus hopefully answering a bit of both questions, why and how to use BNP. PMID:24368932
Generic Comparison of Protein Inference Engines
Claassen, Manfred; Reiter, Lukas; Hengartner, Michael O.; Buhmann, Joachim M.; Aebersold, Ruedi
2012-01-01
Protein identifications, instead of peptide-spectrum matches, constitute the biologically relevant result of shotgun proteomics studies. How to appropriately infer and report protein identifications has triggered a still ongoing debate. This debate has so far suffered from the lack of appropriate performance measures that allow us to objectively assess protein inference approaches. This study describes an intuitive, generic and yet formal performance measure and demonstrates how it enables experimentalists to select an optimal protein inference strategy for a given collection of fragment ion spectra. We applied the performance measure to systematically explore the benefit of excluding possibly unreliable protein identifications, such as single-hit wonders. Therefore, we defined a family of protein inference engines by extending a simple inference engine by thousands of pruning variants, each excluding a different specified set of possibly unreliable identifications. We benchmarked these protein inference engines on several data sets representing different proteomes and mass spectrometry platforms. Optimally performing inference engines retained all high confidence spectral evidence, without posterior exclusion of any type of protein identifications. Despite the diversity of studied data sets consistently supporting this rule, other data sets might behave differently. In order to ensure maximal reliable proteome coverage for data sets arising in other studies we advocate abstaining from rigid protein inference rules, such as exclusion of single-hit wonders, and instead consider several protein inference approaches and assess these with respect to the presented performance measure in the specific application context. PMID:22057310
NASA Astrophysics Data System (ADS)
Zubov, V. I.; Banyeretse, F.; Tretyakov, N. P.; Mamontov, I. V.
The correlative, or improved, unsymmetrized self-consistent field method (CUSF) is developed to study surface properties of anharmonic crystals. The equations for moments of the one-particle functions of atoms are obtained. Their solution determines the lattice relaxation near the surface, the amplitudes of anharmonic vibrations of atoms and the self-consistent potentials. A calculation method of the Helmholtz free energy of anharmonic crystal — vapour interface is developed. As an application, the properties of the singular surfaces of two-dimensional models with square and hexagonal lattices are calculated.
Causal Inference and Explaining Away in a Spiking Network
Moreno-Bote, Rubén; Drugowitsch, Jan
2015-01-01
While the brain uses spiking neurons for communication, theoretical research on brain computations has mostly focused on non-spiking networks. The nature of spike-based algorithms that achieve complex computations, such as object probabilistic inference, is largely unknown. Here we demonstrate that a family of high-dimensional quadratic optimization problems with non-negativity constraints can be solved exactly and efficiently by a network of spiking neurons. The network naturally imposes the non-negativity of causal contributions that is fundamental to causal inference, and uses simple operations, such as linear synapses with realistic time constants, and neural spike generation and reset non-linearities. The network infers the set of most likely causes from an observation using explaining away, which is dynamically implemented by spike-based, tuned inhibition. The algorithm performs remarkably well even when the network intrinsically generates variable spike trains, the timing of spikes is scrambled by external sources of noise, or the network is mistuned. This type of network might underlie tasks such as odor identification and classification. PMID:26621426
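The optimization problem named in this abstract, a quadratic objective with non-negativity constraints, can be illustrated with a conventional solver. The sketch below is only an illustration: it uses projected gradient descent, not the spiking network of the paper, and the function name `solve_nonneg_qp`, the matrix `A`, the vector `b`, and the step settings are all assumptions made for the example.

```python
def solve_nonneg_qp(A, b, steps=5000, lr=0.05):
    """Minimise 0.5 * x^T A x - b^T x subject to x >= 0.

    Plain projected gradient descent: take a gradient step, then clip
    negative entries to zero. A is assumed symmetric positive definite
    and lr is assumed smaller than 2 / (largest eigenvalue of A).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        # Gradient of the quadratic objective is A x - b.
        grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        # Projection onto the non-negative orthant.
        x = [max(0.0, x[i] - lr * grad[i]) for i in range(n)]
    return x


# Hypothetical two-cause example. The unconstrained minimum is (5/3, -4/3),
# which violates the constraint, so the second cause is clamped to zero.
A = [[2.0, 1.0], [1.0, 2.0]]
b = [2.0, -1.0]
x = solve_nonneg_qp(A, b)
print(x)
```

In this toy problem the constraint is active for the second variable, so the solver returns a value close to (1, 0) rather than the unconstrained minimum.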
Room geometry inference based on spherical microphone array eigenbeam processing.
Mabande, Edwin; Kowalczyk, Konrad; Sun, Haohai; Kellermann, Walter
2013-10-01
The knowledge of parameters characterizing an acoustic environment, such as the geometric information about a room, can be used to enhance the performance of several audio applications. In this paper, a novel method for three-dimensional room geometry inference based on robust and high-resolution beamforming techniques for spherical microphone arrays is presented. Unlike other approaches that are based on the measurement and processing of multiple room impulse responses, here, microphone array signal processing techniques for uncontrolled broadband acoustic signals are applied. First, the directions of arrival (DOAs) and time differences of arrival (TDOAs) of the direct signal and room reflections are estimated using high-resolution robust broadband beamforming techniques and cross-correlation analysis. In this context, the main challenges include the low reflected-signal to background-noise power ratio, the low energy of reflected signals relative to the direct signal, and their strong correlation with the direct signal and among each other. Second, the DOA and TDOA information is combined to infer the room geometry using geometric relations. The high accuracy of the proposed room geometry inference technique is confirmed by experimental evaluations based on both simulated and measured data for moderately reverberant rooms. PMID:24116416
Halo detection via large-scale Bayesian inference
NASA Astrophysics Data System (ADS)
Merson, Alexander I.; Jasche, Jens; Abdalla, Filipe B.; Lahav, Ofer; Wandelt, Benjamin; Jones, D. Heath; Colless, Matthew
2016-08-01
We present a proof-of-concept of a novel and fully Bayesian methodology designed to detect haloes of different masses in cosmological observations subject to noise and systematic uncertainties. Our methodology combines the previously published Bayesian large-scale structure inference algorithm, HAmiltonian Density Estimation and Sampling algorithm (HADES), and a Bayesian chain rule (the Blackwell-Rao estimator), which we use to connect the inferred density field to the properties of dark matter haloes. To demonstrate the capability of our approach, we construct a realistic galaxy mock catalogue emulating the wide-area 6-degree Field Galaxy Survey, which has a median redshift of approximately 0.05. Application of HADES to the catalogue provides us with accurately inferred three-dimensional density fields and corresponding quantification of uncertainties inherent to any cosmological observation. We then use a cosmological simulation to relate the amplitude of the density field to the probability of detecting a halo with mass above a specified threshold. With this information, we can sum over the HADES density field realisations to construct maps of detection probabilities and demonstrate the validity of this approach within our mock scenario. We find that the probability of successful detection of haloes in the mock catalogue increases as a function of the signal to noise of the local galaxy observations. Our proposed methodology can easily be extended to account for more complex scientific questions and is a promising novel tool to analyse the cosmic large-scale structure in observations.
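The step of summing over density field realisations to obtain a detection probability map can be sketched as a Blackwell-Rao style average: the conditional detection probability is evaluated on each posterior sample and the results are averaged cell by cell. This is a minimal sketch under stated assumptions. The logistic relation in `halo_probability`, its `threshold` and `width` parameters, and the toy samples are hypothetical stand-ins for the simulation-calibrated relation and the HADES output described in the abstract.

```python
import math


def halo_probability(density, threshold=1.0, width=0.5):
    """Hypothetical P(halo above mass threshold | density amplitude).

    The abstract calibrates this relation on a cosmological simulation;
    a logistic curve is used here purely for illustration.
    """
    return 1.0 / (1.0 + math.exp(-(density - threshold) / width))


def detection_map(samples):
    """Average the conditional probability over posterior density samples.

    samples is a list of density fields, each a list with one value per cell.
    """
    n_samples = len(samples)
    n_cells = len(samples[0])
    return [
        sum(halo_probability(sample[cell]) for sample in samples) / n_samples
        for cell in range(n_cells)
    ]


# Three made-up density realisations over three cells.
samples = [
    [0.2, 1.0, 3.0],
    [0.4, 1.0, 2.5],
    [0.1, 1.0, 3.5],
]
probs = detection_map(samples)
print(probs)
```

Averaging the conditional probabilities, rather than thresholding a single mean density field, carries the sample-to-sample uncertainty of the inferred density into the final map.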
Generic inference of inflation models by local non-Gaussianity
NASA Astrophysics Data System (ADS)
Dorn, Sebastian; Ramirez, Erandy; Kunze, Kerstin E.; Hofmann, Stefan; Enßlin, Torsten A.
2014-05-01
The presence of multiple fields during inflation might seed a detectable amount of non-Gaussianity in the curvature perturbations, which in turn becomes observable in present data sets like the cosmic microwave background (CMB) or the large scale structure (LSS). Within this proceeding we present a fully analytic method to infer inflationary parameters from observations by exploiting higher-order statistics of the curvature perturbations. To keep this analyticity, and thereby to dispense with numerically expensive sampling techniques, a saddle-point approximation is introduced whose precision has been validated for a numerical toy example. Applied to real data, this approach might make it possible to discriminate among the still viable models of inflation.
The Impact of Disablers on Predictive Inference
ERIC Educational Resources Information Center
Cummins, Denise Dellarosa
2014-01-01
People consider alternative causes when deciding whether a cause is responsible for an effect (diagnostic inference) but appear to neglect them when deciding whether an effect will occur (predictive inference). Five experiments were conducted to test a 2-part explanation of this phenomenon: namely, (a) that people interpret standard predictive…
Causal inference in economics and marketing.
Varian, Hal R
2016-07-01
This is an elementary introduction to causal inference in economics written for readers familiar with machine learning methods. The critical step in any causal analysis is estimating the counterfactual: a prediction of what would have happened in the absence of the treatment. The powerful techniques used in machine learning may be useful for developing better estimates of the counterfactual, potentially improving causal inference.
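The idea of estimating a counterfactual with a predictive model can be sketched as follows: fit a model on pre-treatment observations only, predict the post-treatment period, and compare the prediction with what was actually observed. This is a minimal sketch, not the article's procedure. A one-variable least-squares trend stands in for a machine learning model, and the function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimated_effect(pre_t, pre_y, post_t, post_y):
    """Observed outcome minus predicted counterfactual, per post-treatment period."""
    slope, intercept = fit_line(pre_t, pre_y)
    # Counterfactual: what the pre-treatment trend predicts without treatment.
    counterfactual = [intercept + slope * t for t in post_t]
    return [y - c for y, c in zip(post_y, counterfactual)]


# Made-up series: outcome grows by 2 per period, then jumps after treatment.
pre_t, pre_y = [0, 1, 2, 3], [10.0, 12.0, 14.0, 16.0]
post_t, post_y = [4, 5], [23.0, 25.0]
effects = estimated_effect(pre_t, pre_y, post_t, post_y)
print(effects)
```

The estimate is only as good as the assumption that the pre-treatment relationship would have continued unchanged, which is why a better predictive model can improve the causal estimate.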
Genetic Network Inference Using Hierarchical Structure
Kimura, Shuhei; Tokuhisa, Masato; Okada-Hatakeyama, Mariko
2016-01-01
Many methods for inferring genetic networks have been proposed, but the regulations they infer often include false-positives. Several researchers have attempted to reduce these erroneous regulations by proposing the use of a priori knowledge about the properties of genetic networks such as their sparseness, scale-free structure, and so on. This study focuses on another piece of a priori knowledge, namely, that biochemical networks exhibit hierarchical structures. Based on this idea, we propose an inference approach that uses the hierarchical structure in a target genetic network. To obtain a reasonable hierarchical structure, the first step of the proposed approach is to infer multiple genetic networks from the observed gene expression data. We take this step using an existing method that combines a genetic network inference method with a bootstrap method. The next step is to extract a hierarchical structure from the inferred networks that is consistent with most of the networks. Third, we use the hierarchical structure obtained to assign confidence values to all candidate regulations. Numerical experiments are also performed to demonstrate the effectiveness of using the hierarchical structure in the genetic network inference. The improvement accomplished by the use of the hierarchical structure is small. However, the hierarchical structure could be used to improve the performances of many existing inference methods. PMID:26941653
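The first step described above, inferring multiple networks from bootstrap resamples, produces a set of candidate networks from which per-regulation support can be counted. The sketch below shows only that counting step, as a plain bootstrap frequency. It does not implement the hierarchical-structure extraction or the confidence assignment of the paper, and the gene names and networks are made up for illustration.

```python
from collections import Counter


def edge_frequencies(networks):
    """Fraction of bootstrap networks that contain each directed edge.

    networks is a list of edge sets; an edge is a (regulator, target) tuple.
    """
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge: counts[edge] / len(networks) for edge in counts}


# Four hypothetical networks inferred from four bootstrap resamples.
networks = [
    {("g1", "g2"), ("g2", "g3")},
    {("g1", "g2")},
    {("g1", "g2"), ("g3", "g1")},
    {("g1", "g2"), ("g2", "g3")},
]
freq = edge_frequencies(networks)
print(freq)
```

Regulations that appear in only a few resamples, such as `("g3", "g1")` here, are natural candidates for the false positives the abstract aims to reduce.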
Reinforcement learning or active inference?
Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J
2009-01-01
This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.
Active inference and epistemic value.
Friston, Karl; Rigoli, Francesco; Ognibene, Dimitri; Mathys, Christoph; Fitzgerald, Thomas; Pezzulo, Giovanni
2015-01-01
We offer a formal treatment of choice behavior based on the premise that agents minimize the expected free energy of future outcomes. Crucially, the negative free energy or quality of a policy can be decomposed into extrinsic and epistemic (or intrinsic) value. Minimizing expected free energy is therefore equivalent to maximizing extrinsic value or expected utility (defined in terms of prior preferences or goals), while maximizing information gain or intrinsic value (or reducing uncertainty about the causes of valuable outcomes). The resulting scheme resolves the exploration-exploitation dilemma: Epistemic value is maximized until there is no further information gain, after which exploitation is assured through maximization of extrinsic value. This is formally consistent with the Infomax principle, generalizing formulations of active vision based upon salience (Bayesian surprise) and optimal decisions based on expected utility and risk-sensitive (Kullback-Leibler) control. Furthermore, as with previous active inference formulations of discrete (Markovian) problems, ad hoc softmax parameters become the expected (Bayes-optimal) precision of beliefs about, or confidence in, policies. This article focuses on the basic theory, illustrating the ideas with simulations. A key aspect of these simulations is the similarity between precision updates and dopaminergic discharges observed in conditioning paradigms. PMID:25689102
Steingass, Christof Björn; Jutzi, Manfred; Müller, Jenny; Carle, Reinhold; Schmarr, Hans-Georg
2015-03-01
Ripening-dependent changes of pineapple volatiles were studied in a nontargeted profiling analysis. Volatiles were isolated via headspace solid phase microextraction and analyzed by comprehensive 2D gas chromatography and mass spectrometry (HS-SPME-GC×GC-qMS). Profile patterns presented in the contour plots were evaluated applying image processing techniques and subsequent multivariate statistical data analysis. Statistical methods comprised unsupervised hierarchical cluster analysis (HCA) and principal component analysis (PCA) to classify the samples. Supervised partial least squares discriminant analysis (PLS-DA) and partial least squares (PLS) regression were applied to discriminate different ripening stages and describe the development of volatiles during postharvest storage, respectively. Hereby, substantial chemical markers allowing for class separation were revealed. The workflow permitted the rapid distinction between premature green-ripe pineapples and postharvest-ripened sea-freighted fruits. Volatile profiles of fully ripe air-freighted pineapples were similar to those of green-ripe fruits postharvest ripened for 6 days after simulated sea freight export, after PCA with only two principal components. However, PCA considering also the third principal component allowed differentiation between air-freighted fruits and the four progressing postharvest maturity stages of sea-freighted pineapples.
Inference-based constraint satisfaction supports explanation
Sqalli, M.H.; Freuder, E.C.
1996-12-31
Constraint satisfaction problems are typically solved using search, augmented by general purpose consistency inference methods. This paper proposes a paradigm shift in which inference is used as the primary problem solving method, and attention is focused on special purpose, domain specific inference methods. While we expect this approach to have computational advantages, we emphasize here the advantages of a solution method that is more congenial to human thought processes. Specifically we use inference-based constraint satisfaction to support explanations of the problem solving behavior that are considerably more meaningful than a trace of a search process would be. Logic puzzles are used as a case study. Inference-based constraint satisfaction proves surprisingly powerful and easily extensible in this domain. Problems drawn from commercial logic puzzle booklets are used for evaluation. Explanations are produced that compare well with the explanations provided by these booklets.
Review of robust multivariate statistical methods in high dimension.
Filzmoser, Peter; Todorov, Valentin
2011-10-31
General ideas of robust statistics, and specifically robust statistical methods for calibration and dimension reduction are discussed. The emphasis is on analyzing high-dimensional data. The discussed methods are applied using the packages chemometrics and rrcov of the statistical software environment R. It is demonstrated how the functions can be applied to real high-dimensional data from chemometrics, and how the results can be interpreted.
Learning Probabilistic Inference through Spike-Timing-Dependent Plasticity
Pecevski, Dejan
2016-01-01
Numerous experimental data show that the brain is able to extract information from complex, uncertain, and often ambiguous experiences. Furthermore, it can use such learnt information for decision making through probabilistic inference. Several models have been proposed that aim at explaining how probabilistic inference could be performed by networks of neurons in the brain. We propose here a model that can also explain how such a neural network could acquire the necessary information for that from examples. We show that spike-timing-dependent plasticity in combination with intrinsic plasticity generates in ensembles of pyramidal cells with lateral inhibition a fundamental building block for that: probabilistic associations between neurons that represent through their firing current values of random variables. Furthermore, by combining such adaptive network motifs in a recursive manner the resulting network is enabled to extract statistical information from complex input streams, and to build an internal model for the distribution p* that generates the examples it receives. This holds even if p* contains higher-order moments. The analysis of this learning process is supported by a rigorous theoretical foundation. Furthermore, we show that the network can use the learnt internal model immediately for prediction, decision making, and other types of probabilistic inference. PMID:27419214
Demographic inference under the coalescent in a spatial continuum.
Guindon, Stéphane; Guo, Hongbin; Welch, David
2016-10-01
Understanding population dynamics from the analysis of molecular and spatial data requires sound statistical modeling. Current approaches assume that populations are naturally partitioned into discrete demes, thereby failing to be relevant in cases where individuals are scattered on a spatial continuum. Other models predict the formation of increasingly tight clusters of individuals in space, which, again, conflicts with biological evidence. Building on recent theoretical work, we introduce a new genealogy-based inference framework that alleviates these issues. This approach effectively implements a stochastic model in which the distribution of individuals is homogeneous and stationary, thereby providing a relevant null model for the fluctuation of genetic diversity in time and space. Importantly, the spatial density of individuals in a population and their range of dispersal during the course of evolution are two parameters that can be inferred separately with this method. The validity of the new inference framework is confirmed with extensive simulations and the analysis of influenza sequences collected over five seasons in the USA. PMID:27184386
Bayesian Inference of Reticulate Phylogenies under the Multispecies Network Coalescent
Wen, Dingqiao; Yu, Yun; Nakhleh, Luay
2016-01-01
The multispecies coalescent (MSC) is a statistical framework that models how gene genealogies grow within the branches of a species tree. The field of computational phylogenetics has witnessed an explosion in the development of methods for species tree inference under MSC, owing mainly to the accumulating evidence of incomplete lineage sorting in phylogenomic analyses. However, the evolutionary history of a set of genomes, or species, could be reticulate due to the occurrence of evolutionary processes such as hybridization or horizontal gene transfer. We report on a novel method for Bayesian inference of genome and species phylogenies under the multispecies network coalescent (MSNC). This framework models gene evolution within the branches of a phylogenetic network, thus incorporating reticulate evolutionary processes, such as hybridization, in addition to incomplete lineage sorting. As phylogenetic networks with different numbers of reticulation events correspond to points of different dimensions in the space of models, we devise a reversible-jump Markov chain Monte Carlo (RJMCMC) technique for sampling the posterior distribution of phylogenetic networks under MSNC. We implemented the methods in the publicly available, open-source software package PhyloNet and studied their performance on simulated and biological data. The work extends the reach of Bayesian inference to phylogenetic networks and enables new evolutionary analyses that account for reticulation. PMID:27144273
Computational approaches to protein inference in shotgun proteomics.
Li, Yong Fuga; Radivojac, Predrag
2012-01-01
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area. PMID:23176300
Baumrind, S; Korn, E L
1992-12-01
This paper presents case-specific quantitative evidence of the systematic lateral displacement of metallic implants in the mandibles of treated and untreated human subjects between the ages of 8.5 and 15.5 years. This evidence appears to be consistent with the inference of small, but systematic increases in distance between the internal structures of the two sides of the osseous mandible during growth. Such a conclusion, however, is inconsistent with traditional beliefs that the internal structures of the mandibular symphysis fuse at the midline during the first post-natal year and remain dimensionally constant thereafter. We recently published evidence of statistically significant transverse displacement of metallic implants in the mandibular body region for 12 of 28 subjects for whom longitudinal data were available. Of the twelve subjects for whom statistically significant changes were observed, widening occurred in eleven cases and narrowing in one. Matching data are now available on concurrent ramus changes for 22 of the same 28 subjects, including 11 of the 12 for whom statistically significant width changes had previously been noted in the body region. In eight of these 11 subjects, statistically significant widening in the ramus region was also observed. No subject had statistically significant widening in the ramus region without also having statistically significant widening in the body region. No subject had statistically significant trans-ramus narrowing.
Analyzing High-Dimensional Multispectral Data
NASA Technical Reports Server (NTRS)
Lee, Chulhee; Landgrebe, David A.
1993-01-01
In this paper, through a series of specific examples, we illustrate some characteristics encountered in analyzing high-dimensional multispectral data. The increased importance of the second-order statistics in analyzing high-dimensional data is illustrated, as is the shortcoming of classifiers such as the minimum distance classifier which rely on first-order variations alone. We also illustrate how inaccurate estimation of first- and second-order statistics, e.g., from use of training sets which are too small, affects the performance of a classifier. Recognizing the importance of second-order statistics on the one hand, but the increased difficulty in perceiving and comprehending information present in statistics derived from high-dimensional data on the other, we propose a method to aid visualization of high-dimensional statistics using a color coding scheme.
Model averaging and muddled multimodel inferences.
Cade, Brian S
2015-09-01
Three flawed practices associated with model averaging coefficients for predictor variables in regression models commonly occur when making multimodel inferences in analyses of ecological data. Model-averaged regression coefficients based on Akaike information criterion (AIC) weights have been recommended for addressing model uncertainty but they are not valid, interpretable estimates of partial effects for individual predictors when there is multicollinearity among the predictor variables. Multicollinearity implies that the scaling of units in the denominators of the regression coefficients may change across models such that neither the parameters nor their estimates have common scales, therefore averaging them makes no sense. The associated sums of AIC model weights recommended to assess relative importance of individual predictors are really a measure of relative importance of models, with little information about contributions by individual predictors compared to other measures of relative importance based on effects size or variance reduction. Sometimes the model-averaged regression coefficients for predictor variables are incorrectly used to make model-averaged predictions of the response variable when the models are not linear in the parameters. I demonstrate the issues with the first two practices using the college grade point average example extensively analyzed by Burnham and Anderson. I show how partial standard deviations of the predictor variables can be used to detect changing scales of their estimates with multicollinearity. Standardizing estimates based on partial standard deviations for their variables can be used to make the scaling of the estimates commensurate across models, a necessary but not sufficient condition for model averaging of the estimates to be sensible. A unimodal distribution of estimates and valid interpretation of individual parameters are additional requisite conditions. The standardized estimates or equivalently the t
Elastic-Net Copula Granger Causality for Inference of Biological Networks
Siyal, Mohammad Yakoob
2016-01-01
Aim: In bioinformatics, the inference of biological networks is one of the most active research areas. It involves decoding various complex biological networks that are responsible for performing diverse functions in the human body. Among these network analyses, most research focuses on understanding effective brain connectivity and gene networks in order to cure and prevent related diseases such as Alzheimer's disease and cancer, respectively. However, recent advances in data procurement technology, such as DNA microarray analysis and fMRI, which can simultaneously process a large amount of data, yield high-dimensional data sets. Analyzing these high-dimensional data sets poses challenges for the analyst. Background: Traditional methods of Granger causality inference use ordinary least-squares methods for structure estimation, which confront dimensionality issues when applied to high-dimensional data. Apart from dimensionality issues, most existing methods were designed to capture only the linear inferences from time series data. Method and Conclusion: In this paper, we address the issues involved in assessing Granger causality for both linear and nonlinear high-dimensional data by proposing an elegant form of the existing LASSO-based method that we call “Elastic-Net Copula Granger causality”. This method provides a more stable way to infer biological networks, which has been verified using rigorous experimentation. We have compared the proposed method with the existing method and demonstrated that this new strategy outperforms the existing method on all measures: precision, false detection rate, recall, and F1 score. We have also applied both methods to real HeLa cell data and StarPlus fMRI datasets and presented a comparison of the effectiveness of both methods. PMID:27792750
Quantifying Uncertainty in Inferred Viscosity and Basal Shear Stress Over Ice Streams
NASA Astrophysics Data System (ADS)
Lilien, D.; Joughin, I.; Smith, B. E.
2015-12-01
Basal friction and ice viscosity are both essential controls on glacier motion that cannot be measured by remote sensing. In order to initialize models, it is common practice to use inverse methods to determine the basal shear stress over grounded ice and the viscosity of floating ice. It is difficult to quantify the uncertainty in the inferred parameters due to the computational expense of the procedure, the choice of regularization parameter, and the errors in the various measurements used as input, as well as differences in inversion method. Various methods can be used to perform the inversion, and these differing procedures cause discrepancies in the inferred properties of the ice streams. Additionally, the inferred parameters depend on the sophistication of the approximation for ice flow that is used, e.g. full-Stokes or the shallow-shelf approximation. We analyze the impact the choices of modeling procedure and inversion method have on inferred ice properties. To do this we perform a number of inversions for basal shear stress and for ice shelf viscosity over Smith, Pope, and Kohler Glaciers in West Antarctica and assess the sensitivity to modelers' choices. We use both a three dimensional full-Stokes model and a two dimensional shallow-shelf model, with both Robin and adjoint type inversion procedures, to infer basal shear stress and ice viscosity. We compare the results of these different methods and evaluate their implication on uncertainty in the unknown parameters.